  • What's going on, everybody?

  • I hope everybody is being safe, smart and socially distant.

  • I want today to draw your attention to the following Kaggle challenge, obviously highly relevant to today.

  • But I also find this to be much more like the types of tasks that, at least for me, businesses tend to need when people come to me for contracting and consulting; the data tends to look more like this than your average Kaggle competition.

  • Now, nothing against Kaggle competitions.

  • There's definitely true competition there.

  • But generally Kaggle comes down to a competition of optimization rather than a competition of finding insights in highly unstructured data.

  • Now, sometimes that's wrong.

  • But for the most part, that's kind of what you find on Kaggle, because Kaggle has to be objective, and in enforcing that objectivity, there are people that will optimize for those objective results.

  • So anyway, enough on that; this is a very interesting data set.

  • I've downloaded it and kind of peeked at it.

  • That's kind of why I decided this would be really good to do a video on.

  • I've just barely peeked at the data, and then we're going to dive in with code, and you will see firsthand the process that I, at least, begin with when looking through a data set.

  • With this kind of data set, though, you will spend tens to hundreds of hours going through it, trying to get insight.

  • So here, at best, you'll see the first hour.

  • So don't expect anything too crazy, but this is just kind of my process, at least.

  • So to get the data, just go ahead and make an account.

  • Then you can download the data.

  • Also, they have various tasks here.

  • We'll talk about this in a little bit, but as you've likely found, the problem here is that there's so much information. If we look at the data set, there are 29,000 articles, and 13,000 of them are full text. These are scholarly articles, just to make that clear, which is important.

  • And there's so much information, and so much new information that's just being pushed out very fast.

  • There's historical information and the new information, and it's very difficult.

  • So, as you've probably found yourself when trying to learn about this, the variability in the information and facts that you might hear is really high.

  • And it's not only the variability; there are also so many little details that it's just really difficult.

  • And in this case, there's the critical and acute nature of what we're going through here.

  • It's making it very hard.

  • Normally, it really would be like grad students that would sit and go through all this stuff.

  • We just do not have time for that, you know?

  • And then businesses, for example: it's either you, or they're going to hire some interns or something to try to answer these questions.

  • But it turns out programming can often do a better job at it.

  • So anyway, let's dig in.

  • We'll talk about these tasks in a moment, but first, start the download of the data, and then we're going to start going through the actual data.

  • You can either pause until it's downloaded or, while the download is going, let's look at my files.

  • You'll get a zip, you'll unzip it, and then you'll get this, basically.

  • So the last update was March 13th.

  • It is March 19th at the time of my recording this, so hopefully at some point they'll make an update.

  • But we'll see.

  • So coming in here, this is what we get.

  • We get these four directories. Initially, I kind of expected this one to be the full thing, right?

  • And then this was, like, the commercial subset, the noncommercial subset, and then some sort of custom license subset.

  • But then I click on here.

  • I see 803 items, but I click on this one and I see 9000 items.

  • This one has almost 2000 items, and the custom license has 1400.

  • So that tells me right away: no, we're going to have to go through all of these directories.

  • And then inside the directory (first of all, just keep this in mind) we have another directory with the exact same name as the parent directory.

  • So who knows what's going on there?

  • But then we have these JSON files.

  • So now let's just look at a JSON file, so we know what we're looking at.

  • So already, with the JSON, I'm like, well, that's not what I expected. It's got to be in JSON?

  • I mean, maybe it's coming from some sort of JSON-based database.

  • I don't know. Anyway, what we have here is clearly just your typical keys and values. Scrolling down.

  • Okay, we have the first body of text that we've found at all.

  • It's called abstract, so we can grab the abstract.

  • Okay.

  • Noted. Then we truly have body text, and we can already see quite quickly that it's in chunks of text.

  • So you've got body_text, and then text, and then maybe some other information.

  • In this case, there's really no information in there: cite spans, ref spans and section, I guess. I don't know.

  • So yeah, clearly we get chunks of text, and then here we definitely do have cite span stuff.

  • I'm not really sure what this is.

  • Is that, like, character placement?

  • I've no idea.

  • No clue, but clearly, we're looking for the chunks of text inside of body_text.
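
To make the shape being described concrete, here is a heavily trimmed, hypothetical stand-in for one paper file. Only the keys mentioned in the walkthrough are shown, every value is invented, and treating each abstract entry as a dict with a `text` field is an assumption rather than something stated above.

```python
# Hypothetical, trimmed stand-in for one paper file (all values invented).
paper = {
    "paper_id": "example-id",
    "metadata": {"title": "An example title", "authors": []},
    # Assumption: abstract entries look like body_text entries.
    "abstract": [{"text": "Example abstract paragraph."}],
    "body_text": [
        {"text": "First chunk.", "cite_spans": [], "ref_spans": [], "section": "Introduction"},
        {"text": "Second chunk.", "cite_spans": [], "ref_spans": [], "section": "Methods"},
    ],
}

# The article body is spread over several chunks, each with its own "text".
chunks = [chunk["text"] for chunk in paper["body_text"]]
print(chunks)
```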

  • Okay.

  • Okay.

  • So I think we have a general idea.

  • Now that's about as much as I want to pour over the actual Jason files.

  • I think we're ready to begin coding.

  • So if you haven't fully downloaded the data yet, now would be a good time to pause.

  • Um, so the first thing I'm gonna do is, uh, make a file.

  • So, you know, rod up high, and we'll just a start, and, um, you go open with sublime.

  • I need to set Sublime as my default later.

  • So, right away, let's just start.

  • It will take time to go through each of these directories.

  • So let's just start with this top directory; I'm just going to grab the name for now.

  • Later we'll go through all four, probably, so we can get all of the data possible.

  • But we're going to say dirs equals, and we'll just make this a list.

  • There we go.

  • And then immediately, what we want to do is for d in dirs.

  • Let's go ahead and import os.

  • Let's just make sure we get all the files.

  • So the first thing we want to do is iterate through all the files.

  • So for d in dirs, and then basically we want to do for file in os.listdir.

  • And I'm going to use f-strings here, with a forward slash. Some people have said that a forward slash won't work on Windows, but from my history of working on Windows for many years, I have found that even on Windows, f-strings did work with a forward slash. So someone please feel free to correct me if I'm wrong.

  • Like if you're on Windows and that's not working.

  • And you're on, like, Python 3.7 or whatever.

  • Yeah, I guess, let me know.

  • But it should work.

  • Oh.

  • Oh, that's not good. Ubuntu is having a problem.

  • Uh, hopefully we're still recording.

  • Looks good.

  • We'll find out.

  • Cool.

  • So we know that it was the directory, slash, directory.

  • So then we just want to go through the files.

  • So the first thing I'm going to do is just print file, and let's just see.

  • I also don't have everything correctly set up yet on this machine; it's a relatively new machine for me, so I'm going to run everything from the terminal.

  • Let me make sure: python, for me, points to 3.7. So python, then the file.

  • Cool.

  • Look at all those.

  • Okay.

  • So we are able to find our JSON files.

  • Very good.
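
The directory walk described so far could be sketched roughly as below. The function name, the `root` argument and the `.json` filter are additions for illustration; the nested "directory slash directory" layout and the f-string with forward slashes follow the narration.

```python
import os

def list_json_files(root, dirs):
    """Return paths of every .json file in root/<d>/<d>/ for each d in dirs."""
    paths = []
    for d in dirs:
        # Each data directory wraps a second directory with the same name.
        inner = f"{root}/{d}/{d}"
        for file in sorted(os.listdir(inner)):
            # Assumption: skip anything that is not a JSON file.
            if file.endswith(".json"):
                paths.append(f"{inner}/{file}")
    return paths

# Usage (hypothetical names): list_json_files("path/to/unzipped/data", ["some_subset"])
```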

  • The next order of business is to open one.

  • So let's go ahead and import json.

  • And what we want is a file path, which will be, I guess, just this right here, so copy, paste, slash, file.

  • So then we'll say j equals json.load.

  • And we want to load open file_path with the intention to 'rb'.

  • Cool.

  • Now, let's print j. Let's also issue a break so we don't get too crazy; we just want to make sure things are working as expected up to this point.

  • And they are.

  • So we've got JSON.

  • Now, let's do a quick for key in j, print key; let's just get a general idea of the keys in this document.

  • So we've got paper_id, metadata, abstract, body_text, and so on.
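
A minimal sketch of the load-and-inspect step, assuming a path to one of the JSON files. The helper names `load_paper` and `top_level_keys` are made up for this example; `json.load` on a file opened with `"rb"` is the call described above.

```python
import json

def load_paper(file_path):
    """Parse one paper file; json.load accepts a file opened in binary mode."""
    with open(file_path, "rb") as f:
        return json.load(f)

def top_level_keys(paper):
    """List the top-level keys, the same thing the quick 'for key in j' loop prints."""
    return list(paper.keys())
```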

  • So let us now print a couple of things.

  • Let's print j metadata.

  • I think I already know what body_text is going to look like; that looks pretty clear to me.

  • Metadata appears to be quite a few things, so you'll get the title.

  • This apparently has no title. Authors.

  • Gosh, I almost want... okay, let's do for k in j metadata.

  • This almost would be nicer in a notebook format.

  • I didn't really plan for that.

  • Okay, so really just title, authors. That's a lot of stuff for there to just be title and authors, but okay, sure.

  • Okay, so what we're going to say is title equals j metadata title, and then we also have abstract, equal to j abstract.

  • And then, before we go on, let's print abstract real quick.

  • I need to leave this up.

  • I'm gonna close these two because that's gonna be annoying to keep doing that every time.

  • Oh, interesting.

  • So abstract appears to be in a list format, which is odd.

  • I'm not really sure what to think about abstract being a list, but would there ever be two abstracts?

  • I mean, I'm not a professional at papers, having written all of zero papers in my life.

  • But even in college, I'm pretty sure there was only one abstract. But okay, so, interestingly, abstract zero.

  • So the abstract needs to be abstract zero, I guess. Let's stop breaking and stop printing, and let's run through everything to just see. No.

  • Okay, so sometimes we don't have an abstract.

  • Let's try abstract, except.

  • I'm trying to see: is there ever a time where it's not a list?

  • Because it does feel weird. But surely, my expectation is that this came from a JSON-based database.

  • So my expectation is that everything will have the same format, but I could be totally wrong.

  • j abstract.

  • So my thought here is that this will be an empty list when we hit that exception.

  • That's the only thing we're going to print.

  • So hopefully we'll see a bunch of empty lists.

  • Yeah.

  • Cool.

  • Okay, so, except.

  • So in the case that there isn't an abstract, we're just going to say abstract is empty.

  • Just an empty string.
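
The try/except fallback for a missing abstract might look like the sketch below. The function name is hypothetical, and catching `KeyError` and `IndexError` specifically (rather than a bare `except`) is a choice made for this example.

```python
def first_abstract(paper):
    """Return the first abstract entry, or "" when there is none."""
    try:
        return paper["abstract"][0]
    except (KeyError, IndexError):
        # Missing key or empty list: fall back to an empty string.
        return ""
```

Naming the exceptions keeps unrelated errors visible while still covering the empty-list case seen in the output.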

  • Okay, so all right.

  • So we've got the abstract, the title, and now we need to grab our body text.

  • So we're going to say full_text equals an empty string for now.

  • And then for text in j body_text.

  • Let's keep our break, because now we're just going to try to see: are we getting what we expected here?

  • Print text.

  • So we need to say text text, I guess.

  • So: for text in j body_text, print text text.

  • Shit.

  • What happened there?

  • I said shoot, by the way.

  • Okay, yeah, that looks more like what we expect.

  • So now what we want to do is append this to full_text, so full_text plus-equals text text, plus newline, newline.

  • Beautiful.

  • At the very end, let's print full_text, and let's stop printing there, and let me see.

  • What do we get? Okay.

  • Yeah.

  • Okay.

  • So now, finally, after, what, has it been 13 minutes? 14 minutes?

  • We've got the data in the format that we expected to get the data.
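
Joining the chunks into one string, as just described, could be sketched like this. The function name is made up; the loop over `body_text` and the two-newline separator follow the narration.

```python
def build_full_text(paper):
    """Concatenate every body_text chunk, separated by blank lines."""
    full_text = ""
    for text in paper["body_text"]:
        # Each entry is a dict; the prose itself lives under "text".
        full_text += text["text"] + "\n\n"
    return full_text
```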

  • But again, this is more likely to be how you'll get even text data. For example, I've had retailers, and then certain news companies, and I'm trying to think of what else, where I've seen something similar to this: it is typically data in a format that you did not expect.

  • It's either CSV and/or JSON, and not just text documents like you might think.

  • Or even crazier; I've seen quite a few things.

  • I've seen some stuff, y'all.

  • Okay, so we've got things organized to an extent, and then I guess the next thing I would do is organize it into... I guess it depends on what our goal is.

  • So So we've gone this far, and now our job is to extract something meaningful from the data.

  • So we've kind of structured it to some extent, or at least we're close to being able to structure it.

  • So now, coming back here and going to tasks, I'm looking through this list, and I thought about it a little bit initially. I think the problem with extracting meaning from text is that you want to start with the lowest-hanging fruit first.

  • And in this case, we're trying to extract from text.

  • This is so heavily an NLP, or natural language processing, type of task that in order for us to mine this data, we need to know what we're looking for, right?

  • If you don't know what you're looking for, you can't find it.

  • So the first thing is, how do we find any of these things?

  • So, for example, with many of these terms, you might think, well, you could just search for "non-pharmaceutical intervention". But many times somebody talking about a non-pharmaceutical intervention is going to say it in such a way that it is not "non-pharmaceutical intervention".

  • They're going to call it something else, right?

  • And they're going to describe it in such a way.

  • Now, one thing like a vaccine, the term vaccine, will probably always be called vaccine.

  • Okay, another one would be antiviral, right?

  • So an antiviral is probably always going to be called an antiviral.

  • Okay?

  • Similarly, looking up here: transmission, incubation and environmental stability.

  • Me personally, I'm more curious about things like, you know, how is it really transmitted?

  • Are we talking three feet, six feet?

  • How long does it live on surfaces?

  • But then, even if you look at surfaces, let's say environmental stability,

  • you've got questions of hard surfaces, porous surfaces like clothing and so on, and in liquid form, or liquids, rather.

  • It gets very complicated really quick.

  • So what we're trying to figure out is something that hopefully does not get too complicated too fast.

  • Figure that out, and then we can start trying to tackle some of these more challenging things.

  • So one of the terms that I think is highly unlikely to change, at least in scholarly journals about this, is incubation.

  • The term incubation.

  • My tag in the short keeps flipping up and it's driving me nuts.

  • The term incubation is likely to always be called incubation in a scholarly journal or text or whatever in research.

  • So my intuition, or expectation, is that we can use incubation as a starting point, because we can search this document for incubation and, hopefully very close to where the word incubation is used, we can search for a duration, right? So, numbers.

  • So my expectation, again, is that we could probably do a really basic regular expression, and then later we could really ramp up that regular expression to find many more examples and then hopefully filter out any mistakes.

  • But chances are, with numbers around the term incubation, my guess is that these numbers are going to be incubation times in the form of hours or, most likely, days.

  • So again, with a lot of these things we're looking at so many variables. Incubation, depending on what we're looking at, could be minutes, hours, days, weeks, months.

  • Who knows, right? So incubation is a general term, but in this case, I think we could probably search for digits and days.

  • And that's my expectation.

  • So that's how I'm going to approach this at first.

  • So let's begin.

  • Okay, so we're going to look for incubation. There's a couple of different things that we could do.

  • One option we really do have is, from this point, from full_text right here in line...

  • I swear, my tab, caps lock.

  • Anyway, we could begin searching right now.

  • Even here, in for text.

  • Like, we could just search in this loop.

  • But not all tasks will be that easy for us, I don't think.

  • So instead, what I'd like to do is build a data frame.

  • So first, let's just do docs.

  • I'm going to make that a list. Come up here.

  • I'm going to import pandas as pd.

  • And at the end, we're going to say here docs.append, and we're going to append a list.

  • And that will be title,

  • abstract,

  • (I guess the order doesn't really matter, I suppose)

  • and then full_text.

  • Okay.

  • Comment that out.

  • Also, just so we know where we are, I'm going to do from tqdm import tqdm.

  • And this is just a nice way to make a progress bar if you want.

  • If you don't have this, just install tqdm.

  • Same thing with pandas: pip install pandas.

  • For file in... so one thing I want to do is, let's go and print d, and then here we're going to put tqdm around this os.listdir, because eventually we'll have multiple directories.

  • Cool.

  • So that will... stop the break.

  • And let's just run that first, and see where we stand.

  • And yeah, we'll go from there.

  • Okay, that's quite fast.

  • Wow, that was quicker than I thought.

  • Okay, so now, we've imported pandas as pd.

  • So now what we're going to say is df equals pd.DataFrame, with DataFrame in title case.

  • And we really could just convert docs, and then we're going to give it the columns.

  • And this is just a list of column names, so we're going to say, literally, title, abstract, and then full_text.

  • Cool.

  • So now we have our data frame.

  • And just to make sure that things are working as expected, let's go ahead and output the head, just to see. And things look good.

  • Okay, good.

  • So now what we can actually use is the filtering methodology that pandas gives us.

  • So one option we have is, we can make incubation, and we can say that is equal to df where df full_text dot contains (I think this will work) incubation.

  • We will find out if that works.

  • Print incubation.head. Incubation is df where the df full_text contains the string incubation.

  • Let's run.

  • Series object has no attribute contains.

  • Series object... so maybe we have to convert it to a string. So it's dot str.

  • Whoa!

  • It did it.

  • Okay, this is going well. So now we have a data frame that only consists of body text...

  • Well, actually, whole articles that have somewhere in them something about incubation.
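
A small sketch of the data frame and filter steps, using two invented rows in place of the real `docs` list. It assumes pandas is installed; the column names follow the narration, and the row contents are made up.

```python
import pandas as pd

# Hypothetical rows standing in for the docs list built in the loop.
docs = [
    ["Paper A", "", "The incubation period was 7 days.\n\n"],
    ["Paper B", "", "This text never mentions the term.\n\n"],
]

df = pd.DataFrame(docs, columns=["title", "abstract", "full_text"])

# A Series has no .contains; the string methods live under the .str accessor.
incubation = df[df["full_text"].str.contains("incubation")]
print(incubation["title"].tolist())
```

Note that `str.contains` treats its argument as a regular expression by default, which makes no difference for a plain word like this one.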

  • We also could have filtered by, like, title or something.

  • But that went so fast, I just don't think it's necessary.

  • Okay, so now that we've done that, I would say we can pull out just the texts at this point. Later, if you wanted to be able to cite sources or something, we have the information necessary.

  • And just for the record, at least that was very quick.

  • Now, that was 800 out of 13,000 or so articles, so things will take 13 times longer at some point.

  • But, you know, it just depends on the size of the data.

  • So in our case, we're dealing with two gigabytes total of data.

  • So we can go pretty fast and loose with our R&D here. Whereas historically, one thing that I have learned is, if you are working with a very large data set, then in what I would describe as a data pre-processing step,

  • it is wise to save everything.

  • So in this case, we're just not saving very much, and I'm not going too hard here, because we can regenerate it very quickly.

  • But normally, my best suggestion is, if you're dealing with, like, a terabyte of information, in your pre-processing step, save as much as you can. With the data frame, for example, at any point we can output this data frame, so that the time it takes to do any sort of logic and stuff, you only need to spend one time.

  • If it's a big data set. In this case, it's really not that big.

  • So we can kind of goof off to some extent.
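
The "save your pre-processing" advice can be illustrated with a small cache helper. This is not the data frame output mentioned above; it applies the same idea to the plain list of rows using only the standard library, and the function name and cache path are hypothetical.

```python
import json
import os

def load_or_build(cache_path, build):
    """Return cached rows from cache_path, or call build() once and cache the result."""
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    rows = build()
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(rows, f)
    return rows
```

On a second run the expensive `build` step is skipped and the rows come straight from disk.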

  • Okay, so now we're going to say is, um well, say texts, Texans, texts equals incubation, body text dot values.

  • And now we can begin to generate over texts.

  • So 40 intense print t book and come over here.

  • Um, yeah, I guess we can just play it.

  • By the time I figure out how exactly I want to run it, it will be too late anyways.

  • Body.

  • Oh, what is the issue?

  • Full text.

  • Not beautiful noises that cut makes.

  • So I hope it captured that beautiful a SMR for you guys.

  • Uh, full text.

  • Let's try that again and see if that works good.

  • Okay, so how are we going to do what we think we want to do?

  • I think probably a valid method here is going to be to just simply split by space.

  • I mean, not space.

  • Period.

  • So I think we can get away with splitting everything by a period here, and then look in that exact sentence for incubation.

  • And if we find incubation in that sentence, look for a duration, which is digits.

  • And there will be many problems with that.

  • Hopefully I'll remember to address them, but obviously, some of these papers are probably going to do things like compare incubation times.

  • Or it might not even be a comparison: maybe one sentence says, here's one incubation time, and then the next sentence is like, well, compared to this other one..., and it's a completely different sentence, even.

  • So there are many kind of gotchas that you might find in this case, and you're eventually going to have to figure out some way of detecting when that's the case.

  • What I would do is search for related diseases that you expect it could be compared to, and then see whether that is being talked about anywhere around where we're about to pull an incubation time, and if it is, forget about it.

  • But for now, we're going to keep things very simple.

  • We're going to iterate over texts.

  • And then what we're going to say is for sentence in t.split by period-space,

  • so we don't split by, like, decimals, right?

  • Let's print sentence.

  • But in fact, let's say: if incubation in sentence, print the sentence. And actually, rather than breaking, let's print a few and just kind of see what we're dealing with.

  • So let us run it.
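
The split-then-filter step could be sketched as below. The function name and default argument are illustrative; splitting on a period followed by a space (so that decimals such as 5.2 stay intact) is the approach described above.

```python
def sentences_mentioning(text, term="incubation"):
    """Split text on ". " and keep the pieces that contain term."""
    return [s for s in text.split(". ") if term in s]

# Invented example text.
example = (
    "The incubation period was 5.2 days. Nothing relevant here. "
    "We assumed incubation could not exceed 30 days."
)
hits = sentences_mentioning(example)
print(hits)
```

The match is case-sensitive, so a sentence starting with "Incubation" would be skipped by this naive version.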

  • But you don't split.

  • I'm a little confused how we're seeing.

  • How are we seeing?

  • That looks like a lot more text than I would expect to be seeing.

  • For sentence in t.split, for t in texts.

  • t was the full text, right?

  • Yeah.

  • For sentence in t.split by... huh.

  • Dogs are barking.

  • Hm?

  • Two news.

  • Why are we seeing that?

  • That is odd.

  • I would not expect to...

  • I have to pause for a moment; the dogs are going crazy.

  • This is bad timing, because what I don't understand is... okay.

  • Let's go figure out what my dogs are going on about.

  • I'll be back.

  • Okay?

  • To be honest, I have no idea.

  • I think we maybe got a package.

  • I'm not really sure, but, uh, don't know.

  • Anyway, continuing along. Yeah, we're still getting this, like, full text. Converting them to values... I believe that's what I want, values.

  • I'm just a little confused. If incubation in sentence, for sentence in t.split, for t in texts.

  • I am slightly confused.

  • So let's print len texts.

  • I'm just expecting to not see...

  • You know, we've got many examples like this.

  • This should have been split.

  • Why was that not split?

  • I don't know.

  • I'm obviously missing something unbelievably obvious here.

  • Texts, for t in texts, print t.

  • That's our... Oh, my gosh.

  • Was I always printing t?

  • Are you kidding me?

  • Is this really it? I can't even remember now if that was there.

  • It's terrible.

  • If that's really it, I'm gonna be... Oh, dear.

  • Oh, I guess I'll leave that in.

  • But that's that's unfortunate.

  • It's unfortunate.

  • Okay, so, dude, that's just embarrassing.

  • Anyway, um okay, so we have incubation time, blah, blah, blah.

  • We assumed the incubation time could not exceed 30 days.

  • Okay.

  • Very interesting.

  • Um, 48 hours.

  • So we're gonna find stuff like that?

  • I don't know.

  • That's not very good.

  • The average incubation period was seven days.

  • OK, It's like that's an example of something we're looking for.

  • Estimated incubation, 5.2 days.

  • Okay, so that's another example that we would look for. This article selects the incubation period as seven days.

  • Okay, so you get the idea.

  • I think we can look for, really just to start, examples of some digit, space, days, right?

  • We can do that very simply.

  • And I would also say we will only accept examples where we find incubation and then one digit-space-days example, and then the sentence has to be done.

  • The reason why I want to do that is, like, here: we say, okay, the incubation period of seven days, blah.

  • They keep talking, yak yak, in latent persons, seven days ago.

  • Now this could be anything, right?

  • This could be some other number of days.

  • I think this is still in reference to why it's seven days, but you might have scenarios where you have these, like, two variations.

  • So if our logic is, hey, this is how we're going to find incubation times, and then we find two digits...

  • Now, in some cases, it could be an incubation time of 7 to 10 days, or 7-10 days, and we might find scenarios like that. But we might also find scenarios like this, where they aren't related to each other.

  • So if we find that scenario, where we find two or more instances, our logic is not solid, and we're just going to toss it.

  • So let's just start with something super basic, because that's what we need.

  • I can't even figure out how to split text and not lose it.

  • Okay, if incubation in sentence print sentence cool.

  • We're gonna use a regular expression.

  • So one sec, let's import re.

  • Don't worry.

  • I'm no regular expression expert.

  • As I'm sure you will soon see. Let's say single_day, and we're going to say that is equal to re.findall.

  • And it's pattern, and then the thing.

  • So we're going to look for some sort of regular expression in the sentence.

  • So the first single-day expression that we are going to hunt for is going to be some digit.

  • That is \d, one to two.

  • So, you know, either single digits or double digits, followed by a space, followed by at least d-a-y.

  • Why? Because sometimes it will be hours and such. So we definitely need to be looking very specifically for day.

  • So single_day, re.findall, blah blah.

  • Okay, so if len of single_day equals one, then we're assuming this is a good find.

  • So let's go ahead and print single_day zero, and let's print the sentence.

  • Then we'll tab over these just for nice, beautiful formatting.

  • Uh, and then no, that's fine.

  • We will see a few examples.

  • What is going on?

  • Okay, so it's a 2 day, 7 day.

  • How come it's printing this sentence here?

  • Why do I keep getting bitten by forgetting old prints?

  • I really don't appreciate it.

  • Ah, beautiful. 89 day.

  • Where's 89 even coming from?

  • Oh, my gosh.

  • You should we say.

  • But I'll get Here's a question.

  • Why did it find 0.89?

  • Oh, because then it was followed by day.

  • Okay, so what we're going to say is space.

  • Take that.

  • Let's try that one.
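
The "89 day" surprise and the leading-space fix can be reproduced on an invented sentence. The patterns are a reading of what is described above (one or two digits, a space, then "day"), not a copy of the on-screen code.

```python
import re

# Invented sentence containing both a decimal and a whole-number duration.
sentence = "The growth rate was 0.89 day and the incubation period was 7 days."

# Without a leading space, the tail of "0.89" is picked up as "89 day".
loose = re.findall(r"\d{1,2} day", sentence)
print(loose)   # ['89 day', '7 day']

# Requiring a space before the digits rejects the "89" inside "0.89".
strict = re.findall(r" \d{1,2} day", sentence)
print(strict)  # [' 7 day']
```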

  • We have 89 days.

  • I got you.

  • He was 21 is 21 susceptible compartment.

  • Return return.

  • It's a symptom.

  • Okay.

  • Oh, the max.

  • So they do think 21 is the maximum incubation.

  • We've got 14 day, at least 14 days, and we stress that 17 to 14 days here.

  • So possible.

  • Okay, Okay.

  • Okay.

  • Cool.

  • So Okay, that's a good find.

  • As you can see, we did see various things, like 2 to 14 days.

  • Also, how come it didn't find the 2 and then throw this away?

  • I don't really know.

  • 14 day.

  • Oh, because it was "14 day".

  • Copy that.

  • Okay, like I said, no regular expression expert.

  • So, as you can see already, we would probably want to search for this, and then also search for "2 to 14 days", "2-14".

  • So many things that we'd want to search for.

  • We'll keep it nice and simple for now, however.

  • So if we find a single day, what do we want to do from there?

  • The other thing I bet we're missing is, you know, these are keeping it to whole numbers, like seven.

  • But chances are a lot of articles have decimals as well.

  • I think I'll try my hand at a possible decimal.

  • Let's try.

  • Let's do it.

  • Let's show the people how terrible of a programmer I can be at times.

  • So let's say we wanted to have a possible decimal.

  • That means you would have a \d, a digit, rather, one to two, followed by a period.

  • And this whole thing, we would have zero or one occurrence of, and then another digit.

  • Right?

  • We'll see what happens.

  • Let's see, "5." Okay, so that's a problem.

  • Why didn't it find 5.2, bro?

  • Isn't that can tie?

  • I thought I could get it.

  • I thought I could encase it with parentheses.

  • Is it brackets?

  • Maybe.

  • I wish I could remember, Uh, because we really want all that together.

  • Otherwise, we would get up to four digits and we don't want four digits.

  • We want that.

  • Is it?

  • Maybe the parentheses.

  • What?

  • I screwed up, maybe.

  • Is it a bracket?

  • You already?

  • Congratulations.

  • Already seeing how terrible I can be at times?

  • No, I don't think a bracket is what we want.

  • Someone comment below and remind me of the basics of regular expressions, because I want this whole thing.

  • Possibly.

  • But for now, let's keep it simple.

  • Like I said, Yeah.

  • You got to see how terrible I can be, because we want those two things, possibly, and I just don't know.

  • There's got to be some way. I thought it was with parentheses, but I think I'm wrong, as we saw, because you'll get that leading number, but not the following number.

  • And I think that's because the parentheses pick that part to find.

  • Like, I think the parentheses mean you'll find what's in the parentheses, and this other stuff still has to be found.

  • But it's only going to return what's in the parentheses, when instead, yeah, I want that group.

  • I just don't know.

  • I'm not good enough.

  • My apologies. Someone comment below, and we'll get it.

  • Somebody will do it.
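
The behaviour described here, where only the part inside the parentheses comes back, is how `re.findall` is documented to work: if the pattern contains a capturing group, `findall` returns the group instead of the whole match. A non-capturing group, written `(?:...)`, groups without capturing. The two patterns below are assumed reconstructions for illustration, not the exact text typed in the video.

```python
import re

text = "incubation of 5.2 days"

# Capturing group: findall returns only what the parentheses matched.
capturing = re.findall(r"(\d{1,2}\.)?\d{1,2} day", text)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r"(?:\d{1,2}\.)?\d{1,2} day", text)

print(capturing)
print(non_capturing)
```

The first call yields just the leading number and the period, which is consistent with the "leading number, but not the following number" result seen on screen.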

  • Um, okay, so we have single_day.

  • Great.

  • So now what we want to do is, let's say, incubation_times.

  • We'll make that a list, and then we will append.

  • See how far we are in time.

  • Here.

  • Another 10 minutes.

  • I think I stopped at 25 because of my lovely animals. Um, incubation_times.

  • So what we're gonna do is if we do find it, I think I was pretty happy with what you know, this regular expression did for us.

  • So, um, just make sure I didn't screw up when I removed our other example.

  • Not?

  • Looks good.

  • Um, we will just add them to this list.

  • So what I will say is incubation_times.append, and, well, these are ints.

  • But later we might find floats, given somebody who knows how to write regular expressions better than I do.

  • We'll say float(single_day[0]).

  • So, again, these should all be actual integers at this stage.

  • But I think eventually you'd want to have the possibility for a float.

  • Um, the other thing we could do, like, if I was doing this for a client, you know, and I really wanted to know the answer to that question,

  • um, and I could not figure out how to write a frickin' regular expression,

  • I would legit just have a single_day int and a single_day float.

  • That's the kind of programmer I am.

  • But for now, we'll just do it this way.

  • Um and, uh, cool.

  • We don't need to print out anymore.

  • That's kind of pointless.

  • Very good, very good.

  • And at the very end, let's print, let's print both incubation_times and, I'm just curious,

  • what is the len of our incubation_times?

  • Beautiful.

  • Save.

  • Rerun this.

  • Where?

  • Where did we forget?

  • A, uh, really?

  • Show me again.

  • Line 62.

  • Oh, okay.

  • We never closed off that parenthesis. Again.

  • "Could not...", of course.

  • Of course.

  • Um, append float.

  • Let's split by space.

  • We'll just say, um, num equals single_day[0].

  • So we're basically just trying to grab the number from it, and I actually think you could split by "day".

  • I think you can convert spaces and a number to just a number.

  • I think you'd get away with that, but we could split by space.

  • Let's do a split by space.

  • Print num, break.

  • So in every instance, it is the first element, as I was hoping, but I just want to make sure. So, num[1].

  • Fantastic.
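
The split-and-convert step can be sketched as follows. The names `single_day`, `num` and `incubation_times` follow the transcript; the pattern with a leading space and the "exactly one match" check are assumptions about code that is not visible in this part of the video, and `extract_incubation_times` is a hypothetical wrapper.

```python
import re


def extract_incubation_times(sentences):
    """Collect one number per sentence that mentions exactly one 'N day'."""
    incubation_times = []
    for sentence in sentences:
        # Assumed pattern: the leading space means split(" ") gives ['', '7', 'day'].
        single_day = re.findall(r" \d{1,2} day", sentence)
        if len(single_day) == 1:
            num = single_day[0].split(" ")
            # Index 1 holds the digits; float() leaves room for decimals later.
            incubation_times.append(float(num[1]))
    return incubation_times


sentences = [
    "The incubation period was 7 days.",
    "No numbers here.",
    "Incubation lasted 14 days in most patients.",
]
print(extract_incubation_times(sentences))
```

Calling `float` on the raw match fails because of the trailing word "day", which is why the number has to be isolated first.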

  • Just clean this up a little bit.

  • Okay?

  • So already we have 71 examples of incubation times, and it looks like our largest is like 42.

  • Maybe.

  • Yeah, it looks like that's the biggest one.

  • So then what we could do is slowly begin to possibly wrap this one up.

  • Um, import matplotlib.pyplot as plt.

  • Uh, and then, just to have some style, from matplotlib we're going to import style, going to use some style, and we will come down and we will say plt.hist, and, remind me...

  • Yeah, it was, okay,

  • so, the array and then the bins. So it will be incubation_times, and then we'll do bins, 10.

  • Um, plt.show().

  • Um, the y axis: plt.ylabel.

  • Yeah, beautiful.

  • Um, we're going to say counts, I guess.

  • Bin counts.

  • Uh, plt.xlabel will be "incubation time, days".

  • Beautiful.

  • Beautiful.
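
A minimal sketch of the histogram step, assuming matplotlib is installed. The function name `plot_incubation_histogram` is hypothetical, and the `"ggplot"` style is an assumption, since the transcript does not say which style was used. Labels are set before `plt.show()` so they appear on the figure.

```python
def plot_incubation_histogram(incubation_times, bins=10, show=True):
    """Draw a histogram of incubation times and return the figure."""
    # Imported here so the function can be defined even without a display.
    import matplotlib.pyplot as plt
    from matplotlib import style

    style.use("ggplot")  # assumed style; any built-in style would do

    fig, ax = plt.subplots()
    ax.hist(incubation_times, bins=bins)
    ax.set_xlabel("Incubation time (days)")
    ax.set_ylabel("Bin counts")
    if show:
        plt.show()
    return fig
```

Calling `plot_incubation_histogram(incubation_times, bins=10)` would correspond to the 10-bin plot described here.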

  • We're coming down the home stretch.

  • There we go.

  • We got a nice histogram of projected incubation time. Again, we probably caught some things that we should not have caught here, so we still have a lot of work to do.

  • But I think I'll show one more thing.

  • And that is, well, a couple of things. One:

  • Do we have NumPy? Let's just quickly import numpy as np, and can we get away with, uh, print:

  • "Mean projected incubation time is",

  • what is it, np.mean(incubation_times).

  • Let's run that.

  • Okay.

  • About 10 days.

  • We'll add days to that as well.

  • So yeah, that is the projected kind of average.
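
The video computes the average with NumPy's `np.mean`. For a plain list, the standard library's `statistics.mean` gives the same arithmetic mean, and it is swapped in here only to keep the sketch free of dependencies. The values are made up for illustration.

```python
from statistics import mean

incubation_times = [2.0, 5.0, 7.0, 14.0]  # illustrative values, not real results

average = mean(incubation_times)
print(f"Mean projected incubation time is {average} days")
```

A mean is sensitive to outliers, so a handful of wrongly captured large values can pull it upward.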

  • Okay, So possibly extracted some meaning there.

  • Um, days.

  • Okay.

  • Finally, we only went through that one set of files.

  • So let us add the others, and whether they're organized the same way,

  • well, we're going to find out.

  • comm_use_subset. Let's grab the noncomm_use_subset.

  • And finally, we will grab custom_license. Copy, paste, save.

  • Let's try again.
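
Rather than pasting the loop once per folder, the folders could be walked in a single generator. This is a sketch under assumptions: the folder names are the ones mentioned here, the layout (JSON files directly inside each folder) is assumed, and the `body_text` list of objects with a `text` field is an assumption about the file format. `iter_body_text` is a hypothetical helper.

```python
import json
import os

# Folder names as mentioned in the video; the on-disk layout is an assumption.
SUBSETS = ["comm_use_subset", "noncomm_use_subset", "custom_license"]


def iter_body_text(data_root, subsets=SUBSETS):
    """Yield (subset, filename, text) for every body-text section found."""
    for subset in subsets:
        folder = os.path.join(data_root, subset)
        if not os.path.isdir(folder):
            continue  # skip folders that were not downloaded
        for filename in sorted(os.listdir(folder)):
            if not filename.endswith(".json"):
                continue
            with open(os.path.join(folder, filename), encoding="utf-8") as f:
                article = json.load(f)
            # Assumed structure: "body_text" is a list of {"text": ...} objects.
            for section in article.get("body_text", []):
                yield subset, filename, section["text"]
```

Yielding the subset and filename alongside the text keeps enough context to trace any extracted number back to its article.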

  • Hopefully, that doesn't take forever.

  • It looks like it very well might take forever.

  • How come those first 800 go so fast,

  • and then this one goes so slow?

  • Well, the good news is, I'm already editing this, so maybe I'll just kind of speed it up or make a cut, because I already have to edit this because of my lovely animals.

  • Aren't they sweet? While we wait on that,

  • let me think if there's anything else I really need to talk about. I'll edit this whole thing out

  • if there isn't. Um, yeah, I guess one thing I would do is maybe come check out some of these kernels, to be honest.

  • Um, like, I don't know what a lot of these things are.

  • So understanding paper with text analytics like, there's probably a lot of really great ideas.

  • Here's even a rules.

  • Incubation period.

  • Let's see what they've done.

  • It looks like they've gone a little harder on making their matches.

  • Let's see what they found.

  • What did you find into match?

  • Yeah, So that it looks like he built, like, something really specific term match.

  • Our interval match.

  • Your period manager.

  • Interesting.

  • Interesting.

  • What'd you find?

  • Looks like this is an incubation period.

  • Um Well, here's ours.

  • How much data was that?

  • Let's go look quick.

  • So we got 562, with the mean projected incubation time of 9.11. And what was this?

  • Ten bins?

  • I think it was 10 bins.

  • One thing we definitely should have done is saved this array, because it takes so long to create that array, or list, rather.

  • Whoops.
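
Saving the list after the slow pass avoids having to rebuild it for every plot tweak. A minimal sketch using JSON follows; `load_or_build` and the cache filename in the usage note are hypothetical, and `build_fn` stands in for whatever function performs the slow extraction.

```python
import json
import os


def load_or_build(cache_path, build_fn):
    """Return the cached list if it exists, otherwise build and save it."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)
    values = build_fn()  # the slow pass over all the articles
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(values, f)
    return values
```

With something like `load_or_build("incubation_times.json", build_fn)`, only the first run pays the cost of reading every file.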

  • Because how many bins did you go with, bro?

  • What did he do?

  • plt dot... oh, he just made a bar graph. He's got lots of bins.

  • Uh, anyway, Okay.

  • So, um yeah, so we can see you know, on average, somewhere between zero and 10 days is like the most.

  • But then, if you were to, like, stack up all these bars here, you know, get to, like, 18 or whatever this number would be, it would be pretty high.

  • So somewhere up to, like, 20 days looks to be not a shocking number.

  • Now, of course, one of the next things I would do is try to figure out what's going on here.

  • Um, why?

  • Is there something there?

  • Uh, also, Let's, uh, even though we didn't save it as an object, we can cheat.

  • Um, like, I wonder if we give it more bins.

  • I do want to kind of see a few more bins.

  • So what I'll do is

  • come on down here, and I'm going to say incubation_times equals, paste.

  • I'm proud of myself for just copying it.

  • Uh, let's change this to 30 bins. Save.

  • Uh, one more attempt.

  • Okay, so we make more bins.

  • Okay.
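
To see why the number of bins changes the picture, here is a small pure-Python sketch of equal-width binning between the minimum and maximum value. It only illustrates the idea and is not matplotlib's implementation; `bin_counts` is a hypothetical helper and the values are made up.

```python
def bin_counts(values, bins):
    """Count values in equal-width bins spanning min(values) to max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        # The maximum value is placed in the last bin rather than past it.
        index = min(int((value - low) / width), bins - 1) if width else 0
        counts[index] += 1
    return counts


times = [2, 3, 5, 5, 6, 7, 7, 14, 42]  # made-up values with one large outlier
print(bin_counts(times, 2))
print(bin_counts(times, 4))
```

A single large outlier stretches the range, so with few bins almost everything lands in the first bar. More bins separate the bulk of the values from the stragglers.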

  • So we can clearly see, like, these.

  • I might even, like, as I look further and try to fix this program, I would look for each of these and see, um, what did we get that gave us these?

  • Because I don't think those are real incubation times.

  • So we want to figure out where those came from to determine how we can better improve our script. But yeah, it looks like somewhere between five and, I don't know, was that seven? Five and six days, probably.

  • Five and seven.

  • Was the most at seven?

  • Um, looks like the incubation time is probably 5 to 7 days on expected average.

  • But then, you know, like I said, looking at it before, if we took these and stacked them on top of their respective bins, it's clearly more than 10, maybe even up to two weeks.

  • Yeah.
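
To find out where the suspicious large values came from, each number can be stored together with the sentence it was pulled from. This is a sketch: the pattern, the helper names `collect_with_context` and `suspicious`, and the 21-day threshold are all assumptions chosen for illustration.

```python
import re

# Assumed pattern: the capture group keeps only the digits before " day".
DAY_PATTERN = re.compile(r" (\d{1,2}) day")


def collect_with_context(sentences):
    """Return (value, sentence) pairs so odd values can be traced back."""
    found = []
    for sentence in sentences:
        matches = DAY_PATTERN.findall(sentence)
        if len(matches) == 1:
            found.append((float(matches[0]), sentence))
    return found


def suspicious(found, threshold=21.0):
    """Return the pairs whose value exceeds a hypothetical threshold."""
    return [(value, sentence) for value, sentence in found if value > threshold]


sentences = [
    "Incubation was 5 days.",
    "Patients were followed for 42 days after discharge.",
]
found = collect_with_context(sentences)
print(suspicious(found))
```

Reading the flagged sentences shows which kinds of phrasing, such as follow-up periods, the pattern picks up by mistake.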

  • So anyway, okay, so some starting information. I'm sure we made lots of, um, you know, incorrect pulls.

  • We are missing a lot of data, because I'm sure many days were using decimals, and I'm just not smart enough to make regular expressions,

  • apparently. Um, I honestly just have not done regular expressions.

  • And it might even be years now, like, I just haven't had to deal with them.

  • I usually get Daniel to write them.

  • So anyway, um, yeah.

  • So, some interesting

  • initial insights. We could keep going with this and keep trying to find more examples and stuff.

  • I'm sure there's much more incubation stuff in there that we could pull from.

  • But then, using kind of a similar methodology, we could start to

  • be curious about some of the other tasks that are on here.

  • Maybe you've got an idea of something you want to look for.

  • Or maybe you want to do something totally separate from what I've done here.

  • But on the data set, anyway, it's a cool data set, and it's obviously an important data set.

  • I think it's a realistic set.

  • I mean, it's as real as it gets. This is a real problem that we're actually experiencing right now, uh, and, I mean, just the data set in general, this is a data mining problem.

  • Um, so anyway, I think that's all. If you've got questions, comments, suggestions, concerns, whatever, feel free to leave them below.

  • If you've got something cool, feel free to link it below and I'll check it out.

  • Also, usually on Kaggle,

  • like I said, I would look through some of the kernels, maybe participate in some of the discussions and stuff, and probably learn some really interesting things. So yep, that's all for now.

  • Hope you guys are staying safe and I will see you guys in another video.
