Traditionally, where people have used compute in their research, they know what to do because they're experts in the field. A lot of our classical HPC users know exactly what they want: they know how to process their data, they know they need to use Scientific Linux, they can write software to process their data, and they can get their own research questions answered by themselves. Part of Information Services' job is simply to provide the hardware and the grunt that they need to do that. But increasingly we're finding that other people want to use compute in their research without really knowing how; they haven't got those skills. So part of my role is to link up their research problem with the skills, the software and the analysis tools that they need to answer their questions.

I was a medicinal chemist, which is a type of synthetic chemist who designs and makes new drugs. That covers all the different processes: designing a small molecule for a particular drug target, then doing some computational docking on the HPC to find out, of the million possibilities, which ten I should make and then test to see which might be the cure for cancer, asthma or Alzheimer's disease, or just the new painkiller.

I started off my training doing a PhD in chemistry, just the simple questions: how do you make this compound, what reagents do you use? But as I progressed in my career I moved into the drug discovery space, where the chemistry is just applied as the technique. That's just one of the tools, and the interesting and intelligent part is: this is the drug target, so what should I make to interfere with that target to have that biological effect? My interest really moved into using computers to answer those kinds of questions. What do we make? Why do we make it? Where should we direct our efforts?
Traditionally, in old-style research, it would just be somebody sat in an office thinking about a particular problem and then proposing an answer. But as the scope and possibilities increase, we've got more data available and more possibilities available, and it really expands beyond what one person can hold in their head. Research is interdisciplinary: we use chemists, biologists, engineers and mathematicians. So in a traditional subject like wet organic chemistry, we now need to use computers to help analyze possibilities, data and questions. Expanding that research space into using computers is becoming increasingly important.

How I got involved in HPCs and computing was doing something called docking. If we have a small drug molecule, say aspirin or salbutamol, that we think might be a good molecule for a particular drug target, we can use the computer and ask it a question: does this small drug molecule fit into this receptor protein? How good a fit is it? Where does it fit in? What shape is it when it fits? We can use specialist software packages to ask that question of hundreds of thousands or even millions of small molecules. So you prepare a question with the software: how well do these five million drug molecules fit into this receptor?
I'll submit that question to the HPC queue, and that's where the HPC takes over and says: okay, this guy's got this question, and it's broken down into actually 25 million sub-questions. It's the HPC scheduler that then splits that job up across separate nodes and distributes it out to different processor cores. So I could have 2,000 processor cores, each one working on docking one drug molecule into that receptor. When a core has finished with that one, it tells a master controller that it's done, and it is allocated the next one to do. It's like the HPC is acting as my research assistant, answering all those millions and millions of questions for me while I'm in the lab, or having a cup of coffee or lunch, or chatting to the boss about what the next question is.

That particular question sounds like a very, very complicated puzzle really, doesn't it?

It sounds like a really complicated puzzle, but the techniques and the software tools for asking that question, how well does that molecule fit into there, are very mature. It's a very well known, very well understood subject. The problem is really the scale. I can't have enough computing power to dock every single possible molecule into every single drug target; it's got to be an intelligent choice. But as computing power is ever increasing (Moore's law, more processors, more memory), I can ask more of the computer and get it to tell me more information, so I can concentrate on the chemistry-specific knowledge.

I'm imagining researchers from across the university, well, certainly across the world, wanting to use computers to answer these questions. Is there one science that uses it more than others?

No, I don't think there is, really.
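As an illustrative aside on the hand-out pattern described earlier, where each core reports back to a master controller and is allocated the next molecule, here is a minimal Python sketch. It is not the facility's scheduler or any real docking package: the names `toy_dock_score`, `dock_library` and `best_candidates` are hypothetical, the score is a meaningless placeholder, and threads on one machine stand in for cores spread across nodes.

```python
import queue
import threading


def toy_dock_score(molecule: str) -> float:
    """Hypothetical stand-in for a docking program. NOT real chemistry:
    it just turns a molecule name into a repeatable number (higher = better)."""
    return (sum(ord(ch) for ch in molecule) % 100) / 10.0


def _worker(tasks: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """One 'core': take the next molecule, score it, report back, repeat."""
    while True:
        try:
            molecule = tasks.get_nowait()  # ask for the next piece of work
        except queue.Empty:
            return  # nothing left to do
        score = toy_dock_score(molecule)
        with lock:  # report the answer back to the shared results table
            results[molecule] = score


def dock_library(molecules, n_workers: int = 4) -> dict:
    """Split one big question into one sub-question per molecule and let
    workers pull sub-questions from a shared queue until it is empty."""
    tasks: queue.Queue = queue.Queue()
    for molecule in molecules:
        tasks.put(molecule)

    results: dict = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=_worker, args=(tasks, results, lock))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


def best_candidates(results: dict, top_n: int = 10) -> list:
    """Pick the top_n molecules by (toy) score: the 'which ten should I make'."""
    return sorted(results, key=results.get, reverse=True)[:top_n]


if __name__ == "__main__":
    library = [f"mol{i}" for i in range(1000)]  # made-up molecule names
    scores = dock_library(library, n_workers=8)
    print(best_candidates(scores, top_n=10))
```

Because a worker only asks for more work when it has finished its current molecule, fast and slow sub-questions balance out without anyone deciding the split in advance, which is the property the description above relies on.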
I think traditionally users have come from physics, astronomy, chemistry and engineering in particular, but also increasingly from biology, genomics and the life sciences. And we're now seeing people from the social sciences, humanities and even arts coming in to start using computers in their research.

And are these high-performance computing systems or clusters becoming the norm? Are people expecting these as part of their research now?

Absolutely, yes. Going back years, what we needed to provide to an academic would be an office, a desk, a green board and some chalk. Now people really expect that high-performance computing facilities are available to them. The university does that by providing HPC facilities like the one we're looking at today, but also by renting them from cloud providers like Amazon, Microsoft Azure, Google and others as well.

Thinking of those leaders of the field, Google and Amazon cloud computing and so on: why would you do it yourself if all those options are available?

There's a number of reasons we might want to do it ourselves. First and foremost, those companies are really good at it, but they're really good at it because they charge for doing it, so there is a higher cost associated with renting somebody else's kit. There are times when we need to do it on site for security reasons. If we're working on some very sensitive research material, maybe something in the medical field with patient data, we need to guarantee to the funders, and to whoever owns that data, that it's very safe and secure by keeping it on site. Equally, we might be doing something, say with genomics research, where the quantity of data is so vast, terabytes of data per hour, that we need to analyze and process it here on site, because the cost and the time associated with shipping that data somewhere else to process it, answering those research questions, and then bringing the data back again is prohibitive. Equally, at those times where we're using
remote data, say satellite imagery or data from the Human Genome Project, well, that data already exists off in the cloud, so it makes far more sense for our researchers to take the question to the data and analyze that data off site. So by offering both, we can hopefully give researchers the ability to choose: this one's better for me, or this one's better for me.

Does it ever go horribly wrong, in that you ask a question, you've made a mistake, and you've wasted hours and hours or days of HPC?

Absolutely, it's happened to me more times than I care to mention. One time in particular I was docking small molecules into a drug target, only a couple of hundred thousand. There was some sort of software error and it started churning out an ever-increasing error file. The size of the error file went through 300 gigabytes and blocked the entire system. Everybody else's jobs failed, and 20 or so people were quite angry with me that I'd just killed all of their research. But that kind of thing does happen in research, and I guess that's the computational equivalent of the professor having an explosion in the lab and spraying stuff all over the room, which I've also done.

Maybe I'm in the physics lab working with Professor Moriarty and I want some computing time. How much is it going to cost me to use that kit?

That's a good question. At the minute, we don't actually charge our researchers directly for using an HPC facility, so there's no per-hour charge for using a computer core.

You do know this is going to be available for them all to watch, and then they're all going to start clamoring after this?

Absolutely. If more people need more resources and we can provide them, we can look to work for them. We want our researchers to be ambitious and to try and push the boundaries, and we can't do that by unnecessarily restricting access to kit.

If money doesn't really come into it in that way, how do you decide who does what, and are there fights
that erupt as to who needs the compute power more than who else?

There are vigorous discussions each time we get a new HPC system. You can see in there it's a large HPC: there are so many processor cores, and it can be used for such and such a period of time. There's always a debate about how much of it anyone is allowed to use at any one time, and how long any one person's job can go on for. It's very much like Tetris: how can I fit these variable width and size blocks in? Is it better for me to use a thousand cores for 10 hours, or 100 cores for many thousands of hours? How do I fit that workload in? It always causes vigorous discussion and a few disagreements. There's always moaning that the queue's far too long, but that's just how it is.

Where does the buck stop in terms of the decision-making? Is it left to software, or does a human come in and say?

Every time we review the process, a group of humans will sit down and decide: okay, we think it's fair if everybody's allowed to run up to a thousand cores for up to four days. We sit down and make those decisions as a research community, and then the software vigorously enforces those limits for us. So you are only allowed up to so many hundreds or thousands of cores, and your job is only allowed to run for a maximum of four days. If it goes over four days, it's stopped and the next person is given access to that resource.

In a way that could be limiting research, right?

It's something that researchers have got to try and fit into their research plans, in the same way that office space and laboratory space can be a limiting factor. It's: how do I divide my research question up onto the computer? Am I better using two thousand cores for a short period of time, or is it the kind of question that is limited to a thousand cores and needs to run for a longer period of time?

Do you need to know a fair bit about computing just to make those decisions, though?
Traditionally, yes. I think there has been an expectation that you have to know a lot about how computers work to make those decisions. But as the use of computers in research moves into other areas, part of my role is to try and help people understand how computers think, as computers work in a very different way to a researcher. They're very good at answering the same question for a long period of time, in parallel. So changing from how you would do some calculations in person to how a computer would do it, lots of different things at the same time, is a step change for some people.

The equipment itself is fairly generic. These are standard blade enclosures, the storage is standard storage, and we have about 240 terabytes in this. It's all connected up by InfiniBand
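The trade-off raised in the conversation, many cores for a short time versus fewer cores for a long time under a fixed policy, comes down to simple arithmetic. The sketch below is a rough illustration only: it uses the example limits quoted in the discussion (up to a thousand cores, up to four days), assumes the workload is perfectly parallel so that doubling the cores halves the run time, and uses hypothetical names (`JobRequest`, `fits_policy`, `shapes_for`) that do not come from any real scheduler.

```python
from dataclasses import dataclass

# Example limits quoted in the discussion; a real facility sets its own.
MAX_CORES = 1000
MAX_HOURS = 4 * 24  # four days


@dataclass(frozen=True)
class JobRequest:
    cores: int
    hours: float


def core_hours(job: JobRequest) -> float:
    """Total amount of machine a job occupies: cores multiplied by hours."""
    return job.cores * job.hours


def fits_policy(job: JobRequest,
                max_cores: int = MAX_CORES,
                max_hours: float = MAX_HOURS) -> bool:
    """True if the job is within both the core limit and the time limit."""
    return 0 < job.cores <= max_cores and 0 < job.hours <= max_hours


def shapes_for(total_core_hours: float, core_options):
    """For a perfectly parallel workload (an assumption), list how long it
    would run at each core count and whether that shape is allowed."""
    shapes = []
    for cores in core_options:
        hours = total_core_hours / cores
        shapes.append((cores, hours, fits_policy(JobRequest(cores, hours))))
    return shapes


if __name__ == "__main__":
    # The same 10,000 core-hours of work, cut into three different shapes.
    for cores, hours, allowed in shapes_for(10_000, [100, 1000, 2000]):
        verdict = "allowed" if allowed else "rejected"
        print(f"{cores} cores for {hours:g} hours: {verdict}")
```

Under these example limits the same amount of work can be rejected for being too wide or too long, which is why the shape of a job has to be part of the research plan.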
Research & High Performance Computing - Computerphile. Posted by 林宜悉 on 2020/03/27.