  • Traditionally, where people have used compute in their research,

  • they know what to do because they're experts in the field. A lot of our classical users of

  • HPC know exactly what they want. They know how to process their data.

  • They know they need to use Scientific Linux.

  • They can write software to process their data, and they can get their own research questions answered by themselves.

  • And part of Information Services' job is simply to provide the hardware and the grunt that they need to do that.

  • But increasingly we're finding that other people want to use compute in their research

  • but don't really know how; they haven't got those skills. So part of my role is to

  • link up their research problem with the skills, the software and the analysis tools that they need to answer their questions.

  • I was a medicinal chemist, so that's a type of synthetic chemist who designs and makes new drugs.

  • So, all the different processes: designing a small molecule for a particular drug target,

  • doing some computational docking using the HPC to find, out of the million possibilities,

  • which ten I should make and then test to see which might be the cure for cancer,

  • asthma,

  • Alzheimer's disease, or just the new painkiller.

  • I started off my training doing a PhD in chemistry, just the simple "How do you make this compound? What reagents do you use?

  • How do you make it?" But as I progressed in my career, I moved into the drug discovery space,

  • where the chemistry is just applied as the technique. That's just one of the tools, and the interesting and the intelligent thing is:

  • this is the drug target. What should I make to interfere with that target to have that biological effect?

  • And my interest really moved into using computers to answer those kinds of questions.

  • So: what do we make? Why do we make it? Where should we direct our efforts?

  • Traditionally, in old-style research, it would just be somebody sat in an office thinking about a particular problem and then proposing an answer.

  • But as the scope and the possibilities increase, we've got more data available,

  • we've got more possibilities available, and it really expands beyond what one person can hold in their head. So research is

  • interdisciplinary, and we use chemists, biologists, engineers, mathematicians,

  • so that even in a traditional subject like wet organic chemistry we now need to use computers to help analyze

  • possibilities, data and questions. So expanding that

  • research space into using computers is becoming increasingly important.

  • How I got involved in HPC and computing was doing something called docking. So if we have a

  • small molecule, a drug molecule, say like aspirin or salbutamol, that we think might be a good molecule for a particular drug target,

  • what we can do is use the computer and ask it a question: does this small drug molecule fit into

  • this receptor protein? How good a fit is it?

  • Where does it fit in? What shape is it when it fits? So what we can do is use specialist software

  • packages to ask that question of

  • hundreds of thousands or even millions of small molecules. So you will prepare a question,

  • "How well do these five million drug molecules fit into this receptor?", with the software.

  • I'll submit that question to the HPC queue, and that's where the HPC takes over and says, "OK, this guy's got this question."

  • That's broken down into actually 25 million sub-questions,

  • and it's the HPC scheduler that then splits that job up across separate nodes and distributes it out to different processor cores.

  • So I could have 2,000 processor cores,

  • each one working on docking one drug molecule into that receptor. When a core has finished with that one,

  • it will tell a master controller that it's done

  • and it will be allocated the next one to do. So it's kind of like the HPC is acting as my research assistant and

  • answering all those millions and millions of questions for me while I'm in the lab making - or

  • Having a cup of coffee or lunch

  • or chatting to the boss about what the next question is.

  • That particular question sounds like a very, very complicated puzzle, really, doesn't it?

  • It sounds like a really complicated puzzle,

  • but the techniques and the software tools for asking that question, how well does that molecule fit into there, are very mature.

  • It's a very well-known, very well-understood subject. The problem is really the scale.

  • I can't have enough computing power to dock every single possible molecule into every single drug target.

  • It's got to be an intelligent choice. But as computing power is ever increasing,

  • you know, Moore's law, more processors, more memory,

  • I can ask more of the computer and get it to tell me more information, so I can concentrate on the chemistry-specific knowledge.
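
The dispatch pattern described earlier, where each core docks one molecule, reports back and is then handed the next, can be sketched as a shared work queue. This is a minimal illustration and not the software used in the talk: `dock_score`, `run_docking` and `top_candidates` are hypothetical names, the scoring function is an arbitrary stand-in for real docking software, and "lower score means better fit" is an assumption made only for the example.

```python
import queue
import threading


def dock_score(molecule: str, receptor: str) -> float:
    """Hypothetical placeholder: real docking software would estimate
    how well the molecule fits the receptor. This is not chemistry."""
    return float(len(molecule) % 7)


def run_docking(molecules, receptor, n_workers=4):
    """Give each worker one molecule at a time; a worker that finishes
    takes the next molecule from the shared queue until none are left."""
    tasks = queue.Queue()
    for molecule in molecules:
        tasks.put(molecule)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                molecule = tasks.get_nowait()  # ask for the next job
            except queue.Empty:
                return  # no work left, so this worker stops
            score = dock_score(molecule, receptor)
            with lock:
                results[molecule] = score  # report the result back

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def top_candidates(results, n=10):
    """Return the n molecules with the lowest (assumed best) scores."""
    return sorted(results, key=results.get)[:n]
```

On a real cluster the workers would be processor cores on separate nodes and the queue would be managed by the scheduler or the docking package, but the shape of the loop is the same: finish one, ask for the next.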

  • I'm imagining

  • researchers from across the university, well, certainly across the world, wanting to use computers to answer these questions.

  • Is there one science that uses it more than others?

  • No, I don't think there is, really. I think traditionally users have come from physics, astronomy, chemistry,

  • engineering in particular, but also increasingly biology, genomics researchers, the life sciences.

  • And we're now seeing people from the social sciences, humanities and even arts coming in to start using

  • computers in their research.

  • And are these high-performance computing systems, or clusters, becoming the norm? Are people expecting these as part of their research now?

  • Absolutely, yes. Again,

  • going back years, what we needed to provide to an academic

  • would be an office, a desk, a green board and some chalk. Now people really expect that high-performance

  • computing facilities are available to them, and the university does that by providing HPC facilities like the one we're looking at today,

  • but also by renting them from cloud providers like Amazon, Microsoft Azure,

  • Google and others as well.

  • Thinking of those leaders of the field, Google and Amazon, cloud computing etc.,

  • why would you do it yourself if all those options are available?

  • There are a number of reasons we might want to do it ourselves.

  • First and foremost, those companies are really good at it,

  • but those companies are really good at it because they charge for doing it.

  • So there is a higher cost associated with renting somebody else's kit to do it.

  • There are times when we need to do it on-site for security reasons.

  • So if we're working on some very sensitive research material, maybe something in the medical field with patient data,

  • we'll need to guarantee to the funders, and to whoever owns that data,

  • that it's very safe and secure, by keeping it on-site.

  • Equally, we might be doing something, say, with genomics research, where the quantity of data is so vast,

  • terabytes of data per hour, that we need to analyze it and process it here on site,

  • and the costs and the time associated with shipping that data somewhere else

  • to process it and answer those research questions,

  • and then bringing the data back again, are prohibitive. Equally, there are times where we're using remote data, say satellite imagery or

  • data from the Human Genome Project.

  • Well, that data already exists off in the cloud,

  • so it makes far more sense there for our researchers to take the question to the data

  • and analyze that data off-site. So by offering both,

  • we can hopefully give researchers the ability to choose: "this one's better for me"

  • or "this one's better for me".

  • Does it ever go horribly wrong, in that you ask a question and you've made a mistake and

  • wasted hours and hours or days of HPC?

  • Absolutely. It's happened to me more times than I care to mention.

  • One time in particular, I was docking small molecules into drug

  • targets, only a couple of hundred thousand. There was some sort of software error, and it started churning out an ever-increasing error file.

  • The size of the error file went through 300 gigabytes.

  • It blocked the entire system and everybody else's jobs failed. Twenty or so people were quite angry with me that I'd just killed all of their research.

  • But that kind of thing does happen in research,

  • and I guess that's the computational equivalent of the professor having an explosion in the lab and

  • spraying stuff all over the room, which I've also done.

  • Maybe I'm in the physics lab working with Professor Moriarty

  • and I want some computing time. How much is it going to cost me to use that kit?

  • That's a good question. At the minute,

  • we don't actually charge our researchers directly for using an HPC facility,

  • so there's no per-hour

  • charge for using a

  • computer core.

  • You do know this is going to be available for them all to watch, and then they're all going to start

  • clamoring after this?

  • Absolutely. And

  • if more people need more resources and we can provide them, we can look to work for them. We want our research

  • to be ambitious and to try and push the boundaries, and we can't do that by unnecessarily restricting

  • access to kit.

  • If money doesn't really come into it in that way, how do you decide who does what? And are there fights

  • that erupt as to who needs the compute power more than who else?

  • There are vigorous discussions each time we get a new

  • HPC system. You can see in there it's a large

  • HPC: there are so many processor cores, and it can be used for such and such a period of time,

  • and there's always a debate about

  • how much of it at any one time somebody is allowed to use, and how long any one person's job can go on for. So it's

  • very much like Tetris: how can I fit these variable width and size blocks in?

  • Is it better for me to use a thousand cores for 10 hours or 100 cores for many thousands of hours?

  • And how do I fit that workload? It always causes vigorous discussion, a few disagreements.

  • There's always moaning that the queue's far too long, but that's just how it is.
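
The Tetris comparison comes down to core-hours: the number of cores a job holds multiplied by the hours it holds them. The sketch below is only an illustration with hypothetical function names, and it assumes the work scales perfectly across cores, which real jobs rarely do.

```python
def core_hours(cores: int, wall_hours: float) -> float:
    """Total resource a job consumes: cores held times hours held."""
    return cores * wall_hours


def wall_hours_needed(total_core_hours: float, cores: int) -> float:
    """Wall-clock time for a fixed amount of work on a given number
    of cores, assuming ideal (perfectly parallel) scaling."""
    return total_core_hours / cores


# A wide, short job: 1000 cores for 10 hours.
wide_job = core_hours(1000, 10)

# The same amount of work squeezed onto 100 cores takes ten times longer.
narrow_job_hours = wall_hours_needed(wide_job, 100)
```

Under that assumption both shapes cost the same 10,000 core-hours; what changes is the block that has to be slotted into the schedule, either short and wide or long and narrow.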

  • Where does the buck stop in terms of the decision-making? Is it left to software, or does a human come in and say?

  • Every time we review the process, a group of humans will sit down and decide, "OK,

  • we think it's fair if

  • everybody's allowed to run up to a thousand cores for up to four days' time." We sit down and make those decisions

  • as a research community, and then the software rigorously enforces those limits for us.

  • So you are only allowed up to so many hundreds or thousands of cores, and your job is only allowed to run for a

  • maximum of four days. If it goes over four days, it's stopped, and the next person's given access to that

  • resource.

  • In a way that could be limiting research, right?

  • It's something that the researchers have got to try to fit into their research plans, in the same way that

  • office space and laboratory space can be a limiting factor.

  • It's: how do I divide my research question up onto the computer? Am I better

  • using two thousand cores for a short period of time, or is it the kind of question that is

  • limited to only a thousand cores and needs to run for a longer period of time?
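
The kind of limit the research community agrees on, and the software then enforces, can be sketched as a simple check on each job request. The figures of a thousand cores and four days are the example policy mentioned above; `JobRequest`, `violations` and the constant names are hypothetical and do not correspond to any particular scheduler's configuration.

```python
from dataclasses import dataclass

# Example policy from the discussion above; real limits are site-specific.
MAX_CORES = 1000
MAX_WALL_HOURS = 4 * 24  # four days


@dataclass
class JobRequest:
    cores: int
    wall_hours: float


def violations(job: JobRequest) -> list[str]:
    """Return a description of every policy limit the request exceeds."""
    problems = []
    if job.cores > MAX_CORES:
        problems.append(f"requests {job.cores} cores, limit is {MAX_CORES}")
    if job.wall_hours > MAX_WALL_HOURS:
        problems.append(
            f"requests {job.wall_hours} hours, limit is {MAX_WALL_HOURS}"
        )
    return problems


# A job at exactly the limits is accepted; one over both limits is not.
allowed = violations(JobRequest(cores=1000, wall_hours=96))
rejected = violations(JobRequest(cores=2000, wall_hours=120))
```

A real scheduler would also stop a running job once it passes the wall-time limit, as described above; this sketch only covers the check made when the request is submitted.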

  • Do you need to know a fair bit about computing just to make those decisions, though?

  • Traditionally, yes. I think there has been an expectation that you have to know a lot about how computers work to make those decisions, but

  • we are seeing now, as the use of computers in research moves into other areas,

  • that part of my role is to try and help people to

  • understand how computers think, as computers work in a very different way to a researcher.

  • They're good at answering the same question for a long period of time, in parallel. So

  • changing from how you would do it

  • in person, how you would do some

  • calculations, to how a computer would do it, lots of different things at the same time,

  • is a step change for some people.

  • The equipment itself is fairly generic. These are standard

  • blade enclosures. The storage is standard storage; we have about 240 terabytes in this.

  • It's all connected up by InfiniBand.
