Tuesday, April 14, 2015

Assembling Computer Part 2 and ARM

Hi Guys!

Part 2 of Assembling the Computer was not really interesting enough to warrant a whole post, but I'll just finish it off quickly for you all.

So the next part was dealing with this mess of wires-- which actually connect the different parts of the computer to the motherboard, where they can be controlled. The wire pins are very small, so it wasn't an easy task to put them in. We had to plug the computer in and try to turn it on, and if it didn't work, go back and move around the tiny wires again.




If anyone ever traps you in a dark alleyway and asks you what the connection wire for the hard drive is called, it's a SATA cable!

We also installed the Intel CPU fan. It's really something to think about how much heat these parts generate if you need one giant fan for the CPU alone.


So there's the finished computer! When it works, the motherboard light turns on. Of course, that light is pretty extraneous, so we turned it off when we booted the computer. Still looks cool though.


The MUCH more interesting news is that I think I found a much better solution to the cost issue I raised in my last post! How many of you have heard of ARM processors, or maybe Raspberry Pi computers?

ARM processors are the tiny processors used in phones, calculators, etc. Unlike Intel processors, they can be manufactured by any company that licenses the designs from ARM. This makes them significantly cheaper.

ARM processors are built with the main goal of energy efficiency-- they shouldn't get extremely hot and shouldn't require much energy input. This is why they're used in phones. ARM processors deplete the battery much less, so the phone requires less recharging.

Make no mistake- they aren't as powerful as Intel processors. But look at this: while an Intel processor like the i7 costs $250 on its own, you can get an ARM processor (a quad-core one!), its motherboard, 1GB of RAM, and an Ethernet connection for $35. So if I want eight Intel nodes, and each node is estimated at $300 (that's not even including the hard drives), that would cost $2400. For that money, I can get SIXTY-EIGHT ARM COMPUTERS. SIXTY-EIGHT.
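Here's the back-of-the-envelope math in code form, if you want to check it yourself (a quick sketch using only the price figures quoted above):

```python
# Rough cost comparison using the figures quoted above.
intel_node_cost = 300       # estimated cost of one Intel-based node (no hard drive)
arm_board_cost = 35         # quad-core ARM board with RAM and Ethernet included
num_intel_nodes = 8

budget = num_intel_nodes * intel_node_cost        # 8 * $300 = $2400
arm_boards_for_budget = budget // arm_board_cost  # 2400 // 35 = 68

print(f"Budget for eight Intel nodes: ${budget}")
print(f"ARM boards for the same money: {arm_boards_for_budget}")
```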

It's not that Intel doesn't have processors that are as energy efficient as ARM- their Atom processor doesn't generate very much heat, but since they have to compete at ARM's price point, they don't make very much money off of it. Also, Intel's focus has always been maximum computing power, at the expense of energy efficiency. It's very hard for the company to turn everything around and focus on energy efficiency all of a sudden, when ARM has been built around it from the start.

And that, friends, is the solution to my problem. I'll leave you to think about what this huge cost difference means for the future of Intel and the microprocessor industry...and if you want another perspective, this article is pretty good.

Until Next Time,
Anvita

Tuesday, April 7, 2015

Assembling a Computer: Part 1

Hello All,

So this week I actually got my hands dirty and tried assembling a computer myself. Not the supercomputer, of course, only a desktop.

But the assembly of a desktop is very similar to the assembly of a supercomputer node, and it's very helpful to get hands-on experience with all the parts.

I also got a much better idea of the cost of the whole thing. This computer costs around $630- with two terabytes of storage on the hard drive (that's a LOT), built-in wireless on the motherboard, and an Intel i5 quad-core processor. Granted- this is better quality than a regular node would need to be, because this computer is going to be used for learning server programming, but still, this is a problem.

The processor is one of the costliest components- $200. So my goal of building 4 nodes for $1000 doesn't look all that feasible if I use only Intel processors, which are the best. AMD has good processors, but they generate a lot more heat. In close quarters with the other nodes, that could be a problem. So that's something I'm going to have to solve in the next few weeks.

But anyways, I took many, many pictures while assembling:

Here are all the parts- assembling a computer is actually pretty simple! You have the case with all the wires, a hard drive, a processor (blue)...

And my favorite part:


The "Vengeance" memory pack (as you can see, computer science naming still goes along with the world domination theme.)

They look kind of like hair combs, and go into the motherboard as such.


 

I had to push surprisingly hard to get them in, while the motherboard was half in the air. When I asked why we didn't just keep the motherboard on the cardboard, I learned that pushing the motherboard into the cardboard with so much force would ruin the back of it.



This is the inner circuitry of the hard drive. You have to handle all the computer parts very, very carefully, or you can ruin them. I learned to never touch any of the circuitry (anything on a green board). The hard drive goes into the bottom part of the computer with three screws. Any fewer, and the hard drive starts to vibrate and eventually gets destroyed.

The most interesting part to me was putting the processor in. I'd read that processor sockets are "zero insertion force," and had visions of some magical suction-type process, where I would hold the processor over the socket and it would be sucked into the correct position. Alas, this was not the case, but I'll show you how it does work.

The big black square in the middle is the spot for the processor- basically the "brain" of the computer.

You open a little latch on the side, and then you can see the tiny tiny gold plates.
Instead of suction, you have to match a small triangle on the processor with a small triangle on the CPU socket, and then just drop the processor in. If the triangle isn't there (like on ours), you can line it up by looking at the back.



You then close the latch, and the big black square (the socket cover) just pops off. Then you screw the motherboard in, and the heart of the computer is done. I didn't believe it when someone told me they could assemble this in 30 minutes, but it's actually not very complicated.



All that remains is connecting the pesky power wires and installing the fans. That will be Part 2 of this series, and I hope to bring it to you either tomorrow or the day after.

Best,
Anvita


Saturday, March 21, 2015

I'm Back! And Supercomputing Applications

Hello All,

I've been in DC for a lot of the past few weeks (I'm actually there right now!). I've been thinking about some of the potential applications for my supercomputing project, which is important for a later stage of this project-- testing my supercomputer's performance on analyzing medical/biological data. Now, biological data is a very...broad description, so I have to decide what exactly my supercomputer will be used for so I can test it on the appropriate data set.

For the past few years I've been doing a lot of work on computational drug discovery-- specifically, "teaching" the computer through machine learning to identify drug leads. I've been able to reach 91% accuracy with some of the algorithms I've built, based on only four features, when the algorithm is trained on approximately 2000 data points. With the millions and millions of data points I could get from integrating various publicly available databases, this accuracy could actually go up quite a bit. Another application where big-data analytics are important is personalized medicine. In personalized medicine, you would ideally have the computer go through a person's complete genome and look for regions where mutations occur. The computer would then have to learn which medicines work best for a person with a certain combination of mutations. A database with this information is not currently available, but research is progressing rapidly in this area, and our supercomputer should be ready to deal with such a personalized-medicine database when it arises.

Our supercomputer should be able to perform more than simple queries on this big data set. True big-data analytics involve finding patterns in the data, even when we have not trained the computer on what specifically to look for. This is called unsupervised learning.

An example of SUPERVISED learning would be showing a computer a set of drugs that are active (and the computer knows they are active) and letting the computer learn what characteristics active drugs share. The computer can then predict, given the characteristics of a drug it has never seen before, whether that drug will be active. In UNSUPERVISED learning, we would simply give the computer a lot of drugs and their characteristics, and the computer would look for patterns. It might cluster the drugs by the characteristics we have given it, and the active drugs might end up in one cluster while the inactive drugs end up in another. I know for sure that we would want to test how much time it takes for our supercomputer to cluster the biological data we give it (whether for drug discovery or another application), and to test the accuracy of that clustering.
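To make the distinction concrete, here's a toy sketch using scikit-learn. The data is synthetic (generated on the spot), not my actual drug-discovery dataset, and the "four features" are just stand-ins:

```python
# A toy illustration of supervised vs. unsupervised learning with scikit-learn.
# The data here is synthetic -- not the real drug-discovery features I work with.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

# Pretend each row is a drug described by four features, labeled active/inactive.
X, y = make_classification(n_samples=2000, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SUPERVISED: the model sees the labels while it learns...
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("supervised accuracy on unseen drugs:", clf.score(X_test, y_test))

# UNSUPERVISED: no labels at all -- just look for two clusters in the features.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# If the clusters line up with activity, most of one label lands in one cluster.
print("fraction of actives in cluster 0:", y[clusters == 0].mean())
```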

One important principle in design is to always keep the audience in mind. I'm designing this supercomputer, so I need to keep in mind what it will be used for, in order to modify my design accordingly (in terms of both software compatibility and hardware used). So that's what I've been working on so far.

Will report more soon!
Anvita

Saturday, February 28, 2015

Room for Innovation

Hello All,

It's been a pretty busy week for me, but I've managed to finalize the supercomputer design and components. We will be putting the letter trays inside a bigger metal frame (also from IKEA- pictures to come), and each node will essentially be an independent computer of its own, with a laptop hard drive, its own power supply, and its own motherboard. We will connect all four nodes with a switch- we are still working out some of the details there. This system will then be perfect for installing Hadoop on.

But such a straightforward design, in all honesty, seems kind of boring. So I've been thinking about some possible areas for innovation/improvement, either in the hardware or after we build the supercomputer.

1- I was able to talk with my faculty advisor, Mr. Mac, at the BASIS science fair today. We discussed how supercomputers definitely love massively parallel tasks, but what happens when a task is necessarily serial and CANNOT be easily parallelized? How much human input will be required then to split this task into somewhat parallel chunks (as independent as possible), and how can we reduce it?

This area is called non-linear computing, and there's a lot of research going on in it. One of my ideas was to make the nodes more specialized and have the master node figure out which part of the code requires which kind of node. For example, if you have code with a lot of 3D visualization in one part, the master node could send that part of the task to the GPU (graphics processing unit). There's a lot of research to be done to reach this point, though, and we'd probably have to build a new programming language to give the computer enough cues to automatically pick up what kind of node a task requires. You'd also have to integrate nodes that are slightly different and have them work together flawlessly. Still, I think it's a good idea (there's a toy sketch of the routing step after this list).

2- Finding some way to get rid of the switch (the most expensive part) and have the nodes send messages to each other without it.

3- Another far-fetched idea, unless you're using tiny Raspberry Pi "computers": some way to make the supercomputer really tiny so you can have it next to you on your desk. It would plug into your laptop via USB, so your laptop could send away calculations that eat up memory and make it slow down or freeze.
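To make idea 1 a little more concrete, here's a very simplified sketch of the routing step. The task tags and node names are made up purely for illustration; a real heterogeneous scheduler would be far more involved:

```python
# A simplified sketch of the "specialized nodes" idea: the master looks at a
# tag on each task and routes it to the kind of node that handles it best.
# The node names and task tags here are invented for illustration only.

TASKS = [
    {"name": "render_protein_surface", "kind": "3d_visualization"},
    {"name": "sum_patient_records",    "kind": "general_compute"},
]

# Which kind of node should handle which kind of task.
ROUTING = {
    "3d_visualization": "gpu_node",
    "general_compute":  "cpu_node",
}

def dispatch(task):
    # Fall back to an ordinary CPU node if we don't recognize the task kind.
    node = ROUTING.get(task["kind"], "cpu_node")
    print(f"sending {task['name']} to {node}")

for task in TASKS:
    dispatch(task)
```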

Still thinking- hopefully I find some interesting and feasible ideas for innovation soon! Tell me in the comments-- what would you want a supercomputer to do or have? Nothing is off limits, tell me your wackiest ideas!

PS: Next week is my "spring break."

Over and Out,
Anvita

Saturday, February 21, 2015

Hadoop in a Letter Tray

Hello All,

Excited to report that my search for a case has come to an end-- at least for the present! We have decided to use...a letter tray from IKEA to house the HPCC! The letter tray actually has the perfect dimensions to hold four microATX boards, with one large space at the bottom for the power supply and two hard drives. There are some other issues to consider, though, before building.

In supercomputing, the architecture of the system has to be tailored to the tasks you want to accomplish. For example, if you want to run a task with many small pieces that can be completed independently, it's fine to have one master node and many different slave nodes that don't have to be connected to each other. A "master node" is the main computer that processes your input (the task you have asked the computer to complete), divides the task into smaller ones, delegates each smaller task to one of the "slave nodes", and then combines all the output from the slave nodes and presents it neatly to you- the user.

If a task can be divided cleanly into subtasks that don't rely on each other, that task is known as "massively parallel." For example, if you want to add 5 to each of 10,000 data points and you have 5 computers, you can give each computer 2,000 data points and tell it to add five to each of them. The computers don't rely on each other's output and can work in parallel- hence, the task is called "massively parallel."

Supercomputers love massively parallel tasks. In such tasks, the slave nodes don't have to communicate with each other and don't have to be connected. Each slave is only connected to the master. This is called a "share nothing" architecture. Slave nodes in such cases do not have to have hard drives or permanent storage- they just compute and send their output immediately back to the master node. Each slave node can be different, although they usually aren't.
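Here's a tiny sketch of the "add 5 to every data point" example, using Python's multiprocessing module to stand in for the five computers (on a real cluster each chunk would go to a separate machine over the network, not to a separate process on one machine):

```python
# "Add 5 to 10,000 data points," split across 5 workers in share-nothing style:
# each worker only sees its own chunk and only reports back to the "master."
from multiprocessing import Pool

def add_five_to_chunk(chunk):
    return [x + 5 for x in chunk]

if __name__ == "__main__":
    data = list(range(10_000))
    chunks = [data[i::5] for i in range(5)]   # 5 chunks of 2,000 points each

    with Pool(processes=5) as pool:
        results = pool.map(add_five_to_chunk, chunks)  # one chunk per worker

    # The "master" combines the workers' output and presents it to the user.
    combined = [x for chunk in results for x in chunk]
    print(len(combined), combined[:3])
```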

Many supercomputers today have an alternate architecture and use Apache Hadoop instead. Hadoop is basically a framework that you install on top of your cluster. Hadoop doesn't adhere strictly to the master/slave architecture described above (it's more egalitarian): it has master nodes and "worker" nodes. The worker nodes are all identical and have hard drives. Worker nodes are often connected to each other and to the master node by a switch.

It's kind of a small distinction, but an important one. After all, if we want to run Hadoop, we have to make sure that the hard drives will fit with each computer in our case. Hadoop is much better than relational databases at handling large amounts of unstructured data. Unstructured data basically has no rules attached to it- each data point can have as many numbers or words associated with it as it wants. Hadoop has been shown to be very effective for biological data, so we are leaning towards it right now (http://bioinformatics.oxfordjournals.org/content/early/2013/09/10/bioinformatics.btt528).
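To give a flavor of how Hadoop jobs are actually written, here's a minimal word-count mapper and reducer in the style of Hadoop Streaming, which lets you write the map and reduce steps as plain scripts that read lines from stdin and write tab-separated key/value pairs to stdout. This is just an illustration, not part of my actual setup:

```python
#!/usr/bin/env python
# mapper.py -- emits (word, 1) for every word in the raw, unstructured input.
# Hadoop sorts and groups these pairs by key before handing them to the reducer.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python
# reducer.py -- receives the mapper's output sorted by key and sums the counts.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

You can even test the pair locally by piping a text file through the mapper, sorting the output, and piping that into the reducer, before ever touching a cluster.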

I hope that wasn't too much information thrown at you too quickly. If you have any questions, please let me know in the comments and I will answer them as best I can!

Until Next Time,
Anvita

Saturday, February 14, 2015

Case Study

My apologies in advance for the really bad pun, because this week's post is going to be about...cases! Computer cases, specifically.


The supercomputer I build needs to be both compact and robust, able to handle massive amounts of data. We are planning to build our HPCC with eight quad-core motherboards, and we are currently considering both micro ATX and mini ATX motherboards. Mini ATX boards are a bit smaller (5.9 × 5.9 inches), but they can't hold as much memory as micro ATX boards and are more expensive. Micro ATX boards are typically 9.6 by 9.6 inches, but can be as small as 6.75 by 6.75 inches.



This digression into motherboards was basically to illustrate the point that a lot of cases aren't deep enough to fit two micro ATX boards side by side. Besides that, we need enough space to add fans for cooling the system, otherwise it will crash! We want to fit four motherboards in a case, and then perhaps stack two cases to get the desired eight motherboards. So our main constraint is depth.



Also, we'd like to minimize the amount of metal-cutting and welding that we have to do, so we're looking for a case with removable parts and a lot of space to change things around. Not to mention- we also have to fit in hard drives and power supplies.  Most cases have a set place for the motherboard, power supply, hard drive, and fans already built in. Convenient for most people, but not for us, since our design will be completely different from that of most computers. 

We've visited the Fry's Electronics in Gilbert and the one in Phoenix and looked at all the cases they have. Some of them don't even look like computer cases! Online, we're looking at cases from Newegg.


It looks like we're going to have to think of an unconventional housing for our computer system, so we visited IKEA on Thursday. Some people have actually built a computer cluster (that looks very nice) in an IKEA cabinet. Check it out at this link: http://helmer.sfe.se/

Hope to have a case soon!

Anvita


Sunday, January 18, 2015

Introduction: Of Compute Nodes and Things

Hello All,

I'm Anvita Gupta, a senior at BASIS Scottsdale High School, and in the coming months I will be delving into the exciting world of supercomputing. How, you ask? In the broadest terms, I will be building a supercomputer. Specifically, I will be building a High Performance Computing Cluster (HPCC), which is composed of a number of smaller computers that work together in unison. Each computer is called a node.

You might wonder why we even want to build a supercomputer. The answer can lie in something as simple as, say, a spring. We all learn in physics class that F = -kx, but that law applies only to ideal springs. To model the movement of springs in the real world, we must take into account several additional variables- for instance, the surrounding temperature, the heating of the metal in the spring, and wind conditions. For a theoretical spring manufacturer, then, it becomes important to build accurate simulations of these springs. Where will he run these simulations? After all, they will require an enormous amount of computing power. This is where supercomputers come in.
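Just to show what "simulating a spring" might look like at its very simplest, here's a toy numerical simulation with a made-up damping term standing in for all the real-world messiness. The constants are invented; the point is only that a realistic simulation repeats small updates like this millions of times:

```python
# A toy spring simulation: F = -k*x, plus a simple damping term standing in
# for the messy real-world effects. All constants here are made up.
k, c, m = 4.0, 0.1, 1.0          # spring constant, damping, mass
x, v, dt = 1.0, 0.0, 0.001       # position, velocity, time step

for step in range(10_000):        # 10,000 tiny steps = 10 simulated seconds
    force = -k * x - c * v        # ideal spring force plus damping
    v += (force / m) * dt
    x += v * dt

print(f"position after 10 s: {x:.4f} m")
```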

This very rudimentary example (which I might have come up with so that my title would be witty) illustrates one very important concept. The amount of data we have is growing exponentially, and along with it grows our need for infrastructure to handle data processing, and software to find patterns in the data. "Big Data" has become a buzzword not just in the offices of Silicon Valley giants, but even in the vocabulary of us mortals.

My goal is to build an HPCC and optimize it to handle clinical data, for personalized medicine and better disease diagnosis. My goal is efficiency, both of money and of time. The project will go through a few main stages: designing the computing cluster, actually building the HPCC, installing/modifying medical software to run on it, and testing the HPCC on medical data.

Within each stage there are a number of factors to consider, such as how to get the nodes to work together, how to keep the HPCC from overheating and catching on fire, (more importantly) how to make the cluster look aesthetically appealing, and in general, how to fit the greatest amount of computing into the lowest cost and smallest area. I will be interning at AMBA Solutions, a local IT and consulting company.

Looking forward to embarking on this journey!
Anvita