Saturday, February 28, 2015

Room for Innovation

Hello All,

It's been a pretty busy week for me, but I've managed to finalize the supercomputer design and components. We will be putting the letter trays inside a bigger metal frame (also from IKEA; pictures to come), and each node will essentially be an independent computer of its own, with its own motherboard, laptop hard drive, and power supply. We will connect all four nodes with a switch, though we are still working out some of the details there. This system will then be perfect to install Hadoop on.

But such a straightforward design, in all honesty, seems kind of boring. So I've been thinking about some possible areas for innovation/improvement, either in the hardware or after we build the supercomputer.

1- I was able to talk with my faculty advisor, Mr. Mac, at the BASIS science fair today. We discussed how supercomputers definitely love massively parallel tasks, but what happens when a task is inherently serial and CANNOT be easily parallelized? How much human input is required to split such a task into somewhat parallel chunks (as independent as possible), and how can we reduce that?

This area is called non-linear computing, and there's a lot of research going on in it. One of my ideas was to make the nodes more specialized and have the master node able to tell which kind of node a given part of the code requires. For example, if one part of your code does a lot of 3D visualization, the master node could send that part of the task to a node with a GPU (graphics processing unit); there's a rough sketch of the idea after this list. There's a lot of research to be done to reach this point, though, and we'd probably have to design a new programming language so that the computer could automatically pick up what kind of node each task requires. You'd also have to integrate nodes that are slightly different and get them to work together flawlessly. Still, I think it's a good idea.

2- Finding some way to get rid of the switch (the most expensive part) and have the nodes send messages to each other without it.

3- Another far-fetched idea (unless you're using tiny Raspberry Pi "computers"): some way to make the supercomputer really small so you can keep it next to you on your desk. You could plug it into your laptop via USB, and your laptop could send off the calculations that eat up memory and make it slow down or freeze.
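
Here's a rough sketch of what I mean in idea #1 by having the master node pick a node type for each part of a job, written as a little Python just to make it concrete. The task labels and node names are completely made up to show the routing idea; the hard research part would be getting the computer to figure out the task type automatically.

# Hypothetical sketch: a master node routing tasks to specialized nodes.
# Task types and node names are invented purely for illustration.
NODE_FOR_TASK = {
    "3d_visualization": "gpu_node",
    "matrix_math": "gpu_node",
    "text_parsing": "cpu_node",
}

def route(task_type):
    # Pick which kind of node should run this task; fall back to an
    # ordinary CPU node if we don't recognize the task type.
    return NODE_FOR_TASK.get(task_type, "cpu_node")

for job in ["text_parsing", "3d_visualization", "file_io"]:
    print(job, "->", route(job))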

Still thinking-- hopefully I'll find some interesting and feasible ideas for innovation soon! Tell me in the comments-- what would you want a supercomputer to do or have? Nothing is off limits; tell me your wackiest ideas!

PS: Next week is my "spring break."

Over and Out,
Anvita

Saturday, February 21, 2015

Hadoop in a Letter Tray

Hello All,

Excited to report that my search for a case has come to an end-- at least for the present! We have decided to use...a letter tray from IKEA to house the HPCC! The letter tray actually has the perfect dimensions to hold four micro ATX boards, with one large space at the bottom to hold the power supply and two hard drives. There are some other issues to consider, though, before building.

In supercomputing, the architecture of the system has to be tailored to the tasks you want to accomplish. For example, if you want to run a task with many small pieces that can be completed independently, it's fine to have one master node and many slave nodes that don't have to be connected to each other. A "master node" is the main computer that processes your input (the task you have asked the computer to complete), divides the task into smaller ones, delegates each smaller task to one of the "slave nodes", and then combines all the output from the slave nodes and presents it neatly to you, the user.

If a task can be divided cleanly into subtasks that don't rely on each other, that task is known as "massively parallel." For example, if you want to add 5 to 10,000 data points and you have 5 computers, then you can give each computer 2,000 data points and tell it to add five to each of them. The computers don't rely on each other's output and can work in parallel- hence, the task is called "massively parallel."
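
Just to show the bookkeeping, here's a small Python sketch of that exact example on one machine, with a pool of five worker processes standing in for the five computers. It's only an illustration of the split/compute/combine idea, not our actual cluster code.

from multiprocessing import Pool

def add_five_to_chunk(chunk):
    # The work one "slave" does on its 2,000 data points.
    return [x + 5 for x in chunk]

if __name__ == "__main__":
    data = list(range(10000))  # 10,000 data points
    # The "master" splits the data into 5 chunks of 2,000 points each...
    chunks = [data[i:i + 2000] for i in range(0, len(data), 2000)]
    # ...hands one chunk to each of the five "slaves"...
    with Pool(processes=5) as pool:
        results = pool.map(add_five_to_chunk, chunks)
    # ...and stitches the outputs back together for the user.
    combined = [x for chunk in results for x in chunk]
    print(len(combined), combined[:3])  # prints: 10000 [5, 6, 7]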

Supercomputers love massively parallel tasks. In such tasks, the slave nodes don't have to communicate with each other and don't have to be connected. Each slave is only connected to the master. This is called a "shared nothing" architecture. Slave nodes in such cases do not have to have hard drives or permanent storage; they just compute and send their output immediately back to the master node. Each slave node can be different, although they usually aren't.

Many supercomputers today use an alternative architecture and run Apache Hadoop instead. Hadoop is basically a framework that you install on top of your cluster. Hadoop doesn't adhere strictly to the master/slave picture above (it's a little more egalitarian): it has master nodes and "worker" nodes. The worker nodes are all identical and each has its own hard drive, and the workers are usually connected to each other and to the master node by a switch.

It's kind of a small distinction, but an important one. After all, if we want to run Hadoop, we have to make sure that a hard drive will fit alongside each computer in our case. Hadoop is much better than relational databases at handling large amounts of unstructured data. Unstructured data basically has no rules attached to it: each data point can have as many numbers or words associated with it as it needs. Hadoop has been shown to be very effective for biological data, so we are leaning towards it right now (http://bioinformatics.oxfordjournals.org/content/early/2013/09/10/bioinformatics.btt528).
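
If we do end up with Hadoop, one nice thing is that you don't have to write Java to use it: the Hadoop Streaming utility lets any script that reads standard input and prints "key<TAB>value" lines act as a mapper or reducer. Here's the classic word-count pair in Python, just as a sketch of what a Hadoop job looks like; the file names are mine, and this obviously isn't the biological analysis we'd actually run.

mapper.py:

#!/usr/bin/env python
# Read raw text lines and emit "word<TAB>1" for every word we see.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(word + "\t1")

reducer.py:

#!/usr/bin/env python
# Hadoop sorts the mapper output by key before it reaches the reducer,
# so all the counts for one word arrive one after another.
import sys

current_word = None
count = 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t")
    if word == current_word:
        count += int(value)
    else:
        if current_word is not None:
            print(current_word + "\t" + str(count))
        current_word = word
        count = int(value)
if current_word is not None:
    print(current_word + "\t" + str(count))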

I hope that wasn't too much information thrown at you too quickly. If you have any questions, please let me know in the comments and I will answer them as best I can!

Until Next Time,
Anvita

Saturday, February 14, 2015

Case Study

My apologies in advance for the really bad pun, because this week's post is going to be about...cases! Computer cases, specifically.

The supercomputer I'm building needs to be both compact and robust, able to handle massive amounts of data. We are planning to build our HPCC with eight motherboards (each with a quad-core processor), and we are currently considering both micro ATX and mini ATX motherboards. Mini ATX boards are a bit smaller (5.9 × 5.9 inches), but they can't hold as much memory as micro ATX boards and are more expensive. Micro ATX boards are typically 9.6 × 9.6 inches, but can be as small as 6.75 × 6.75 inches.

This digression into motherboards was basically to illustrate the point that a lot of cases aren't deep enough to fit two micro ATX boards side by side. Besides which, we need enough space to add fans for cooling the system, otherwise it will overheat and crash! We want to fit four motherboards in a case, and then perhaps stack two cases to get the desired eight motherboards. So our main constraint is depth.
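
To put rough numbers on the depth problem: two standard micro ATX boards side by side need 2 × 9.6 = 19.2 inches before you leave any room for cables and airflow. Here's that constraint written out as a tiny Python check (the one-inch clearance is just a rough allowance, not a measured spec).

BOARD_SIDE_IN = 9.6   # a standard micro ATX board is 9.6 x 9.6 inches
CLEARANCE_IN = 1.0    # rough allowance for cables and airflow (an assumption)

needed_depth = 2 * BOARD_SIDE_IN + CLEARANCE_IN
print("Interior depth needed:", needed_depth, "inches")  # 20.2 inches

def fits_two_boards(interior_depth_in):
    # True if a case's interior depth can take two boards side by side.
    return interior_depth_in >= needed_depth

print(fits_two_boards(18.0))  # False -- too shallow
print(fits_two_boards(21.0))  # True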

Also, we'd like to minimize the amount of metal-cutting and welding that we have to do, so we're looking for a case with removable parts and a lot of space to change things around. Not to mention, we also have to fit in hard drives and power supplies. Most cases have a set place for the motherboard, power supply, hard drive, and fans already built in. Convenient for most people, but not for us, since our design will be completely different from that of most computers.

We've visited the Fry's Electronics in Gilbert and the one in Phoenix and looked at all the cases they have. Some of them don't even look like computer cases! Online, we're looking at cases from Newegg.

It looks like we're going to have to think of an unconventional housing for our computer system, so we visited IKEA on Thursday. Some people have actually built a computer cluster (that looks very nice) in an IKEA cabinet. Check it out at this link: http://helmer.sfe.se/

Hope to have a case soon!

Anvita