Introduction to using the cluster with tractor

What is the cluster?

The cluster is our high-performance computing facility located in ASH 130, owned by the School of Cognitive Science. You can get more information about its current status by visiting fly.hampshire.edu/ganglia/ – its hardware configuration is changing all the time. The cluster is used for many things, and the nodes (as well as the classroom/ndrome Macs) can be passed arbitrary commands by its job distribution system, currently tractor. This document is a quick-start guide to tractor. The full, non-site-specific documentation is available at http://fly.hampshire.edu/docs/Tractor/.

Note that while the classroom and ndrome Macs are not part of the cluster, they are part of what you may hear referred to as the “farm” or “render farm” – that is, they can be passed jobs by tractor – so they are discussed in this document for that reason.

The network configuration of the cluster sometimes confuses people. The head node is the only node that is directly accessible from the outside world or from elsewhere on campus. Please do not ever run computation directly on the head node; it is, however, fine to submit tractor jobs from the head node (see below). To get into the head node, ssh <username>@fly.hampshire.edu. The first time you log in, it will ask you to set a passphrase for your SSH key. I recommend you do not set one – leave it blank. That way you will be able to SSH into the nodes without a password. Once logged into the head node on the command line, should you want to log directly into a compute node, run “ssh <nodename>”, for instance “ssh compute-1-1”; your prompt will change to show that you are now logged into compute-1-1. For an up-to-date list of nodes, see fly.hampshire.edu/ganglia/, as mentioned above.
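
Putting that together, a typical session might look like this (compute-1-1 is just an example node name):

ssh <username>@fly.hampshire.edu
ssh compute-1-1
exit

The final “exit” drops you back onto the head node.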

What is tractor?

Tractor is a job distribution system, made by Pixar, intended primarily for distributing software rendering of 3D animated movies. However, it is designed to be highly configurable and flexible, and many of its mechanisms are useful for arbitrary applications. It has a dependency-tree job description format, with provisions for expressing many different requirements: what kind of node gets which jobs and when, what order things must be executed in, things that must be done and/or checked before a job can start, and much more.

The idea behind tractor is that there is a central tractor “engine” – in this case the head node of the cluster – that is the job repository and configuration center. When the blades (the computers that will actually be running jobs) start up, they find the engine via a DNS entry, ask it for the blade definitions, match themselves against them to decide which definition(s) they fit, and then ask for the list of jobs and decide if they can take any of them. If they do take any, they inform the engine, and the engine takes those jobs off the list of available jobs. The blades check in with the engine frequently, informing it of their progress and status. All of this can be seen at the tractor “dashboard” at fly.hampshire.edu:8000/tractor/tv/. The dashboard has no actual authentication – just enter a username that exists on the cluster and it will let you in. All rendering jobs are run as the user “anim”, so that’s probably the most interesting user to log in as to see what’s going on. Please don’t change anything unless you are authorized to do so. You can, however, see the blade status as any valid user.

What is a job?

So tractor (and many other job distribution systems) has a concept of a “job”, which is one big bunch of executable stuff that is interconnected in some way. You might call it a “run”, or any number of other things, but it’s the packet of stuff that you submit to tractor to tell it that you want it to do something. Each job is broken up into tasks, which can then be broken up into subtasks. The full documentation about how to script these is at http://fly.hampshire.edu/docs/Tractor/. However, I will give you a few simple examples, show you which of the commands on that page might be most useful in our environment, and then show you how to actually submit a job here. So, here’s a very simple job that will just return the result of running /bin/date on the blade it gets executed on:

Job -title {high priority task} -serialsubtasks 0 -subtasks {
    Task -title {/bin/date on Linux} -cmds {
        RemoteCmd {/bin/date} -service {Linux}
    }
}

This just says: we’ve got a job; we’d like to call it “high priority task”; its subtasks should run in parallel (that is, they are not serial subtasks); and it has only one task, titled “/bin/date on Linux”. The command to be executed is a RemoteCmd (meaning it doesn’t have to be executed locally – most things will be RemoteCmds): just /bin/date, which has to run on a blade that matches the service “Linux”. Note that multiple RemoteCmd lines can make up the -cmds {} segment of a Task.
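
For instance, here is a minimal sketch of a task with two RemoteCmds, using only constructs already shown (/bin/hostname is just an arbitrary second command):

Task -title {date and hostname on Linux} -cmds {
    RemoteCmd {/bin/date} -service {Linux}
    RemoteCmd {/bin/hostname} -service {Linux}
}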

One can submit this job by saving it as “job.alf” on fly’s head node and running:

source /etc/sysconfig/pixar
/opt/pixar/tractor-blade-1.6.3/python/bin/python2.6 /opt/pixar/tractor-blade-1.6.3/tractor-spool.py --engine=fly:8000 job.alf

You can then go to the tractor dashboard mentioned above, fly.hampshire.edu:8000/tractor/tv/, log in as the user you submitted the job as, and see the status of your job. You’ll see the dependency tree laid out visually in the right pane of the “jobs” page. You can double-click the single task to open a new window with its output, which should just be the date on one of the linux nodes. You can click on “blades” to see the status of each blade.

You can, of course, build much more complicated jobs. Mostly, folks who use tractor have their scripts generated for them by some controlling process. When we render 3D animation with tractor we do just that. Here is a snippet from an autogenerated script:

##AlfredToDo 3.0
Job -title {sunrise RENDER c1_08 full 1-239 eyeTracking} -envkey {rms-3.0.1-maya-2009} -serialsubtasks 1 -subtasks {
    Task {Job Preflight} -cmds {
        RemoteCmd { /helga/sunrise/tmp/c1/c1_08/render.111003.124955/jobpreflight.sh } -service {slow}
    }
    Task {Layers Preflight} -serialsubtasks 0 -subtasks {
        Task {Layer eyeTracking} -cmds {
            RemoteCmd { /helga/sunrise/tmp/c1/c1_08/render.111003.124955/eyeTracking.sh } -service {slow}
        }
    }
    Task {Render all layers 1-239} -serialsubtasks 0 -subtasks {
        Task {Render layer eyeTracking 1-239} -subtasks {
            Task {Layer eyeTracking frames 1-1} -cmds {
                RemoteCmd { /helga/sunrise/tmp/c1/c1_08/render.111003.124955/launchRender -batch -command {helgaBatchRenderProcedure("eyeTracking",1,1,1,"main")} -file /helga/sunrise/tmp/c1/c1_08/render.111003.124955/eyeTracking.ma -proj /helga/sunrise/wc } -service {slow}
            }
            Task {Layer eyeTracking frames 2-2} -cmds {
                RemoteCmd { /helga/sunrise/tmp/c1/c1_08/render.111003.124955/launchRender -batch -command {helgaBatchRenderProcedure("eyeTracking",2,2,1,"main")} -file /helga/sunrise/tmp/c1/c1_08/render.111003.124955/eyeTracking.ma -proj /helga/sunrise/wc } -service {slow}
            }
            ...

Note that this script features nested tasks, cases where serialsubtasks is set to both 0 and 1, and a specific service tag “slow”.
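
Stripped to a skeleton, the pattern is: an outer level with -serialsubtasks 1 forces the preflight work to finish before rendering starts, while inner levels with -serialsubtasks 0 let the per-frame tasks run in parallel. Something like this (titles and commands are placeholders):

Job -title {skeleton} -serialsubtasks 1 -subtasks {
    Task {setup} -cmds {
        RemoteCmd { ... } -service {slow}
    }
    Task {parallel work} -serialsubtasks 0 -subtasks {
        Task {piece 1} -cmds { RemoteCmd { ... } -service {slow} }
        Task {piece 2} -cmds { RemoteCmd { ... } -service {slow} }
    }
}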

Service tags

You’ll notice that there is a “-service {Linux}” in the first example job above. This is the category of blade that the command in question can be executed on. It is not case-sensitive. We currently have the following services defined, in addition to the standard Pixar services (“PixarRender”, “PixarNRM”, “RfMRender”, “RfMRibGen”, and “PixarMTOR”, which all of our blades currently provide):

  • fast – these are blades that should be free most of the time for executing quick runs. Please do not use this service tag on a job that will take a long time. It is intended to be reserved for jobs with quick turnaround. We are not using this system right now and the fast/slow tags are currently broken and need rethinking.
  • slow – these are blades that are designated OK to submit long grinding overnight or longer jobs to. Use this tag if you don’t need your job to come back quickly and it might take a long time. Again, this idea/system is currently broken.
  • linux – these are blades that are running linux. Use this service tag if you know your job will only run on linux.
  • cluster – these are blades that are nodes in the cluster. If you only want your job to run on cluster nodes, use this tag.
  • rack1 – these are blades in Rack 1 of the cluster, which is mostly whitebox desktop hardware. At this time, they are 4- to 8-core nodes with at least 2GB of RAM per core.
  • rack2 – these are Dell PowerEdge 1855 blade servers with 4GB of RAM each. They are the oldest and slowest machines in our render farm, but they do well on certain tasks that aren’t easily parallelizable, because they have relatively high clock speeds even though they don’t have many cores.
  • rack4 – these are blades in Rack 4 of the cluster, which is full of what I call “cluster-in-a-box” nodes, meaning tons of processor cores and lots of RAM per node. At the time of this writing, we have three 32GB 48-core nodes, one 24GB 8-core 16-execution-unit node, and one 64GB 64-core node in this rack. This tag is mostly for Lee, but others doing GP runs or similar things using Clojure may also find it useful. Please check in with Lee before you use this service tag.
  • macosx – these are blades that are running OS X. At the time of this writing, this means the classroom and the ndrome, with the exception of the one linux box in the ndrome.
  • shake – these are blades that can run Shake. At the time of this writing, that means all the OS X machines, but we separated them out in case that ever changes.
  • blender – these are the machines that can run Blender.
  • tom – this is a special service tag for Tom that selects rack1 and rack4, except compute-4-5.
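
Tying the tags back to the job format, here is a minimal sketch (modeled on the first example above) of a job that will only run on cluster nodes:

Job -title {cluster only example} -serialsubtasks 0 -subtasks {
    Task -title {/bin/hostname on a cluster node} -cmds {
        RemoteCmd {/bin/hostname} -service {cluster}
    }
}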

Please note: if you are reading this document and you haven’t already had a conversation about using the cluster with Lee, Chris, and/or Josiah, you should do that. Also, you’ll need a shell account, which you can get from Josiah.

How do I share data?

A common question from people who want to submit jobs to tractor is “How do I give my tasks concurrent access to the same data?” There are several answers to this question; I will discuss five options here.

  • The cluster has cross-mounted home directories. The home directories are stored on the head node and shared out to all the compute nodes. This will take care of most people and you can stop right here.
  • There is a RabbitMQ server running on the head node, for use in message passing between running processes. Feel free to use it.
  • There is a large fileserver, keg, that has a share, /helga, mounted on every machine in the farm, Mac and cluster node alike. This is intended primarily for rendering purposes, and is one of the foundations of our considerable rendering infrastructure, but it could potentially be used in other ways if necessary. If there were a compelling enough reason, we could also mirror this configuration in another way.
  • MySQL. There is a mysql server on the head node, and I (Josiah) would be happy to give you an account.
  • Roll your own. Build into your program that you want to run on the cluster some message-passing or data-sharing method that works for you.
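
As a sketch of the first (and simplest) option: the job below has its first task write a file into your cross-mounted home directory and its second task read it back, with -serialsubtasks 1 making the tasks run in order. The file name is just a placeholder, and this assumes your commands run as a user whose home directory is the cross-mounted one:

Job -title {shared homedir example} -serialsubtasks 1 -subtasks {
    Task -title {write shared file} -cmds {
        RemoteCmd {/bin/sh -c "/bin/hostname > $HOME/tractor-shared-example.txt"} -service {Linux}
    }
    Task -title {read shared file} -cmds {
        RemoteCmd {/bin/sh -c "/bin/cat $HOME/tractor-shared-example.txt"} -service {Linux}
    }
}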


Where can I store large amounts of local data temporarily?

Each node’s extra space is at /state/partition1. Please clean up after yourself.
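
For example (a sketch – the per-user subdirectory is just a suggested convention, not an existing path):

mkdir -p /state/partition1/$USER
# ...run your job with its temporary data under /state/partition1/$USER...
rm -rf /state/partition1/$USER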

Please feel free to contact Josiah with further questions at wjerikson@hampshire.edu or x6091.
