Setting up Starfish at TACC for the astro hack day

The UT Austin Astronomy Department graduate students and postdocs held their first hack day on Friday, January 16, 2015. A separate blog post summarizes the event. This post goes into more detail on my own hack, which aimed to derive physical properties of stars from their near-infrared spectra.

Andy Mann, Kevin Gullikson, and I worked on implementing robust inference of IGRINS near-infrared stellar spectra with the new Starfish code. The first challenge was package management. Anaconda made migrating to Python 3.3 easy, but none of us had Julia installed on our laptops, and our inexperience with HDF5 was a barrier. We spent a while navigating the Starfish dependencies and examples and patching hard-coded values. The main challenge was setting up the stellar model grids with HDF5.

Our effort culminated in a Skype call with the author of the code, Ian Czekala. Ian was appreciative that we were testing out his code, and we were jazzed to get insight straight from the source. We talked about his paper, the code, and future improvements. We realized that several existing issues would prevent us from running the code by the end of the hack day, but we started a useful dialog and a potential collaboration. One major outcome of the call was the realization that we would definitely need access to a high performance computing cluster to run the code on the dozens of IGRINS echelle orders.

I had already asked IGRINS PI Dan Jaffe to request a Startup Allocation of 50,000 SUs on Wrangler from the Texas Advanced Computing Center (TACC). The request was approved within hours, and we began configuring the dependencies and working with Wrangler technical support. For instance, Python 3.3 is not currently supported on Wrangler, although the Anaconda Python distribution comes with MKL optimizations. Would that be enough to take advantage of the novel capabilities of Wrangler's DSSDs? Is this computing project I/O bound? Should we simply use a different HPC system instead? A few days after the hack day I wrote a memo to TACC explaining the project's needs.

Overall, one main takeaway from the hack day was how easy it was to get TACC approval for a Startup Allocation. At the end of the hack day I showed the other participants how to apply, and several did! This is a good thing for two reasons. First, we want to train our students in cutting-edge distributed data analytics, which will be ubiquitous in our field within the next decade and is already ubiquitous in industry. Second, the low barrier to entry makes it possible to experiment more freely with TACC, especially early on when you don't yet know what you're doing. It might seem counterintuitive to let grad students loose on a supercomputer, but getting onto the supercomputer early makes it easier to iterate on development and deployment both locally (on a laptop) and remotely (at TACC). This strategy can alleviate dependency hell and establish communication with TACC early in the project cycle. Aaron Smith (UTexas Astronomy) reached out to the astro data group about giving a GSPS talk on his experience with exactly this strategy for HPC at TACC, and I urged him to do it.