CEKA/EESI high performance computing resource
July 2007
The Center for Environmental Kinetics Analysis (CEKA) within the Earth and Environmental Systems Institute (EESI) has had its high performance Lion-XO cluster online for approximately 18 months. It continues to be a highly successful venture, with 35 active users and full utilization of its capacity. The cluster is maintained and operated by the High Performance Computing Group (HPC/GEaRS/ASET/ITS) at Penn State University, and CEKA/EESI owns 32 quad-processor nodes on it. Although primarily intended for CEKA research, the cluster is available to any EESI associate, with the following proviso: to participate, the EESI associate must contribute $5000 per year toward the salary of EESI support personnel (J. Miley, D. Pollard and a computational chemistry support person), who provide scientific computational support for EESI users of the cluster.
This scientific support will involve general advice on software, Fortran and numerical methods, and help with a limited set of particular models (for chemistry: GAUSSIAN, VASP; for climate: GENESIS, MOM2, RegCM3, in-house ice-sheet and vegetation models).
The standard HPC/GEaRS method of computer-time priority and allocation applies to all participating EESI users, as follows.
Each group is assigned a fair-share target equal to the percentage of the cluster that they own. For CEKA/EESI, this is currently 44.4% (owning 32*4 = 128 out of a total 288 CPUs in Lion-XO). Group usage is tracked by HPC over a sliding window that is typically 6 weeks.
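The fair-share arithmetic above can be checked with a short Python sketch. It is only an illustration, not the scheduler's actual implementation: the function names are hypothetical, and the treatment of usage exactly equal to the target is an assumption, since the policy only specifies what happens below and above the target.

```python
def fair_share_target(owned_nodes, cpus_per_node, total_cpus):
    """Return a group's fair-share target as a percentage of the cluster's CPUs."""
    return 100.0 * owned_nodes * cpus_per_node / total_cpus


def job_priority(window_usage_pct, target_pct):
    """Priority implied by comparing sliding-window usage with the target.

    Assumption: usage exactly equal to the target is treated as "low";
    the policy text does not state this case.
    """
    return "high" if window_usage_pct < target_pct else "low"


# Figures taken from the text: 32 quad-processor nodes out of 288 CPUs in Lion-XO.
ceka_target = fair_share_target(owned_nodes=32, cpus_per_node=4, total_cpus=288)
single_node = fair_share_target(owned_nodes=1, cpus_per_node=4, total_cpus=288)

print(f"CEKA/EESI target: {ceka_target:.1f}%")    # 44.4%
print(f"One quad-processor node: {single_node:.1f}%")  # 1.4%
print(job_priority(30.0, ceka_target))  # high
print(job_priority(50.0, ceka_target))  # low
```

The usage percentages passed to `job_priority` (30.0 and 50.0) are made-up values for illustration; in practice the sliding-window usage is tracked by HPC.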
As long as the group's usage as a whole is less than its target, its jobs continue to run at highest priority. If at any time the group's sliding-window usage rises above its target, its jobs will run at low priority (treated as those of non-partner users) until usage drops below the target again.
Besides the group-based fair-share above, there is a user-based fair-share between members of the same group. This is weighted many orders of magnitude lower than the group-based fair-share, so that it is negligible between groups but still works within them.
Although the CEKA-group fair-share allocation is shared between all EESI users, it is anticipated that over-use will not occur, especially in the next few years, and that substantial fractions of the allocation will be available to individual heavy users. Note that participation with a $5000 EESI-support contribution provides much more high-priority CPU time than an individual purchase of one or two nodes on a GEaRS cluster: for instance, the purchase of one quad-processor node would cost ~$5000 or more, but would only provide fair-share usage of 1.4% of Lion-XO, compared to a substantial fraction of CEKA/EESI's 44.4%.
The recently reconstituted Earth System Science Center (ESSC), headed by Michael Mann, has purchased a similar HPC resource that has been subsidized by EESI. This cluster will be managed very similarly to Lion-XO. The system has just come online and preliminary testing is nearly finished. It is named Lion-XC and is an Intel-based cluster that currently consists of 96 nodes. Each node has two dual-core 3.0 GHz Woodcrest CPUs. By mid-summer the cluster will be fully populated with 128 dual-processor nodes. The ESSC's target priority usage on this system is 25% of its overall future capacity.
Test results indicate a potential for Lion-XC to run jobs up to 30% faster than Lion-XO. There is capacity for five researchers to purchase time on this system.
If you are interested in using either of these systems, please contact John Miley by e-mail at jmiley@eesi.psu.edu.