XCRI, the XSEDE Cyberinfrastructure Resource Integration team, recently engaged with Langston University to upgrade its HPC (High Performance Computing) system, “Lucille”.
Lucille is used to teach students at Langston the fundamentals of high performance parallel computing and high throughput distributed computing. Langston University is the only historically black college or university (HBCU) in the state of Oklahoma, and approximately 70% of Langston’s students are first-generation college students. Lucille also contributes to local interdisciplinary research in bioinformatics, applied mathematics, and high-energy physics.
Over the summer, Dr. Franklin Fondjo, director of the Langston University Computing Center for Research and Education (LU-CCRE), reached out to the XCRI team asking for support with the deployment and automation of an xCAT (Extreme Cloud Administration Toolkit) based cluster. xCAT is open-source distributed computing management software developed by IBM and used for the deployment and administration of Linux- or AIX-based clusters.
Dr. Fondjo and the XCRI team worked together to extend the XCBC (XSEDE Compatible Basic Cluster) toolkit to work with the xCAT cluster management software, which had not previously been an option included in the toolkit. This remote engagement proved extremely helpful to both Langston and XCRI, resulting in valuable extensions to the XCBC and a highly flexible new compute environment for LU-CCRE.
“XCRI is a valuable service for a small institution running a local cluster. This great service can be used by small HPC groups to support struggling centers with limited manpower. I would especially like to thank Eric Coulter for his patience and his valuable support.” said Dr. Fondjo.
Lucille is part of the national and global research infrastructure utilizing high speed networks and Open Science Grid middleware. The system has 28 compute nodes, each with 32 CPU cores and 128 GB of RAM, plus 4 GPU nodes with NVIDIA K20m GPUs and 192 GB of RAM each. All of the nodes are connected by 10 Gb Ethernet, and the cluster has a theoretical peak performance of 27 double-precision TFlops.
The new version of Lucille allows for stateful management of compute nodes, automated with Ansible and using the OpenHPC project as a base. Dr. Fondjo also requested assistance installing JupyterHub and RStudio Server on the cluster. The integration of these interactive environments with the SLURM scheduler enables Lucille to provide a wide range of interactive computing beyond the standard batch computing model. In addition, the XCRI team will fold these into the XCBC toolkit as optional enhancements for future site visits.
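Integrations like the one described above are commonly achieved with the batchspawner package, which lets JupyterHub launch each user's notebook server as a SLURM job. The following is a minimal sketch of such a configuration, not the actual Lucille configuration; the partition name and resource requests shown here are hypothetical placeholders.

```python
# jupyterhub_config.py — hedged sketch of JupyterHub-on-SLURM integration
# via the batchspawner project; values below are illustrative, not Lucille's.
import batchspawner  # noqa: F401  (registers the spawner's API handler with JupyterHub)

# Launch each single-user notebook server as a SLURM batch job.
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Hypothetical resource requests, passed through to sbatch:
c.SlurmSpawner.req_partition = 'compute'   # assumed partition name
c.SlurmSpawner.req_memory = '4G'           # memory per notebook job
c.SlurmSpawner.req_runtime = '02:00:00'    # wall-time limit per session
```

With a configuration along these lines, each user who logs into JupyterHub gets a notebook session scheduled by SLURM alongside ordinary batch jobs, which is what allows interactive and batch workloads to share the same compute nodes.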