XSEDE and University of Central Oklahoma (UCO) work together to rebuild UCO’s high performance computing system, Buddy
IU-led XCRI team works through challenges of a pandemic-induced remote build to enable HPC projects spanning particle transport, micro-mixing, stochastic modeling, ecological modeling, and bioinformatics and disease spread modeling.
Feb 4, 2021
XCRI, the XSEDE Cyberinfrastructure Resource Integration team, recently engaged with the University of Central Oklahoma (UCO) to help upgrade its high performance computing (HPC) system, "Buddy". UCO reached out to XCRI after learning that the team could help rebuild the university's local research computing infrastructure. Buddy is managed by the Center for Research and Education in Interdisciplinary Computation (CREIC) at UCO, led by Dr. Evan Lemley. The system provides computational support for projects spanning particle transport, micro-mixing, stochastic modeling, ecological modeling, and bioinformatics and disease spread modeling.
To assist with the upgrade, XCRI used the latest version of the XSEDE Compatible Basic Cluster (XCBC) software toolkit. XCBC enables campus cyberinfrastructure administrators to build a local cluster from scratch that is readily interoperable with XSEDE-supported cyberinfrastructure resources.
Normally, XCRI engineers would travel to UCO to help with this kind of work, but this time the team worked entirely remotely due to travel restrictions. It was the first time the team had performed a full-week remote build. The XCRI team, consisting primarily of Eric Coulter, Rick McMullen, and Stephen Bird, all from Indiana University, assisted with several aspects of the upgrade. Eric, Rick, and Stephen helped plan backups of user data at several sites. As a safeguard in case the backups failed during the upgrade, they also helped build a temporary Globus server at UCO to transfer data to another site at Southwestern Oklahoma State University. XCRI also installed and configured the Open XDMoD software to provide job monitoring and reporting capabilities.
Samuel Kelting, the XSEDE student champion at UCO, did most of the hands-on administration work during the rebuild of Buddy, with assistance on networking issues from Daniel Wagner and Thomas Chen of UCO. XCRI and UCO staff worked together to install the Open OnDemand (OOD) HPC portal, which was another first for the team. XCRI will be working with Samuel in the near future to integrate OOD as an optional part of the XCBC toolkit.
Buddy has been running at UCO since 2015, when it was initially funded by the National Science Foundation (NSF). The Buddy cluster has 37 nodes: 31 regular compute nodes with 20 cores and 64 GB of RAM, 4 high-memory nodes with 128 GB of RAM, and 2 accelerator nodes with both NVIDIA GPUs and Intel Xeon Phis installed. After the XCRI team's virtual visit, Buddy was up and running with the following base software components: CentOS 7, Slurm, Warewulf, and Lmod.
After working with XCRI during the rebuild of Buddy, Dr. Evan Lemley said, “It was a pleasure working with the XCRI team on our recent rebuild of the Buddy cluster at the University of Central Oklahoma. In addition to the normal challenges of rebuilding a cluster, XCRI also worked remotely. The team was fantastic in their problem-solving abilities, making the rebuild seamless. We did not have previous experience at UCO in rebuilding a cluster from scratch, so the XCRI team was critical in making this project a success. The entire XCRI team was around to help and offer advice. Senior XCRI Engineer, Eric Coulter, has a remarkable breadth of technical knowledge and, along with the whole team, is so pleasant to work with.”
For information regarding how XCRI could benefit your campus, visit the related link below.