Infrastructure as code with Penguin Computing Scyld ClusterWare™

Penguin Computing Scyld ClusterWare™ is cluster management software that lets administrators efficiently manage hardware and software resources and get the most out of their HPC and AI/ML clusters. By integrating Scyld ClusterWare with familiar DevOps tools like Ansible and Git, administrators can automate their work, helping organizations maximize HPC productivity and accelerate innovation.

Watch our webinar to learn more about how Penguin’s Scyld Suite simplifies the management of enterprise HPC/AI.

Scyld ClusterWare uses rapid image-based provisioning to create ready-to-run compute clusters in bare metal, virtualized, or containerized environments. For some clusters, all the possible software needed for a node can be contained within its image. ClusterWare can provision heterogeneous clusters, so multiple images can be created and distributed based on hardware or software attributes. For example, GPU nodes may receive a different image than CPU-only nodes. However, as clusters increase in size, or as more subsets of nodes need to be provisioned differently, changes to images can be hard to track.
As an alternative, administrators can boot into a lightweight image containing Ansible or Puppet and allow that tool to provide additional image configuration. Automating image configuration in this way reduces potential errors and introduces efficiencies and auditability into this otherwise manual task.
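The attribute-based image selection described above (GPU nodes receiving a different image than CPU-only nodes) can be sketched in a few lines of Python. This is only an illustration of the idea, not ClusterWare's API: the `Node` type, the `has_gpu` attribute, and the image names `gpu-compute` and `cpu-compute` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a compute node (not a ClusterWare type)."""
    name: str
    has_gpu: bool = False


# Hypothetical image names, chosen only for this example.
GPU_IMAGE = "gpu-compute"
CPU_IMAGE = "cpu-compute"


def select_image(node: Node) -> str:
    """Pick an image name from a node's hardware attributes."""
    return GPU_IMAGE if node.has_gpu else CPU_IMAGE


def plan_images(nodes: list[Node]) -> dict[str, str]:
    """Map every node name to the image it should boot."""
    return {node.name: select_image(node) for node in nodes}


if __name__ == "__main__":
    cluster = [Node("n0", has_gpu=True), Node("n1"), Node("n2")]
    for name, image in plan_images(cluster).items():
        print(f"{name}: {image}")
```

As the number of attributes grows, a mapping like this becomes harder to audit by hand, which is the tracking problem that motivates moving configuration into a tool such as Ansible.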

Scyld ClusterWare’s fast provisioning and ability to support heterogeneous environments can lead to increased administrative flexibility.
For example, administrators can partition a subset of nodes to test new images while keeping the majority of a cluster in standard production. After successful testing, the new image can be rolled out to the rest of the cluster. In enterprise HPC environments where frequent updates are required, this approach allows production to continue at scale with fewer interruptions while updates are being implemented.
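The staged rollout described above can be sketched as a small planning function. This is a minimal sketch under stated assumptions: the node names, the image names, and the 10% default test fraction are hypothetical, and the code only computes a plan; it does not call any ClusterWare tooling.

```python
def split_for_testing(
    nodes: list[str], test_fraction: float = 0.1
) -> tuple[list[str], list[str]]:
    """Split nodes into a small test partition and the production remainder."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1 (exclusive)")
    ordered = sorted(nodes)  # deterministic choice of test nodes
    if not ordered:
        return [], []
    test_count = max(1, round(len(ordered) * test_fraction))
    return ordered[:test_count], ordered[test_count:]


def assign_images(
    nodes: list[str],
    current_image: str,
    candidate_image: str,
    test_fraction: float = 0.1,
) -> dict[str, str]:
    """Give the candidate image to the test partition only."""
    test_nodes, production_nodes = split_for_testing(nodes, test_fraction)
    plan = {name: candidate_image for name in test_nodes}
    plan.update({name: current_image for name in production_nodes})
    return plan


if __name__ == "__main__":
    names = [f"n{i:02d}" for i in range(20)]
    plan = assign_images(names, "prod-image", "candidate-image")
    print(sorted(n for n, img in plan.items() if img == "candidate-image"))
```

Once the candidate image passes testing, the same plan can be regenerated with the candidate as the current image for every node.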


STIGs (Security Technical Implementation Guides) are federal security guidelines used to ensure that machines are properly secured in an auditable fashion. One way to automate the deployment of STIGs across a cluster is to provision base compute node images via ClusterWare, then use Ansible to add STIG components on top of these base images. The desired STIG configuration can be saved as an Ansible playbook that runs automatically using Ansible pull at boot time. Because each node pulls its own configuration, this integration removes the potential bottleneck that can occur when pushing from an Ansible control node, and it allows an administrator to automate the boot process for new nodes. In this example, the administrator ensures that a large number of machines can be secured accurately and efficiently at a granular level, while reducing the amount of manual work required.
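As a rough illustration of the pull model described above, the following Python sketch assembles an `ansible-pull` command line that a node could run at boot. The `--url`, `--checkout`, `--directory`, and `--only-if-changed` options are standard `ansible-pull` flags; the repository URL, branch, and playbook name `stig.yml` are hypothetical placeholders, and how the command is hooked into the boot sequence is left open.

```python
import shlex
from typing import Optional


def build_ansible_pull_command(
    repo_url: str,
    playbook: str = "local.yml",
    branch: Optional[str] = None,
    checkout_dir: Optional[str] = None,
    only_if_changed: bool = False,
) -> list[str]:
    """Build an ansible-pull argument list without executing it."""
    command = ["ansible-pull", "--url", repo_url]
    if branch:
        command += ["--checkout", branch]
    if checkout_dir:
        command += ["--directory", checkout_dir]
    if only_if_changed:
        # Run the playbook only when the repository has new commits.
        command.append("--only-if-changed")
    command.append(playbook)
    return command


if __name__ == "__main__":
    # Hypothetical repository URL and playbook name, for illustration only.
    command = build_ansible_pull_command(
        "https://git.example.internal/stig-config.git",
        playbook="stig.yml",
        branch="main",
    )
    print(shlex.join(command))
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting issues.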

As clusters grow in size, so too can administrative teams. At scale, cluster management becomes more complex, and more information may need to be shared by a larger group of people.

Scyld ClusterWare is designed to manage clusters ranging from tens to tens of thousands of nodes, ensuring fast, reliable provisioning and health monitoring for clusters of any size. ClusterWare’s high availability (HA) features allow any head node to provide services to any compute node, vastly reducing, or even eliminating, interruptions caused by head node failures.

From a DevOps standpoint, ClusterWare head nodes can host Git repositories. Combining this with Ansible pull yields a lightweight configuration that can allow administrators to scale up to thousands of nodes with ease. Because playbooks are portable, they can be reused across multiple environments with different base images, which also simplifies version control. This makes it easier to get new environments running quickly and to consistently boot images to a known good state.

Infrastructure as code with ClusterWare allows administrators to maintain a central code base (in GitLab, for example) containing all node configurations in use. Git repositories, which can be hosted on ClusterWare head nodes, can be used for version control of data and workflows as well as code. This makes change management simpler, auditable, and easier to track. Additionally, using Git repositories allows information to be shared more easily, both internally and with partners, third-party contractors, and support personnel as needed.

ClusterWare can run in even the most secure environments, supporting IT-security best practices. Air-gapped deployments are supported: ISOs can be saved to internal repos, allowing for image creation without requiring a public internet connection. Additionally, ClusterWare supports SELinux (including the Red Hat MLS policy), the FIPS and STIG federal security protocols, and TPM encryption. Using infrastructure as code can further strengthen an organization’s security posture by ensuring organized version control and CI/CD auditability.

Scyld ClusterWare is developed and supported by a team of experts with decades of HPC experience. We work closely with users to build out our product roadmap and ensure our software will continue to evolve with users’ needs. Using Scyld ClusterWare’s Ansible-pull features and Git for version control allows administrators to provision and manage clusters using familiar DevOps tools. This helps organizations operate their HPC environments efficiently and securely.

Sign up for a demo to see how Penguin Computing Scyld ClusterWare can help streamline your enterprise-grade HPC and AI/ML cluster management.
