The Linux High-Performance Computing (HPC) Systems Administrator will join a growing team of specialists in the Research Computing Center (RCC), providing administration and support for computationally intensive biomedical research at MCW. This position is responsible for the installation, configuration, and management of an HPC environment that includes clusters, large-scale storage, and networking. Specifically, this position will manage HPC hardware and software and provide end-user support, including training and troubleshooting, for researchers running HPC workloads. In addition, this position will participate in the planning and design of future iterations of RCC’s HPC environment.
• Maintain and administer Linux-based HPC clusters with multiple architectures (SMP, MPI, GPU).
• Configure and monitor cluster management software, resource managers, and schedulers.
• Configure and administer network accounts for users and groups.
• Install and test end-user requested software applications on the cluster.
• Document and share solutions for routine system administration tasks and automate when applicable.
• Develop end-user documentation and training materials to support use of HPC.
• Maintain technical relevancy through education, reading, and research.
• Demonstrate initiative by expanding technical knowledge, interpersonal skills, business knowledge, and organizational skills.
• Work with team resources to meet department goals, priorities and initiatives.
• Act as a resource to the RCC team in areas of expertise.
• Assure standards are met for all assigned projects and service activities.
• Work closely with other technology teams to meet day-to-day operational needs.
• Provide feedback for improvements in automation and process for the RCC team.
Appropriate education and/or experience may be substituted on an equivalent basis.
Minimum Required Education: Bachelor’s Degree
Minimum Required Experience: 5 years
Knowledge – Skills – Abilities
The ideal candidate will have:
• Linux/Unix system administration experience.
• Experience configuring and deploying Linux HPC clusters.
• Experience configuring and maintaining Linux operating systems (CentOS/Red Hat 6/7, Ubuntu).
• Experience supporting multiple HPC architectures (SMP, MPI, and GPU).
• Experience with container solutions (Singularity and Docker).
• Experience managing parallel file systems and performance networking.
• Experience configuring and supporting network authentication with LDAP.
• Experience configuring server security (SELinux, FirewallD, iptables).
• Familiarity with one or more scripting languages (Python, Perl, Bash).
• Understanding of well-known job scheduler software (Torque, Slurm, PBS Pro, SGE).
• Understanding of well-known cluster management software (Bright Cluster Manager, Rocks, OpenHPC, xCAT, Warewulf).
• Ability to identify hardware/software failures and cluster performance degradation.
• Experience working with researchers in an academic environment.
• Good written and verbal communication, presentation, client service, and technical writing skills.
• Good analytical skills and a demonstrated ability to raise concerns and problems clearly and concisely and to propose solutions.
• A goal-based orientation and a focus on results.
• A strong sense of responsibility.
The Medical College of Wisconsin (MCW) is one of the largest healthcare employers in Wisconsin. We are a distinguished leader and innovator in the education and development of the next generation of physicians, scientists, pharmacists and health professionals; we discover and translate new knowledge in the biomedical and health sciences; we provide cutting-edge, collaborative patient care of the highest quality; and we improve the health of the communities we serve.
We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, national origin, religion, sexual orientation, gender identity, status as a veteran, disability, or any other federal, state, or local protected class.