The Argonne Leadership Computing Facility’s (ALCF) mission is to accelerate major scientific discoveries and engineering breakthroughs for humanity by designing and providing world-leading computing facilities in partnership with the computational science community. We help researchers solve some of the world’s largest and most complex problems with our unique combination of supercomputing resources and computational science expertise.
Heterogeneity in High Performance Computing (HPC) has never been greater, and most exascale systems deployed in the US will be accelerator-based. Understanding the performance of applications running on sizeable fractions of those machines will be a challenge, as the scale and complexity of those applications will be unprecedented. Nonetheless, the hybrid nature of these platforms offers a common opportunity to better understand how applications interact with the accelerator, which is where most of the computing power will reside. Indeed, the Application Programming Interfaces (APIs) of those accelerators, such as CUDA and OpenCL, are well-defined entry points that can be traced. By applying techniques derived from Model Centric Debugging to those APIs, it is possible to capture most of the accelerator-related context and events. Those techniques have already been leveraged in an HPC context for debugging and for porting an HPC application from CUDA to OpenCL.
At Argonne, to meet those challenges, we’ve been developing a collection of Model Centric Tracing tools that cover the APIs that will be encountered on exascale platforms. To meet the scalability and performance requirements, those tracers are based on LTTng and work in a similar manner to LTTng CLUST, but with Model Centric Tracing in mind. The fine level of control over the granularity of the captured traces allows those tracers to be used for a variety of purposes:
- Profiling accelerator usage of HPC applications,
- Debugging accelerator usage,
- Capturing traces that can be reinjected in simulation frameworks,
- Extracting kernels for replay, allowing study and tuning in a sand-box,
- Lightweight and transparent monitoring of platform usage.
Most of these uses remain to be invented or perfected and offer many opportunities for new research. In this context, Argonne’s ALCF is looking for a post-doctoral appointee to perform research and development on the collection of tracers and their uses. In particular, with the exascale Aurora platform expected next year, integration of the tracing framework and its scalability will be an important topic. Another important objective will be to collaborate with application developers to help them leverage the possibilities offered by the tracers. The work will take place in a multi-disciplinary environment and will offer opportunities to interact with a wide range of talents from the whole spectrum of HPC research. The successful candidate will be expected to present and publish their work at major symposia and in journals.
Ideal candidates are expected to have:
- A recent PhD in a related field.
- Comprehensive knowledge of C/C++ programming under Unix/Linux.
- Comprehensive knowledge of one or more libraries and tools such as OpenCL, CUDA/HIP, ROCm, and Level Zero.
- Comprehensive knowledge of system programming.
- Considerable knowledge of parallel algorithms, I/O architectures, performance evaluation and tuning.
- Considerable expertise in parallel programming, multicore systems, threading, and scientific application codes.
- Considerable software development, writing, and communication skills.
- Good collaborative skills, including the ability to work well with other divisions, laboratories, and universities.
- Good self-motivation to get involved and participate in the project team’s research, and to balance that against intense code development.
- Ability to create, maintain, and support high-quality software.
- Ability to model Argonne’s Core Values: Impact, Safety, Respect, Integrity, and Teamwork.
- Cover letter (optional); uploaded as a PDF document.
As an equal employment opportunity and affirmative action employer, and in accordance with our core values of impact, safety, respect, integrity and teamwork, Argonne National Laboratory is committed to a diverse and inclusive workplace that fosters collaborative scientific discovery and innovation. In support of this commitment, Argonne encourages minorities, women, veterans and individuals with disabilities to apply for employment. Argonne considers all qualified applicants for employment without regard to age, ancestry, citizenship status, color, disability, gender, gender identity, genetic information, marital status, national origin, pregnancy, race, religion, sexual orientation, veteran status or any other characteristic protected by law.
Argonne employees, and certain guest researchers and contractors, are subject to particular restrictions related to participation in Foreign Government Talent Recruitment Programs, as defined and detailed in United States Department of Energy Order 486.1. You will be asked to disclose any such participation in the application phase for review by Argonne’s Legal Department.