Specialist Engineer, High Performance Computing

Tustin, California, United States

$132-202k

Full Time

Job description

Virgin Galactic is looking for a Specialist Engineer in High-Performance Computing. This is a high-visibility position within engineering: you will serve as system administrator of the current High-Performance Computing (HPC) systems while also playing a key role in defining the path forward for HPC at Virgin Galactic. You will work with internal users across several functional areas to maximize the performance of the current systems and remove roadblocks to access and utilization. You will provide strategic insight into the path forward for HPC, including data integrity planning and digital thread integration. The role will also require you to work with external vendors both to maintain the existing infrastructure and to lead expansion and upgrade work. The right candidate will strike a balance between technically advanced approaches and bootstrapped innovation that keeps costs down. If you’re an experienced admin who is also excited about leading the charge to define the future state of HPC for the Spaceline for Earth, we want to talk to you.

 

Responsibilities 

This role is both systems-facing and user-facing. In it, you will use your in-depth knowledge of Linux, your cluster administration experience, and your passion for supporting ground-breaking engineering work on a daily basis. You will play a crucial role in designing, implementing, and maintaining our advanced computing infrastructure. 

  • HPC Infrastructure Maintenance: Manage the day-to-day system administration of Linux-based cluster computing and storage environments, and associated network infrastructure, in alignment with applicable company, regulatory agency, and/or contractual security and privacy requirements. 
  • Software: Make sure that users have the environment, tools, compilers, and any additional resources needed to deploy applications across the clusters, including open source, proprietary and in-house developed codes. 
  • Slurm: Manage all aspects of Slurm for efficient resource allocation and job scheduling across the clusters. This includes managing job accounting databases and generating utilization reports.
  • User Support: Collaborate with colleagues and team members to understand their computing needs, provide technical assistance, and troubleshoot issues related to system performance and job execution. Provide user consultation and training. 
  • Performance monitoring: Monitor system performance, diagnose bottlenecks, and take necessary actions to improve system performance. 
  • Documentation: Maintain detailed documentation of system configurations, procedures, and troubleshooting guides to facilitate knowledge sharing and team collaboration. Develop user-facing documentation.
  • Planning: Meet regularly with internal and external stakeholders to understand existing challenges, anticipated needs, and opportunities for closer collaboration. Develop a roadmap for system improvements and life cycling, and make recommendations to leadership. Create data integrity plans and a strategy for integrating data into the digital thread.

Preferred Skills and Experience

  • Relevant bachelor’s degree and ten years of increasingly technical work experience or a combination of education and relevant experience.
  • In-depth experience managing multiuser HPC clusters and distributed storage environments.
  • Working knowledge of engineering simulation tools such as CFD, FEM and heat transfer codes that typically run on clusters.
  • Independent and proactive working style.
  • Ability to communicate with a diverse set of stakeholders. 
  • Excellent problem-solving skills.
  • Quick learner eager to take on new challenges.

This position requires in-depth knowledge of and hands-on experience with: 

  • Linux cluster system administration (RedHat/CentOS/Rocky) 
  • Slurm configuration and management
  • Active Directory authentication for Linux systems 
  • SMB file shares between Windows and Linux systems 
  • BeeGFS configuration and management 
  • Scripting for system management and task automation 
  • Networking technologies (InfiniBand, Message Passing Interface (MPI))
  • Installing and repairing servers and associated cluster hardware 
  • Complex technical problem solving and troubleshooting
  • Experience with stateless node management and provisioning (OpenHPC/Warewulf)
  • Experience with the proprietary ACT ClusterVisor tools 
  • Experience with hybrid on-prem/cloud cluster technologies and containerization in the context of HPC
  • Tape backup systems
  • Working knowledge of Digital Thread concepts
  • Working knowledge of 3DEXPERIENCE platform

 

Physical and/or Additional Requirements

  • This position is hybrid with required in-person hours M-Th each week in our office in Tustin, California. Occasional travel to other VG facilities or data centers is expected on an as-needed basis.

 

 

 

The annual U.S. base salary range for this full-time position is $132,100.00–$201,550.00. The base pay actually offered will vary depending on job-related knowledge, skills, location, and experience and take into account internal equity. Other forms of pay (e.g., bonus or long term incentive) may be provided as part of the compensation package, in addition to a full range of medical, financial, and other benefits, dependent on the position offered. For more information regarding Virgin Galactic benefits, please visit https://vgcareers.virgingalactic.com/global/en/benefits

 

Who We Are

Virgin Galactic is transforming humanity’s relationship with space. By making it more open and accessible, we are connecting the world to the love, wonder and awe inspired by space travel, helping to create new opportunities for the benefit of life on Earth. Whether it’s supporting cutting-edge research missions for scientists and students, or offering life-changing experiences for the adventurers among us, Virgin Galactic is THE spaceline for Earth. Such an audacious vision requires a team as driven as they are curious - one capable of redefining the boundaries of what’s possible. 

 

Export Requirements 
To conform to U.S. Government export regulations, applicant must be a U.S. Person (either a U.S. citizen, a lawful permanent resident, or a protected individual as defined by 8 U.S.C. 1324b(a)(3)) or be able to obtain the required authorization from either the U.S. Department of State or the U.S. Department of Commerce. The applicant must also not be included in the list of Specially Designated Nationals and Blocked Persons maintained by the Office of Foreign Assets Control.

 

EEO Statement
Virgin Galactic is an Equal Opportunity Employer; employment with Virgin Galactic is governed on the basis of merit, competence and qualifications and will not be influenced in any manner by race, color, religion, gender, gender identity, national origin/ethnicity, veteran status, disability status, age, sexual orientation, marital status, mental or physical disability or any other legally protected status. 

 

DRUG FREE WORKPLACE
Virgin Galactic is committed to a Drug Free Workplace. All post-offer applicants and active teammates are subject to testing for marijuana, cocaine, opioids, amphetamines, PCP, and alcohol when the criteria outlined in our policies are met. This can include pre-employment, random, reasonable suspicion, and accident-related drug and alcohol testing.

 

PHOENIX EMPLOYMENT REQUIREMENTS
For individuals seeking employment at our Phoenix Mesa Gateway Airport facility, employment is contingent upon obtaining and maintaining a TSA-authorized security badge. This includes initial and annual mandatory background checks that are governed by TSA and conducted by the Phoenix Mesa Gateway Airport badging office.

 
