About The Company
Founded in 2004 in sunny San Diego, California, ServiceNow has grown into a global leader in cloud computing and enterprise software solutions. Driven by a visionary approach to transforming work processes, the company has established itself as a pioneer in AI-enhanced technology, serving over 8,100 customers worldwide, including 85% of the Fortune 500®. ServiceNow's intelligent cloud-based platform seamlessly connects people, systems, and processes, empowering organizations to operate more efficiently, innovatively, and securely. With a commitment to making the world work better for everyone, ServiceNow continues to innovate and expand its offerings, leveraging advanced AI and machine learning capabilities to shape the future of work.
About The Role
We are seeking a highly skilled and motivated Staff Machine Learning Engineer to join our Platform Engineering and AI Technology Organization (PLATO) at ServiceNow. This role is pivotal in building and maintaining our AI infrastructure, deploying scalable AI workloads, and ensuring the high performance and reliability of our GPU clusters. The successful candidate will collaborate closely with researchers, AI engineers, and infrastructure teams to develop robust, efficient, and innovative AI platforms that enable end-to-end AI-powered work experiences for our customers. This position requires working onsite at our Santa Clara office two days per week, in a dynamic environment focused on cutting-edge AI and platform engineering. As part of our team, you will contribute to the continuous improvement of our operational practices, develop reusable code, and mentor colleagues to foster a culture of knowledge sharing and technical excellence.
Qualifications
- 4+ years of development experience with Python, Go, Java, or similar programming languages
- 4+ years of experience operating highly available distributed workloads on Kubernetes following a DevOps approach
- Proficiency in applying AI to workflows, decision-making, or problem-solving, or in critically evaluating such integrations
- Experience with prompt engineering and developing features based on large language models (LLMs)
- Hands-on experience with training and fine-tuning large language models, including distillation, supervised fine-tuning, and policy optimization
- Experience operating LLMs on NVIDIA GPUs
- Strong experience with DevOps tooling such as Helm, Ansible, Kubernetes, Prometheus, Splunk, and GitLab CI
- Proficiency in operating distributed systems built on Linux and J2EE
- Knowledge of software-defined networking, infrastructure as code, and configuration management
- Experience developing secure and compliant software for regulated environments
- Ability to manage projects with significant technical risks and drive outcomes effectively
- Preferred: 4+ years of experience in infrastructure and platform operations, deployments, SRE, and continuous platform health improvement
Responsibilities
- Design, develop, and implement infrastructure and platform features that support AI workloads, ensuring scalability and performance
- Collaborate with cross-functional teams including researchers, AI engineers, and infrastructure specialists to optimize GPU cluster performance and reliability
- Enhance operational practices by translating operational use cases into software tooling requirements for Site Reliability Engineering (SRE)
- Support deployment activities and provide ongoing support for AI/ML developers to facilitate smooth product delivery
- Develop high-quality, clean, scalable, and reusable code adhering to best practices such as code reviews and unit testing
- Engage with product owners to understand detailed requirements, owning the full development lifecycle from design through testing and deployment
- Operate and optimize large language models on NVIDIA GPUs, ensuring efficient performance
- Mentor colleagues, promote knowledge sharing, and foster a culture of continuous learning and innovation
Benefits
- Competitive base salary ranging from $173,100 to $303,000, depending on experience and location
- Equity options and variable/incentive compensation programs
- Comprehensive health plans including medical, dental, and vision coverage
- Flexible spending accounts and a 401(k) plan with company match
- Employee Stock Purchase Program (ESPP) and matching donations
- Flexible time-off policies and family leave programs to support work-life balance
- Opportunities for professional development and career growth within a global organization
Equal Opportunity
ServiceNow is an equal opportunity employer. We are committed to creating an inclusive environment where all qualified applicants receive consideration for employment regardless of race, color, creed, religion, sex, sexual orientation, gender identity or expression, national origin, age, disability, veteran status, or any other protected category. We also consider qualified applicants with arrest or conviction records in accordance with applicable laws.