The ideal candidate will maintain strong product and industry knowledge. You will work in a team-oriented environment focused on improving operational efficiency.
Responsibilities:
- Design and implement GPU optimization strategies to maximize utilization and reduce latency for ML workloads
- Develop and maintain distributed training pipelines using the Ray framework for large-scale model development
- Manage and optimize ML infrastructure across multi-cloud environments, with a focus on cost efficiency and scalability
- Build monitoring and profiling tools for GPU performance analysis and resource allocation optimization
- Collaborate with data scientists and ML engineers to streamline model training, inference, and deployment processes
- Implement best practices for workload orchestration, fault tolerance, and auto-scaling in cloud environments
- Stay current with GPU architectures, ML frameworks, and cloud technologies to drive continuous infrastructure improvements
Qualifications:
- 5+ years of ML infrastructure experience, including 3+ years focused on GPU optimization
- Hands-on experience with AWS/EKS for ML workloads in production environments
- Proven expertise with the Ray framework (Ray Train, Ray Tune, Ray Serve) for distributed ML computing
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- Strong CUDA programming skills for GPU performance optimization (cuDNN, TensorRT experience preferred)
- Proficiency with deep learning frameworks (TensorFlow, PyTorch, JAX) and performance tuning
- Experience with Kubernetes, Terraform, and infrastructure-as-code practices
- Strong analytical and problem-solving skills for complex performance bottlenecks
- Ability to collaborate effectively with data science, engineering, and DevOps teams
What we offer:
- Opportunity to work on cutting-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Benefits package: medical insurance, vision, dental, etc.
- Corporate social events
- Professional development opportunities
- Well-equipped office
NB:
Placement and staffing agencies need not apply. We do not work on a corp-to-corp (C2C) basis at this time.
We are currently unable to process H-1B transfers. Applicants with CPT or OPT work authorization are welcome to apply.
About Us:
Grid Dynamics (Nasdaq: GDYN) is a digital-native technology services provider that accelerates growth and bolsters competitive advantage for Fortune 1000 companies. Grid Dynamics provides digital transformation consulting and implementation services in omnichannel customer experience, big data analytics, search, artificial intelligence, cloud migration, and application modernization. Grid Dynamics achieves high speed-to-market, quality, and efficiency by using technology accelerators, an agile delivery culture, and its pool of global engineering talent. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the US, UK, Netherlands, Mexico, and Central and Eastern Europe.
To learn more about Grid Dynamics, please visit www.griddynamics.com. Follow us on Facebook, Twitter, and LinkedIn.