Experience: 5-10 years
Role Brief:
MLOps Engineer
We are seeking a skilled and experienced MLOps Engineer to join our team and drive the operationalization of machine learning (ML) and large language model (LLM) pipelines at scale. The ideal candidate will be responsible for automating, deploying, monitoring, and maintaining AI/ML solutions, transforming prototypes into robust, customer-ready systems while mitigating risks such as production pipeline failures.
This role requires strong expertise in cloud infrastructure, CI/CD pipelines, and model orchestration, as well as close collaboration with cross-functional teams to ensure seamless deployment across diverse customer environments.
Key Responsibilities
- Design and implement scalable infrastructure for ML/LLM pipelines using AWS services such as AWS Batch, Fargate, Bedrock, and related tools
- Manage auto-scaling mechanisms to handle fluctuating workloads and ensure high availability of REST APIs
- Automate CI/CD pipelines and AWS Lambda functions for model testing, deployment, and updates to reduce manual errors and improve efficiency
- Build and manage end-to-end ML workflows using Amazon SageMaker Pipelines, and optimize them using AWS Step Functions
- Perform drift analysis (data drift, concept drift, and label drift) and implement mitigation strategies such as:
  - Automated alerts
  - Model retraining triggers
  - Performance audits
- Set up reproducible workflows for data preparation, model training, and deployment
- Provision and optimize cloud resources (GPUs, memory, compute) to support large-scale models, including RAG-based systems
- Automate model retraining workflows to ensure models stay updated as data evolves
- Collaborate closely with Data Scientists, ML Engineers, and DevOps teams to integrate models into production environments
- Implement monitoring and model observability frameworks to track model performance and detect degradation or drift in real time
- Build monitoring dashboards and real-time alerting for pipeline failures and performance issues
Required Skills & Qualifications
- Education: BE / BTech / ME / MTech (any engineering discipline)
- Experience: Minimum of 4 years of hands-on experience with AWS services, including:
  - AWS Lambda
  - Amazon Bedrock
  - AWS Batch with Fargate
  - Amazon RDS (PostgreSQL)
  - DynamoDB
  - SQS
  - CloudWatch
  - API Gateway
  - Amazon SageMaker
- Strong hands-on experience in drift detection and mitigation for production ML systems
- Working knowledge of ML frameworks such as PyTorch and TensorFlow to understand model deployment requirements
- Experience building and deploying REST APIs using FastAPI or Flask
- Familiarity with model observability and monitoring tools, such as:
  - Evidently
  - NannyML
  - Phoenix
  - Grafana
- Experience with retraining and orchestration tools such as MLflow, Kubeflow, or Airflow
Good to Have
- AWS Certified Machine Learning – Specialty certification