Machine Learning Engineer (NLP & LLMs)
About The Opportunity
We operate in the Enterprise AI and Natural Language Processing sector, building production-grade language AI and intelligent automation solutions for business workflows and customer-facing applications. Our teams focus on EMR/knowledge retrieval, RAG, and conversational AI that power scalable, low-latency services. This role is fully remote for candidates based in India.
Role & Responsibilities
- Design, implement, and optimize NLP and LLM solutions end-to-end: data preprocessing ➜ model fine-tuning ➜ evaluation ➜ inference deployment.
- Fine-tune and evaluate transformer-based models (open-weight and closed-weight) using the Hugging Face ecosystem and custom training pipelines.
- Build robust inference APIs and microservices (FastAPI/Flask) and containerize pipelines using Docker; integrate auto-scaling on cloud infra.
- Implement Retrieval-Augmented Generation workflows: vector stores, FAISS/Milvus integration, semantic search, and prompt engineering for high-precision retrieval.
- Work with MLOps tooling to productionize models: CI/CD for models, model versioning, monitoring, and inference-cost optimization (quantization, ONNX/TorchScript).
- Collaborate with data scientists, backend engineers and product owners to translate ML research into robust features and ship iterative improvements.
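The retrieval step of the RAG workflows mentioned above reduces to nearest-neighbor search over embedding vectors. As a minimal sketch, the example below uses brute-force cosine similarity with NumPy in place of a production FAISS/Milvus index; the document names, embeddings, and helper functions are illustrative assumptions, not part of any real system:

```python
import numpy as np

def build_index(embeddings):
    """Normalize document embeddings so a dot product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

def retrieve(index, query_vec, k=2):
    """Return indices of the top-k most similar documents to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q                      # cosine similarity per document
    return np.argsort(scores)[::-1][:k]     # highest-scoring first

# Toy 4-dimensional "embeddings" for three documents (illustrative only).
docs = ["refund policy", "shipping times", "password reset"]
emb = np.array([
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9, 0.2],
])
index = build_index(emb)

# A query embedding close to the "refund policy" document.
query = np.array([0.85, 0.2, 0.05, 0.1])
top = retrieve(index, query, k=2)
print([docs[i] for i in top])  # → ['refund policy', 'shipping times']
```

In production, the normalize-then-dot-product pattern maps directly onto a FAISS inner-product index over L2-normalized vectors; the brute-force scan here is simply replaced by an approximate index for scale.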
Skills & Qualifications
Must-Have
- 4+ years overall experience in machine learning or NLP engineering, with demonstrable production projects.
- Strong Python engineering skills and solid experience with deep learning frameworks: PyTorch and/or TensorFlow.
- Hands-on experience with Hugging Face Transformers and fine-tuning LLMs for downstream tasks (classification, summarization, QA, generation).
- Experience building inference services and APIs (FastAPI/Flask), containerization (Docker), and deploying on cloud (AWS/GCP/Azure).
- Practical knowledge of vector search and retrieval systems (FAISS, Milvus) and RAG architectures.
- Familiarity with model optimization techniques (quantization, ONNX/TorchScript) and GPU inference workflows (CUDA).
Preferred
- Experience with LangChain or similar orchestration frameworks and agentic tool-calling patterns.
- Exposure to MLOps tools (MLflow, Weights & Biases), Kubernetes for scaling, and production monitoring/observability.
- Background in conversational AI, information retrieval research, or publications in NLP is a plus.
Benefits & Culture Highlights
- Fully remote work with flexible hours and an output-driven culture.
- Opportunity to work on cutting-edge LLM and RAG products and shape production ML practices.
- Collaborative, fast-paced engineering environment with emphasis on learning and growth.
To apply, highlight relevant LLM/NLP projects (GitHub, Colab notebooks, or model cards), production deployment examples, and clear contributions to model lifecycle or MLOps workflows.
Skills: LLM, ML, NLP