Machine Learning Engineer (NLP & LLMs)
About The Opportunity
We operate in the Enterprise AI and Natural Language Processing sector, building production-grade language AI and intelligent automation solutions for business workflows and customer-facing applications. Our teams focus on EMR/knowledge retrieval, RAG, and conversational AI that power scalable, low-latency services. This role is fully remote for candidates based in India.
Role & Responsibilities
- Design, implement, and optimize NLP and LLM solutions end-to-end: data preprocessing ➜ model fine-tuning ➜ evaluation ➜ inference deployment.
- Fine-tune and evaluate transformer-based models (open-weight and closed-weight) using the Hugging Face ecosystem and custom training pipelines.
- Build robust inference APIs and microservices (FastAPI/Flask) and containerize pipelines using Docker; integrate auto-scaling on cloud infra.
- Implement Retrieval-Augmented Generation workflows: vector stores, FAISS/Milvus integration, semantic search, and prompt engineering for high-precision retrieval.
- Work with MLOps tooling to productionize models: CI/CD for models, model versioning, monitoring, and inference-cost optimization (quantization, ONNX/TorchScript).
- Collaborate with data scientists, backend engineers and product owners to translate ML research into robust features and ship iterative improvements.
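The retrieval step of the RAG workflows mentioned above reduces to nearest-neighbor search over embedding vectors. As a minimal sketch, the example below uses brute-force cosine similarity with NumPy in place of a production FAISS/Milvus index; the document names, embeddings, and helper functions are illustrative assumptions, not part of any real system:

```python
import numpy as np

def build_index(embeddings):
    """Normalize document embeddings so a dot product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

def retrieve(index, query_vec, k=2):
    """Return indices of the top-k most similar documents to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q                      # cosine similarity per document
    return np.argsort(scores)[::-1][:k]     # highest-scoring first

# Toy 4-dimensional "embeddings" for three documents (illustrative only).
docs = ["refund policy", "shipping times", "password reset"]
emb = np.array([
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9, 0.2],
])
index = build_index(emb)

# A query embedding close to the "refund policy" document.
query = np.array([0.85, 0.2, 0.05, 0.1])
top = retrieve(index, query, k=2)
print([docs[i] for i in top])  # → ['refund policy', 'shipping times']
```

In production, the normalize-then-dot-product pattern maps directly onto a FAISS inner-product index over L2-normalized vectors; the brute-force scan here is simply replaced by an approximate index for scale.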
Skills & Qualifications
Must-Have
- 4+ years overall experience in machine learning or NLP engineering, with demonstrable production projects.
- Strong Python engineering skills and solid experience with deep learning frameworks: PyTorch and/or TensorFlow.
- Hands-on experience with Hugging Face Transformers and fine-tuning LLMs for downstream tasks (classification, summarization, QA, generation).
- Experience building inference services and APIs (FastAPI/Flask), containerization (Docker), and deploying on cloud (AWS/GCP/Azure).
- Practical knowledge of vector search and retrieval systems (FAISS, Milvus) and RAG architectures.
- Familiarity with model optimization techniques (quantization, ONNX/TorchScript) and GPU inference workflows (CUDA).
Preferred
- Experience with LangChain or similar orchestration frameworks and agentic tool-calling patterns.
- Exposure to MLOps tools (MLflow, Weights & Biases), Kubernetes for scaling, and production monitoring/observability.
- Background in conversational AI, information retrieval research, or publications in NLP is a plus.
Benefits & Culture Highlights
- Fully remote work with flexible hours and an output-driven culture.
- Opportunity to work on cutting-edge LLM and RAG products and shape production ML practices.
- Collaborative, fast-paced engineering environment with emphasis on learning and growth.
To apply, highlight relevant LLM/NLP projects (GitHub, Colab notebooks, or model cards), production deployment examples, and clear contributions to model lifecycle or MLOps workflows.
Skills: LLM, ML, NLP