Production-ready ML Apps
Below is a list of my production-ready deployed apps. You can also check out a complete list of my other machine learning projects here.
LLM-RAG-powered-QA-App [Code]
- Fine-tuned a 20B-parameter Large Language Model (LLM) on a multi-GPU cluster using distributed training.
- Developed a production-ready, scalable, context-aware Question Answering (QA) app based on Retrieval Augmented Generation (RAG). The app first finds the contexts most relevant to an incoming query using fast vector similarity search in a pre-defined embedding space, then sends those contexts together with the query to the fine-tuned LLM to generate the answer.
- Scaled the main ML workloads for context preparation (loading, embedding, and indexing the contexts in the vector database) across multiple workers with different compute resources, and served the LLM app in a robust and scalable manner.
- Libraries/frameworks used: PyTorch, Transformers, Ray, LangChain, and FastAPI
- The video below shows the deployed LLM app in action:
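
As a rough illustration of the retrieve-then-generate flow described for this app, here is a minimal Python sketch. It is not the app's code. The `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for the vector database, and the call to the fine-tuned LLM is left out. All function names and sample contexts are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, contexts: list[str], k: int = 2) -> list[str]:
    """Return the k contexts most similar to the query (brute-force search)."""
    q = embed(query)
    ranked = sorted(contexts, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Combine retrieved contexts and the query into one prompt for an LLM."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical sample contexts, for illustration only.
CONTEXTS = [
    "Ray schedules tasks across a cluster of workers.",
    "FastAPI is a web framework for building APIs in Python.",
    "LangChain helps compose LLM calls with retrieval steps.",
]

query = "Which framework is used for building APIs?"
top = retrieve(query, CONTEXTS, k=1)
prompt = build_prompt(query, top)  # this string would be sent to the LLM
```

In a deployed setting, the brute-force `sorted` call would typically be replaced by an approximate nearest-neighbor index in a vector database, and the embeddings would be computed once at indexing time rather than on every query.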