Selected Projects

Below is a non-exhaustive list of my non-research projects. You can also check out a complete list of my projects here.

Designed an advanced story generation system that leverages agentic multimodal GenAI to generate engaging & meaningful stories from user-uploaded images. It seamlessly integrates retrieval-based reasoning with generative AI using Large Vision-Language Models and vector search to craft immersive narratives.
The system supports multiple data modalities (image & text), RAG-based retrieval for coherence, agentic AI-driven decision-making, the InternVL2-40B model, and the audio narration (Text-to-Speech) capability for engaging & immersive story generation.
Libraries/Framework: Streamlit, vLLM Kernels, ChromaDB, LangChain, Cloudinary API, pyttsx3 (Text-to-Speech), and LangGraph

An advanced AI chatbot that enables users to upload images and ask questions (primarily Navigation-oriented) via text or audio, receiving real-time responses in both formats.
Key features include:
- image upload and analysis
- speech-to-text conversion using the Google SpeechRecognition API
- integration of visual, text, and audio data for comprehensive interactions
- maintenance of conversation context across multiple turns
- real-time responses powered by Vision-Language multimodal models
Libraries/Framework: Streamlit, vLLM Kernels, Google SpeechRecognition API, RunPod, pyttsx3 (Text-to-Speech), and OpenAI API

Defined a custom contrastive loss and trained a few-shot version of Siamese Networks to do n-way k-shot image classification by mapping the image similarity task into a fully-supervised classification learning task.
Libraries/Framework: Numpy, Matplotlib, PyTorch, and TorchVision

Implemented Graph Convolutional Networks-based Variational Graph AutoEncoders to generate new molecular graphs that possess similar statistical distribution as that of the learned distribution of molecular graphs (used to train the model).
Libraries/Framework used: PyTorch, PyTorch Geometric, Numpy, and NetworkX

Developed a Human Activity Recognition system that utilizes a pre-trained 3D convolutional ResNet-34 model to identify activities in videos on a per-frame basis.
Trained on the Kinetics dataset, which includes 400 human activity classes and approximately 300,000 videos.
The framework can automatically classify video datasets, monitor compliance in food service environments, and oversee patron behavior in bars and restaurants.
Libraries/Framework: Numpy and OpenCV

Implemented a Stage-wise StackGAN model capable of producing photo-realistic images conditioned on text descriptions. It is also able to contain necessary details and vivid object parts while generating high-quality images.
Given the text description, the Stage-1 GAN forms the primitive shape and colors of the object. It puts less emphasis on the quality of the image being formed, thereby yielding a low-resolution image.
The Stage-2 GAN takes Stage-1 results and text descriptions as inputs and generates high-resolution images with photo-realistic details and thus can rectify defects in Stage-1 results and add compelling details with the refinement
process.
Libraries/Framework: Keras, Tensorflow, Numpy, Pandas, and Matplotlib

Trained a simple contrastive learning-based framework to perform text similarity, where sentences with similar semantic features attain higher similarity scores.
Used a pre-trained BERT model to generate two different, yet semantically similar representations for each input sentence with minimal variation.
To compute the degree of similarity between these latent representations, employed a cosine
similarity-based contrastive metric.
Libraries/Framework: Scikit-learn, Tensorflow, Numpy, Pandas, and Transformers

Implemented a zero-shot question-answering system that, for each question q with available answer options a, b, and c, computes each option’s score as the negative log-likelihood under the language model conditioned on the question and then returns the option with the highest score as the most probable answer to the question q.
Libraries/Framework: Transformers, Numpy, and Tensorflow