Projects & Work
Coding LLM From Scratch
What It Is
Built a full large language model from the ground up — no shortcuts, no abstractions, just pure understanding. Every component from tokenizer to pretraining loop was implemented independently.
How It Works
Custom BPE tokenizer, data loader, multi-head self-attention with causal masking, full transformer architecture (LayerNorm, FFN, residuals, positional embeddings), and a complete pretraining loop — all without high-level abstractions.
Key Highlights
- Custom BPE tokenizer built from scratch
- Multi-head self-attention with causal masking
- Full transformer block independently implemented
- End-to-end pretraining loop with loss tracking
- Zero reliance on HuggingFace Trainer or pre-built blocks
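The tokenizer step can be illustrated with a minimal BPE training sketch. This is not the project's code: it assumes whitespace-separated words, starts from single characters, and uses hypothetical helper names (merge_pair, train_bpe, encode_word). Production BPE tokenizers usually work on bytes and add pre-tokenization rules that are left out here.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    # Each word is a tuple of symbols; the counter tracks how often it occurs.
    words = Counter(tuple(word) for word in text.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties go to the lexicographically larger pair.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_pair(symbols, best)] += freq
        words = new_words
    return merges


def encode_word(word, merges):
    """Apply the learned merges, in training order, to a single word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = train_bpe("low low low lower lowest", num_merges=2)
print(merges)
print(encode_word("lower", merges))
```

Applying merges in the order they were learned is what keeps encoding consistent with training.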
Why It Matters
If you can build it from scratch, you truly understand it. This proves depth of knowledge at the architecture level.
NeuroChat Engine
What It Is
A production-grade modular chatbot with memory, RAG, streaming, tool-calling, and human-in-the-loop — all in one system. Feels less like a demo and more like a real product.
How It Works
LangGraph orchestrates the entire flow. SQLite handles persistent conversation threads. FAISS powers semantic document retrieval. MCP enables external tool-calling. Streamlit streams responses live. Human-in-the-loop checkpoints pause the agent for user approval before proceeding.
Key Highlights
- Threaded conversations with SQLite persistence across sessions
- Dual-layer memory: short-term context + long-term semantic store
- FAISS RAG pipeline for document-grounded answers
- Streaming responses via Streamlit for real-time UX
- MCP tool-calling: invoke external APIs from within the agent loop
- Human-in-the-loop (HITL) support
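Threaded persistence can be pictured with a small sqlite3 sketch. It only illustrates the idea of keying messages by thread so a session can be resumed; it does not use LangGraph's checkpointer, and the table layout and function names (open_store, append_message, load_thread) are assumptions made for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the message store. Use a file path to persist across sessions."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            thread_id TEXT NOT NULL,
            role TEXT NOT NULL,
            content TEXT NOT NULL
        )
        """
    )
    return conn


def append_message(conn, thread_id, role, content):
    """Append one message to a thread; the `with` block commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
            (thread_id, role, content),
        )


def load_thread(conn, thread_id, limit=20):
    """Return the most recent `limit` messages of a thread, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM messages "
        "WHERE thread_id = ? ORDER BY id DESC LIMIT ?",
        (thread_id, limit),
    ).fetchall()
    return list(reversed(rows))


conn = open_store()
append_message(conn, "thread-1", "user", "What is RAG?")
append_message(conn, "thread-1", "assistant", "Retrieval-augmented generation.")
print(load_thread(conn, "thread-1"))
```

The `limit` argument stands in for the short-term context window: only the latest turns are replayed to the model, while the full history stays in the database.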
Why It Matters
Demonstrates the ability to architect a complete, production-style agentic system — not just a script, but a real modular application.
NLP to App Compiler
What It Is
Applies compiler architecture to software generation. Give it a plain-English description of an app: it runs through a fixed sequence of stages, validates the output against a typed contract, and repairs any broken fields before returning the result. Nothing about it is one-shot.
How It Works
The pipeline is a LangGraph StateGraph with five registered nodes: intent → design → schema → validate → repair. Every node reads from and writes to a shared AppState TypedDict. add_conditional_edges branches off the validate node: if Pydantic raises a ValidationError, the exact error is stored in state["errors"] and control is routed to the repair node. The repair → validate back-edge makes this a true loop in which every repair attempt is re-validated, rather than a blind retry. A clean_json_output() helper strips markdown fences before json.loads(). FastAPI exposes POST /api/generate; a React + Vite frontend renders results across four tabbed JSON views.
Key Highlights
- StateGraph with typed AppState TypedDict through all five nodes — no global state, no side effects between stages
- AppSchema(BaseModel) enforces { ui, api, database, auth } — Pydantic catches field-level failures with exact error messages
- add_conditional_edges routes to repair or END based on state["validated"] — graph controls flow, not app code
- repair → validate back-edge: model sees its own broken output + exact ValidationError, not a generic retry
- clean_json_output() handles the markdown fence problem that breaks json.loads() on raw LLM responses
- ChatGroq at temperature 0.5, chosen to keep the JSON structure stable while avoiding degenerate outputs
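The validate and repair loop can be sketched in plain Python. This is a stand-in, not the project's graph: a simple key check replaces the Pydantic AppSchema, an ordinary loop replaces the LangGraph conditional edge, and `model` is any callable that maps a prompt to text. The `max_repairs` cap and the body of clean_json_output() are assumptions; the source only names the helper and the four top-level fields.

```python
import json
import re

FENCE = "`" * 3  # built this way so the sketch never contains a literal fence
FENCE_RE = re.compile(rf"^{FENCE}[\w-]*\s*\n(.*?)\n?{FENCE}$", re.DOTALL)
REQUIRED_KEYS = ("ui", "api", "database", "auth")


def clean_json_output(raw):
    """Strip one surrounding markdown code fence, if present."""
    text = raw.strip()
    match = FENCE_RE.match(text)
    return match.group(1).strip() if match else text


def validate(candidate):
    """Return a list of error messages; an empty list means the contract holds."""
    try:
        data = json.loads(clean_json_output(candidate))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    return [f"missing field: {key}" for key in REQUIRED_KEYS if key not in data]


def generate_with_repair(model, prompt, max_repairs=2):
    """Generate, validate, and feed exact errors back to the model until valid."""
    state = {"output": model(prompt), "errors": [], "validated": False}
    for attempt in range(max_repairs + 1):
        state["errors"] = validate(state["output"])
        if not state["errors"]:
            state["validated"] = True
            return state
        if attempt < max_repairs:
            # The repair prompt carries the broken output plus the exact errors.
            repair_prompt = (
                f"{prompt}\n\nPrevious output:\n{state['output']}\n\n"
                "Errors:\n" + "\n".join(state["errors"])
            )
            state["output"] = model(repair_prompt)
    return state
```

Capping the number of repairs is a design choice of this sketch: without a bound, a model that never satisfies the contract would loop forever.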
Why It Matters
Most LLM pipelines are glorified prompt chains — they call the model and hope the output is right. This one treats the model like an unreliable compiler pass: run it, validate against a contract, and feed failures back with surgical precision. The repair loop isn't a fallback — it's a first-class part of the architecture.
Cognitive Routing RAG
What It Is
A three-phase autonomous AI engine — the cognitive core of a multi-bot social platform. It decides which bots respond to a post, generates original content on a schedule, and defends its persona against prompt injection attacks in live arguments.
How It Works
Phase 1 uses FAISS + cosine similarity to route posts to relevant bots. Phase 2 runs a LangGraph state machine that searches the web and drafts opinionated posts. Phase 3 reconstructs the full argument thread as RAG context and fires back in-character.
Key Highlights
- FAISS persona router: cosine similarity decides which bots engage, instead of sending every post to every bot
- LangGraph 3-node autonomous pipeline: decide → search → draft
- Full-thread RAG engine for context-aware multi-turn replies
- Prompt injection hardening — override attempts trigger intensified in-persona responses
- Strict JSON output enforcement across all autonomous content
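The Phase 1 routing decision reduces to comparing a post embedding against each persona embedding. The sketch below does this in plain Python to show the logic only; the persona vectors, the 0.5 threshold, and the function names are made up for illustration. With FAISS, a common way to get the same scores is an inner-product index over L2-normalized vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route_post(post_vec, personas, threshold=0.5):
    """Return names of personas whose similarity clears the threshold, best first."""
    scores = {name: cosine(post_vec, vec) for name, vec in personas.items()}
    matched = [name for name, score in scores.items() if score >= threshold]
    return sorted(matched, key=lambda name: scores[name], reverse=True)


# Hypothetical 2-D persona embeddings; real embeddings have hundreds of dimensions.
personas = {
    "tech": [1.0, 0.0],
    "sports": [0.0, 1.0],
    "finance": [0.7, 0.7],
}
print(route_post([1.0, 0.1], personas))
```

Bots below the threshold simply stay silent, which is how the router keeps unrelated personas out of a thread.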
Why It Matters
This isn't a chatbot wrapper — it's a full agentic decision loop. It shows understanding of autonomous systems, vector-based routing, LangGraph orchestration, and production-level prompt security all in one project.
LangGraph Pipelines
What It Is
A comprehensive reference library of LangGraph workflows covering every major pattern for real-world agent design — basic, sequential, conditional, parallel, subgraph, persistence, RAG, and tool-use pipelines, plus a Streamlit frontend.
How It Works
Each pipeline is a standalone runnable LangGraph graph with typed state management. Conditional edges handle branching. Parallel nodes handle concurrent tasks. Subgraphs enable modular composition. The Streamlit frontend ties them into an accessible UI.
Key Highlights
- 8+ pipeline patterns covering the full LangGraph design space
- Fully typed state management across all graph flows
- Subgraph composition for modular, reusable agent blocks
- Streamlit frontend for visual interaction
- A living reference for real-world agentic AI architecture
Why It Matters
Shows systematic mastery of LangGraph — not just one workflow, but the entire design space explored methodically.
Pretraining Gemma-3 SLM
What It Is
Replicated Google's Gemma-3 small language model architecture from scratch in PyTorch to deeply understand its design decisions over earlier transformer models.
How It Works
Studied the Gemma-3 technical report and translated architectural specs into a clean PyTorch implementation — including Grouped-Query Attention (GQA), RoPE positional embeddings, GeGLU activations, and pre-normalization. Full pretraining loop included.
Key Highlights
- Full Gemma-3 architecture reproduced from the technical paper
- Grouped-Query Attention (GQA) implemented from scratch
- RoPE (Rotary Position Embeddings) for relative-position-aware attention
- GeGLU activations replacing standard FFN activations
- Complete pretraining loop with loss monitoring
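The key property of RoPE is that the dot product between a rotated query and a rotated key depends only on their relative offset. A minimal pure-Python sketch follows, assuming the interleaved pairing (dimensions 2i and 2i+1 rotated together) and a base of 10000; some implementations pair dimension i with i + d/2 instead, and nothing here is taken from the Gemma-3 configuration.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle position * base**(-i/d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        # i = 2j for pair j, so the frequency is base**(-2j/d).
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]
# Same relative offset (2) at two different absolute positions.
print(dot(rope(q, 3), rope(k, 1)))
print(dot(rope(q, 10), rope(k, 8)))
```

Both printed scores agree up to floating-point error, because each 2-D rotation cancels except for the difference in positions.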
Why It Matters
Demonstrates the ability to read ML research papers and translate them into working implementations — a critical skill for frontier AI work.
Instruction Training Pipeline
What It Is
An end-to-end pipeline for teaching a base language model to follow instructions using supervised fine-tuning, the stage that first turns a base model into an assistant in systems such as ChatGPT and Claude.
How It Works
Formats datasets into instruction-response pairs. Loss is computed only on response tokens (not the prompt). Training loop handles gradient accumulation, learning rate scheduling, and checkpoint saving.
Key Highlights
- Instruction dataset formatting with proper prompt-completion separation
- Loss masking — only response tokens contribute to gradient updates
- Custom PyTorch training loop with learning rate scheduling
- Evaluation metrics for instruction-following quality
- End-to-end: raw dataset in, instruction-tuned model out
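Loss masking can be shown without a deep learning framework. The sketch below marks prompt positions with -100, the value PyTorch's CrossEntropyLoss ignores by default, and averages the negative log-likelihood over response positions only. It assumes logits and labels are already aligned (in a causal LM the labels are normally shifted by one position first); the function names are hypothetical.

```python
import math

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; hide prompt positions from the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_cross_entropy(logits, labels):
    """Mean negative log-likelihood over positions whose label is not ignored."""
    total, count = 0.0, 0
    for row, target in zip(logits, labels):
        if target == IGNORE_INDEX:
            continue
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target]
        count += 1
    return total / count if count else 0.0


input_ids, labels = build_labels(prompt_ids=[5, 6], response_ids=[1, 2])
# Toy logits over a 3-token vocabulary, one row per position.
logits = [[9.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(labels)
print(masked_cross_entropy(logits, labels))
```

The first two rows of logits have no effect on the result, which is the point: the model is not rewarded or penalized for predicting the prompt.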
Why It Matters
Covers the supervised fine-tuning stage used to turn base LLMs into assistants, a core step behind the models people use every day.
Text Classification Training Pipeline
What It Is
Fine-tuned a pretrained transformer for binary text classification using transfer learning — fully custom PyTorch implementation, no Trainer APIs.
How It Works
Pretrained transformer weights are loaded, the LM head is replaced with a classification head, and the model is fine-tuned end-to-end. Custom PyTorch DataLoaders handle batching and tokenization. Training uses cross-entropy loss with per-epoch accuracy tracking.
Key Highlights
- Transfer learning: pretrained transformer adapted for classification
- Custom classification head replacing the LM head
- Full end-to-end PyTorch training — no Trainer abstractions
- Custom data loading and batching pipeline
- Per-epoch accuracy and loss evaluation
Why It Matters
Showcases the fundamentals of NLP fine-tuning, the same technique behind many production text classifiers today.
Ask YouTube (RAG Application)
What It Is
A full-stack app that lets you ask natural language questions about any YouTube video. Point it at a video, ask a question, get a grounded answer — powered by transcript-based RAG.
How It Works
Fetches YouTube transcripts, chunks and embeds them using BAAI/bge-small-en into FAISS, then answers questions via a LangChain RAG chain backed by Groq (Llama-3). Next.js 16 + React 19 frontend.
Key Highlights
- YouTube transcript ingestion and semantic chunking
- BAAI/bge-small-en embeddings for high-quality semantic search
- FAISS vector store for fast similarity retrieval
- LangChain RAG chain with Groq LLM for sub-second answers
- Next.js 16 + React 19 modern frontend
- Full-stack: Python backend + TypeScript/React frontend
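Before embedding, the transcript has to be cut into chunks. The sketch below uses a fixed-size word window with overlap, which is a simpler stand-in for the semantic chunking named above and not the project's splitter; the function name and the default sizes are assumptions.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word windows of `chunk_size`, sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window already reaches the end of the transcript.
        if start + chunk_size >= len(words):
            break
    return chunks


transcript = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(transcript, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which tends to help retrieval.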
Why It Matters
A complete RAG product covering the entire pipeline — from ingestion to retrieval to generation to UI.
fxgenie (LLM-Powered Currency Converter)
What It Is
Type "10 USD to INR" or "convert 250 Swiss francs to yen" — the LLM parses your intent, fetches live rates, and returns the result. Natural language replaces dropdowns entirely.
How It Works
LangChain routes the user query to Groq (Llama-3.3-70b), which extracts structured intent (source currency, target currency, amount). That output is used to query ExchangeRate-API for live rates. The result is returned to the React frontend via FastAPI.
Key Highlights
- Natural language interface — no dropdowns, just plain English
- Groq Llama-3.3-70b for fast, accurate intent extraction
- LangChain orchestration between LLM and API calls
- Live exchange rates from ExchangeRate-API
- FastAPI backend with clean REST endpoints
- React + Vite frontend for a snappy UI
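The step between the LLM and the rates API is worth spelling out: the model's output is treated as untrusted JSON and checked before any conversion happens. The sketch below assumes the model returns keys named amount, source, and target, and uses a made-up rate table in place of a live ExchangeRate-API response; all names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionIntent:
    amount: float
    source: str
    target: str


def parse_intent(llm_json):
    """Validate the LLM's JSON output and turn it into a typed intent."""
    data = json.loads(llm_json)
    amount = float(data["amount"])
    if amount < 0:
        raise ValueError("amount must be non-negative")
    source = str(data["source"]).upper()
    target = str(data["target"]).upper()
    for code in (source, target):
        if len(code) != 3 or not code.isalpha():
            raise ValueError(f"not a 3-letter currency code: {code!r}")
    return ConversionIntent(amount, source, target)


def convert(intent, rates):
    """`rates` maps each currency code to units per one unit of a common base."""
    return intent.amount * rates[intent.target] / rates[intent.source]


# Illustrative rates only, not real market data.
rates = {"USD": 1.0, "INR": 80.0, "EUR": 0.5}
intent = parse_intent('{"amount": 10, "source": "usd", "target": "inr"}')
print(intent, convert(intent, rates))
```

Keeping the arithmetic outside the model means the LLM is only responsible for understanding the request, never for the number itself.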
Why It Matters
Demonstrates LLM tool-use in a real product context — turning unstructured language into structured API calls.
GenAI Systems with LangChain
What It Is
A suite of modular, production-ready LLM systems built with LangChain — each component designed to be swappable and composable rather than one-off scripts.
How It Works
Built using LCEL (LangChain Expression Language) for composability. Prompts are templated and version-controlled. Output parsers enforce structured responses via Pydantic. Tools are registered for agent use, and conversation memory is injected into chains.
Key Highlights
- LCEL-based composable chain design
- Modular prompt templating and output parsing
- Agent patterns with tool registration
- Conversation memory integration
- Structured output enforcement with Pydantic
Why It Matters
Shows LangChain expertise beyond tutorials — production patterns for real applications.
Custom Runnable Pipeline
What It Is
Rebuilt LangChain's Runnable abstraction from scratch — no LangChain — to understand exactly how LLM chains work under the hood. A pure Python framework that mirrors LCEL internals.
How It Works
Base Runnable class with invoke method and pipe (|) operator for chaining. Concrete runnables — prompt templates, LLM callers, output parsers — implemented as subclasses. Output of one step feeds as input to the next, exactly mirroring LCEL.
Key Highlights
- Runnable base class with invoke and pipe operator from scratch
- Custom prompt template, LLM caller, and output parser runnables
- Chaining via | operator — mirrors LangChain LCEL internals
- Step-by-step tracing of data flow through the pipeline
- Pure Python — zero framework dependencies
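The core of the design fits in a few lines. The sketch below is a minimal reconstruction under assumed names (RunnableSequence, PromptTemplate, FakeLLM, AnswerParser), with a fake model that echoes its input so the example runs offline; it is not the project's code and is far smaller than LangChain's real Runnable interface.

```python
class Runnable:
    """Base class: subclasses implement invoke; `|` chains two runnables."""

    def invoke(self, value):
        raise NotImplementedError

    def __or__(self, other):
        return RunnableSequence(self, other)


class RunnableSequence(Runnable):
    """Runs `first`, then feeds its output to `second`."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def invoke(self, value):
        return self.second.invoke(self.first.invoke(value))


class PromptTemplate(Runnable):
    def __init__(self, template):
        self.template = template

    def invoke(self, value):
        # `value` is a dict of template variables.
        return self.template.format(**value)


class FakeLLM(Runnable):
    """Stand-in for a real model call: echoes the prompt with a prefix."""

    def invoke(self, value):
        return f"ANSWER: {value}"


class AnswerParser(Runnable):
    def invoke(self, value):
        # Keep only the text after the first colon.
        return value.split(":", 1)[1].strip()


chain = PromptTemplate("Explain {topic} briefly") | FakeLLM() | AnswerParser()
print(chain.invoke({"topic": "BPE"}))
```

Because `a | b` itself returns a Runnable, chains of any length compose left to right without extra machinery.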
Why It Matters
Understanding frameworks at the implementation level makes you a better engineer — this project proves that curiosity.
HuggingFace GPT Weight Mapping
What It Is
Loads pretrained GPT weights from HuggingFace into a locally defined custom architecture — solving the weight mapping problem manually — with temperature scaling and top-k sampling for inference control.
How It Works
The HuggingFace state dict is iterated and each weight is mapped by name and shape to the custom architecture. Mismatched layer names are resolved with a mapping dictionary. Temperature scaling and top-k sampling are added for flexible generation.
Key Highlights
- Manual weight mapping from HuggingFace to custom PyTorch model
- Shape verification to catch mismatches before loading
- Temperature scaling for controlling output randomness
- Top-k sampling for diverse but coherent text generation
- Reusable pattern for loading any pretrained weights into custom architectures
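Temperature scaling and top-k sampling can be sketched on a plain list of logits. This is an illustration under assumed function names, not the project's PyTorch code: keep the k largest logits, divide them by the temperature, apply a softmax over the survivors, and sample from that distribution.

```python
import math
import random


def top_k_probs(logits, k, temperature=1.0):
    """Return {token_id: probability} over the k highest-logit tokens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the vocabulary size")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = {i: logits[i] / temperature for i in top}
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled.values())
    exps = {i: math.exp(v - m) for i, v in scaled.items()}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def sample(logits, k, temperature=1.0, rng=None):
    """Draw one token id from the temperature-scaled top-k distribution."""
    rng = rng or random.Random()
    probs = top_k_probs(logits, k, temperature)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


logits = [2.0, 1.0, 0.0, -1.0]
print(top_k_probs(logits, k=2, temperature=1.0))
print(top_k_probs(logits, k=2, temperature=0.5))
print(sample(logits, k=2, temperature=0.8, rng=random.Random(0)))
```

Lower temperatures concentrate probability on the top token, while k bounds how far into the tail sampling can reach.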
Why It Matters
Essential knowledge for anyone customizing or extending pretrained models — a practical deep-dive into model internals.
Quicksync (Real-Time Collaboration App)
What It Is
A real-time collaboration tool where multiple users can work together with instant synchronization — Mohit's full-stack web project outside AI, showing range as a developer.
How It Works
Next.js + TypeScript frontend. Dedicated Node.js server handles live sync logic using real-time event broadcasting. State changes from any connected client propagate to all others instantly.
Key Highlights
- Real-time multi-user synchronization
- Dedicated Node.js server for live event handling
- Next.js + TypeScript frontend for type-safe, fast UI
- Event-driven architecture for instant state propagation
- Clean separation: frontend logic vs sync server
Why It Matters
Shows versatility — Mohit isn't just an AI developer, he can build production full-stack web systems too.