Projects & Work

01
Core ML / Deep Learning

Coding LLM From Scratch

Python, PyTorch

What It Is

Built a full large language model from the ground up. Every component, from the tokenizer to the pretraining loop, was implemented by hand rather than imported from a high-level library.

How It Works

Custom BPE tokenizer, data loader, multi-head self-attention with causal masking, full transformer architecture (LayerNorm, FFN, residuals, positional embeddings), and a complete pretraining loop — all without high-level abstractions.
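
As an illustration of the tokenizer step, here is a minimal sketch of BPE merge learning in plain Python. It is not the project's code: the function names (`get_pair_counts`, `merge_pair`, `train_bpe`) and the whitespace pre-splitting are assumptions made for the example.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges
```

Each learned merge becomes one new vocabulary entry; applying the merges in order to new text reproduces the tokenization.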

Key Highlights

  • Custom BPE tokenizer built from scratch
  • Multi-head self-attention with causal masking
  • Full transformer block independently implemented
  • End-to-end pretraining loop with loss tracking
  • Zero reliance on HuggingFace Trainer or pre-built blocks

Why It Matters

If you can build it from scratch, you truly understand it. This proves depth of knowledge at the architecture level.

02
Production AI System

NeuroChat Engine

Python, LangGraph, SQLite, FAISS, Streamlit, MCP

What It Is

A production-grade modular chatbot with memory, RAG, streaming, tool-calling, and human-in-the-loop — all in one system. Feels less like a demo and more like a real product.

How It Works

LangGraph orchestrates the entire flow. SQLite handles persistent conversation threads. FAISS powers semantic document retrieval. MCP enables external tool-calling. Streamlit streams responses live. Human-in-the-loop checkpoints pause the agent for user approval before proceeding.
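
To make the persistence layer concrete, here is a minimal sketch of thread-scoped message storage using Python's built-in `sqlite3` module. The table layout and the helper names (`open_store`, `append_message`, `load_thread`) are hypothetical; the project may instead rely on LangGraph's own SQLite checkpointer, which the description does not specify.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the message store. The schema is an assumed example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "thread_id TEXT NOT NULL, "
        "role TEXT NOT NULL, "
        "content TEXT NOT NULL)"
    )
    return conn


def append_message(conn, thread_id, role, content):
    """Persist one message under the given conversation thread."""
    conn.execute(
        "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
        (thread_id, role, content),
    )
    conn.commit()


def load_thread(conn, thread_id):
    """Return the thread's messages in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE thread_id = ? ORDER BY id",
        (thread_id,),
    )
    return [{"role": role, "content": content} for role, content in rows]
```

Passing a file path instead of `:memory:` keeps threads available across sessions, which is the behavior the description attributes to the SQLite layer.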

Key Highlights

  • Threaded conversations with SQLite persistence across sessions
  • Dual-layer memory: short-term context + long-term semantic store
  • FAISS RAG pipeline for document-grounded answers
  • Streaming responses via Streamlit for real-time UX
  • MCP tool-calling: invoke external APIs from within the agent loop
  • Human-in-the-loop (HITL) support

Why It Matters

Demonstrates the ability to architect a complete, production-style agentic system — not just a script, but a real modular application.

03
Agentic AI / Compiler Engineering

NLP to App Compiler

Python, LangGraph, FastAPI, React (Vite), Groq, Pydantic

What It Is

Applies compiler architecture to software generation. Give it a plain English description of an app: it runs through four fixed-order stages, validates the output against a typed contract, and repairs any broken fields before returning the result. Nothing about it is one-shot.

How It Works

The pipeline is a StateGraph (LangGraph) with five registered nodes: intent → design → schema → validate → repair. Every node reads from and writes to a shared AppState TypedDict. add_conditional_edges branches off the validate node: if Pydantic raises a ValidationError, the exact error is stored in state["errors"] and control is routed to the repair node. The repair → validate back-edge makes this a real loop in the graph rather than a retry wrapper around it. A clean_json_output() helper strips markdown fences before json.loads(). FastAPI exposes POST /api/generate; React + Vite renders results across four tabbed JSON views.
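
The validate and repair cycle can be sketched without LangGraph or Pydantic. The version below is a plain-Python stand-in, not the project's implementation: the body of `clean_json_output`, the `validate` rules (each of the four fields must be an object), the `run_pipeline` driver, and the `max_repairs` cap are all assumptions for illustration.

```python
import json

FENCE = "`" * 3  # built indirectly so the literal fence never appears in this example
REQUIRED_KEYS = ("ui", "api", "database", "auth")


def clean_json_output(raw):
    """Strip one surrounding markdown fence (with optional language tag)."""
    text = raw.strip()
    if len(text) >= 2 * len(FENCE) and text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        first_line, newline, rest = text.partition("\n")
        if newline and first_line.strip().isalnum():
            text = rest  # drop a tag such as "json" on the opening fence line
    return text.strip()


def validate(candidate):
    """Return a list of error strings; an empty list means the contract holds."""
    if not isinstance(candidate, dict):
        return ["top level must be an object"]
    errors = []
    for key in REQUIRED_KEYS:
        if key not in candidate:
            errors.append(f"missing field: {key}")
        elif not isinstance(candidate[key], dict):
            errors.append(f"field {key} must be an object")
    return errors


def run_pipeline(generate, repair, max_repairs=3):
    """Generate once, then alternate validate and repair until clean or capped."""
    state = {"raw": generate(), "errors": [], "validated": False, "repairs": 0}
    while True:
        try:
            state["errors"] = validate(json.loads(clean_json_output(state["raw"])))
        except json.JSONDecodeError as exc:
            state["errors"] = [f"invalid JSON: {exc.msg}"]
        state["validated"] = not state["errors"]
        if state["validated"] or state["repairs"] >= max_repairs:
            return state
        # The repair step sees both the broken output and the exact errors.
        state["raw"] = repair(state["raw"], state["errors"])
        state["repairs"] += 1
```

Here `generate` and `repair` are callables standing in for the LLM-backed nodes. The cap on repairs is a safety choice added for this sketch; the description does not say whether the real graph bounds the loop.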

Key Highlights

  • StateGraph with typed AppState TypedDict through all five nodes — no global state, no side effects between stages
  • AppSchema(BaseModel) enforces { ui, api, database, auth } — Pydantic catches field-level failures with exact error messages
  • add_conditional_edges routes to repair or END based on state["validated"] — graph controls flow, not app code
  • repair → validate back-edge: model sees its own broken output + exact ValidationError, not a generic retry
  • clean_json_output() handles the markdown fence problem that breaks json.loads() on raw LLM responses
  • ChatGroq at temperature 0.5 — consistent JSON structure without degenerate outputs

Why It Matters

Most LLM pipelines are glorified prompt chains — they call the model and hope the output is right. This one treats the model like an unreliable compiler pass: run it, validate against a contract, and feed failures back with surgical precision. The repair loop isn't a fallback — it's a first-class part of the architecture.

04
Agentic AI / RAG Systems

Cognitive Routing RAG

Python, LangGraph, FAISS, RAG

What It Is

A three-phase autonomous AI engine — the cognitive core of a multi-bot social platform. It decides which bots respond to a post, generates original content on a schedule, and defends its persona against prompt injection attacks in live arguments.

How It Works

Phase 1 uses FAISS + cosine similarity to route posts to relevant bots. Phase 2 runs a LangGraph state machine that searches the web and drafts opinionated posts. Phase 3 reconstructs the full argument thread as RAG context and fires back in-character.
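
The Phase 1 routing idea can be shown in a few lines of plain Python. This is a sketch under assumptions: the `route_post` name, the toy two-dimensional persona vectors, and the `threshold` value are invented for the example, and a linear scan stands in for the FAISS index (an inner-product index over L2-normalized vectors yields the same cosine scores at scale).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route_post(post_vec, personas, threshold=0.5):
    """Return (bot_name, score) pairs at or above the threshold, best first."""
    scored = [(name, cosine(post_vec, vec)) for name, vec in personas.items()]
    matched = [item for item in scored if item[1] >= threshold]
    return sorted(matched, key=lambda item: item[1], reverse=True)


# Toy persona embeddings; real ones would come from an embedding model.
personas = {"tech": [1.0, 0.0], "sports": [0.0, 1.0], "mixed": [1.0, 1.0]}
matches = route_post([1.0, 0.0], personas)
```

Only bots whose persona embedding is close enough to the post respond, so unrelated bots stay silent.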

Key Highlights

  • FAISS persona router: cosine similarity decides which bots engage, rather than sending every post to every bot
  • LangGraph 3-node autonomous pipeline: decide → search → draft
  • Full-thread RAG engine for context-aware multi-turn replies
  • Prompt injection hardening — override attempts trigger intensified in-persona responses
  • Strict JSON output enforcement across all autonomous content

Why It Matters

This isn't a chatbot wrapper — it's a full agentic decision loop. It shows understanding of autonomous systems, vector-based routing, LangGraph orchestration, and production-level prompt security all in one project.

05
Agentic AI / Workflow Engineering

LangGraph Pipelines

Python, LangGraph, Streamlit, Jupyter Notebook

What It Is

A comprehensive reference library of LangGraph workflows covering every major pattern for real-world agent design — basic, sequential, conditional, parallel, subgraph, persistence, RAG, and tool-use pipelines, plus a Streamlit frontend.

How It Works

Each pipeline is a standalone runnable LangGraph graph with typed state management. Conditional edges handle branching. Parallel nodes handle concurrent tasks. Subgraphs enable modular composition. The Streamlit frontend ties them into an accessible UI.

Key Highlights

  • 8+ pipeline patterns covering the major LangGraph design patterns
  • Fully typed state management across all graph flows
  • Subgraph composition for modular, reusable agent blocks
  • Streamlit frontend for visual interaction
  • A living reference for real-world agentic AI architecture

Why It Matters

Shows systematic mastery of LangGraph — not just one workflow, but the entire design space explored methodically.

06
Core ML / Architecture Research

Pretraining Gemma-3 SLM

Python, PyTorch

What It Is

Replicated Google's Gemma-3 small language model architecture from scratch in PyTorch to understand how its design decisions differ from earlier transformer models.

How It Works

Studied the Gemma-3 technical report and translated architectural specs into a clean PyTorch implementation — including Grouped-Query Attention (GQA), RoPE positional embeddings, GeGLU activations, and pre-normalization. Full pretraining loop included.
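
Of the components listed, RoPE is the easiest to show in isolation. The sketch below is plain Python rather than PyTorch and uses the interleaved-pair layout with a `base` of 10000; both choices are assumptions for illustration, since the description does not say which layout or base the implementation uses.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i + 1]) by position * base ** (-i / d)."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower pairs rotate faster
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out
```

Because each pair is rotated by an angle proportional to the token position, the dot product between a rotated query and a rotated key depends only on the difference of their positions, which is what lets attention scores encode relative position.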

Key Highlights

  • Full Gemma-3 architecture reproduced from the technical report
  • Grouped-Query Attention (GQA) implemented from scratch
  • RoPE (Rotary Positional Embeddings) for relative position encoding
  • GeGLU activations replacing standard FFN activations
  • Complete pretraining loop with loss monitoring

Why It Matters

Demonstrates the ability to read ML research papers and translate them into working implementations — a critical skill for frontier AI work.

07
LLM Fine-Tuning

Instruction Training Pipeline

Python, PyTorch

What It Is

An end-to-end pipeline for teaching a base language model to follow instructions using supervised fine-tuning, the stage that turns base models into assistants such as ChatGPT and Claude.

How It Works

Formats datasets into instruction-response pairs. Loss is computed only on response tokens (not the prompt). Training loop handles gradient accumulation, learning rate scheduling, and checkpoint saving.
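
Loss masking is the step most easily gotten wrong, so here is a minimal sketch in plain Python. The helper names (`build_labels`, `masked_cross_entropy`) are hypothetical, and the logits are assumed to be already aligned with their targets; the `-100` sentinel matches the default `ignore_index` of PyTorch's cross-entropy loss.

```python
import math

IGNORE_INDEX = -100  # same sentinel PyTorch's cross-entropy skips by default


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions in the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_cross_entropy(logits, labels):
    """Mean negative log-likelihood over positions that are not masked.

    logits[t] is assumed to be the score vector for target labels[t].
    """
    total, count = 0.0, 0
    for row, target in zip(logits, labels):
        if target == IGNORE_INDEX:
            continue  # prompt token: contributes nothing to the loss
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_z - row[target]
        count += 1
    return total / count if count else 0.0
```

Changing the logits at a masked position leaves the loss unchanged, which is a quick way to check that the prompt is not influencing gradient updates.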

Key Highlights

  • Instruction dataset formatting with proper prompt-completion separation
  • Loss masking — only response tokens contribute to gradient updates
  • Custom PyTorch training loop with learning rate scheduling
  • Evaluation metrics for instruction-following quality
  • End-to-end: raw dataset in, instruction-tuned model out

Why It Matters

Covers the supervised fine-tuning stage used to convert base LLMs into assistants, a core step behind the models people use every day.

08
NLP / Transfer Learning

Text Classification Training Pipeline

Python, PyTorch

What It Is

Fine-tuned a pretrained transformer for binary text classification using transfer learning — fully custom PyTorch implementation, no Trainer APIs.

How It Works

Pretrained transformer weights loaded, LM head replaced with a classification head, fine-tuned end-to-end. Custom PyTorch DataLoaders handle batching and tokenization. Cross-entropy loss with per-epoch accuracy tracking.

Key Highlights

  • Transfer learning: pretrained transformer adapted for classification
  • Custom classification head replacing the LM head
  • Full end-to-end PyTorch training — no Trainer abstractions
  • Custom data loading and batching pipeline
  • Per-epoch accuracy and loss evaluation

Why It Matters

Showcases the fundamentals of NLP fine-tuning, a technique widely used for production text classifiers.

09
Full-Stack AI Application

Ask YouTube (RAG Application)

Python, Next.js, React 19, LangChain, FAISS, Groq, HuggingFace

What It Is

A full-stack app that lets you ask natural language questions about any YouTube video. Point it at a video, ask a question, get a grounded answer — powered by transcript-based RAG.

How It Works

Fetches YouTube transcripts, chunks and embeds them using BAAI/bge-small-en into FAISS, then answers questions via a LangChain RAG chain backed by Groq (Llama-3). Next.js 16 + React 19 frontend.
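
The chunking step can be illustrated with a simple sliding window. Note that this is a swapped-in, simpler technique: fixed-size word windows with overlap, not the semantic chunking the project lists. The `chunk_transcript` name and the default sizes are assumptions for the example.

```python
def chunk_transcript(text, chunk_size=200, overlap=40):
    """Split text into windows of `chunk_size` words sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the transcript
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk; each chunk would then be embedded and added to the vector store.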

Key Highlights

  • YouTube transcript ingestion and semantic chunking
  • BAAI/bge-small-en embeddings for high-quality semantic search
  • FAISS vector store for fast similarity retrieval
  • LangChain RAG chain with Groq LLM for sub-second answers
  • Next.js 16 + React 19 modern frontend
  • Full-stack: Python backend + TypeScript/React frontend

Why It Matters

A complete RAG product covering the entire pipeline — from ingestion to retrieval to generation to UI.

10
Full-Stack AI Application

fxgenie (LLM-Powered Currency Converter)

Python, FastAPI, React (Vite), LangChain, Groq, ExchangeRate-API

What It Is

Type "10 USD to INR" or "convert 250 Swiss francs to yen" — the LLM parses your intent, fetches live rates, and returns the result. Natural language replaces dropdowns entirely.

How It Works

LangChain routes the user query to Groq (Llama-3.3-70b), which extracts structured intent (source currency, target currency, amount). That structured output is used to query ExchangeRate-API for live rates. The result is returned to the React frontend via FastAPI.
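
What happens after the LLM returns its structured intent can be sketched as follows. Everything here is illustrative: the `ConversionIntent` fields, the `parse_intent` and `convert` helpers, and the rate table (placeholder numbers, not real exchange rates) are assumptions, and the sketch does not reproduce the ExchangeRate-API response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionIntent:
    amount: float
    source: str
    target: str


def parse_intent(payload):
    """Validate the dict the LLM is expected to return (field names assumed)."""
    amount = float(payload["amount"])
    source = str(payload["source"]).strip().upper()
    target = str(payload["target"]).strip().upper()
    if amount < 0:
        raise ValueError("amount must be non-negative")
    for code in (source, target):
        if len(code) != 3 or not code.isalpha():
            raise ValueError(f"not a three-letter currency code: {code!r}")
    return ConversionIntent(amount, source, target)


def convert(intent, rates):
    """Convert using rates quoted as units of each currency per one base unit."""
    return intent.amount * rates[intent.target] / rates[intent.source]


# Placeholder rates for illustration only; live values would come from the API.
rates = {"USD": 1.0, "INR": 80.0, "EUR": 0.5}
intent = parse_intent({"amount": "10", "source": "usd", "target": "inr"})
result = convert(intent, rates)
```

Validating the extracted fields before calling the rates service keeps a malformed LLM response from turning into a malformed API request.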

Key Highlights

  • Natural language interface — no dropdowns, just plain English
  • Groq Llama-3.3-70b for fast, accurate intent extraction
  • LangChain orchestration between LLM and API calls
  • Live exchange rates from ExchangeRate-API
  • FastAPI backend with clean REST endpoints
  • React + Vite frontend for a snappy UI

Why It Matters

Demonstrates LLM tool-use in a real product context — turning unstructured language into structured API calls.

11
AI Systems / Framework Engineering

GenAI Systems with LangChain

Python, LangChain

What It Is

A suite of modular, production-ready LLM systems built with LangChain — each component designed to be swappable and composable rather than one-off scripts.

How It Works

Built using LCEL (LangChain Expression Language) for composability. Prompts are templated and version-controlled. Output parsers enforce structured responses via Pydantic. Tools registered for agent use. Conversation memory injected into chains.

Key Highlights

  • LCEL-based composable chain design
  • Modular prompt templating and output parsing
  • Agent patterns with tool registration
  • Conversation memory integration
  • Structured output enforcement with Pydantic

Why It Matters

Shows LangChain expertise beyond tutorials — production patterns for real applications.

12
Framework Internals / Systems Understanding

Custom Runnable Pipeline

Python, Jupyter Notebook

What It Is

Rebuilt LangChain's Runnable abstraction from scratch — no LangChain — to understand exactly how LLM chains work under the hood. A pure Python framework that mirrors LCEL internals.

How It Works

Base Runnable class with invoke method and pipe (|) operator for chaining. Concrete runnables — prompt templates, LLM callers, output parsers — implemented as subclasses. Output of one step feeds as input to the next, exactly mirroring LCEL.
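
A minimal sketch of that design is below. The base class and pipe operator follow the description; the concrete class names (`RunnableSequence`, `PromptTemplate`, `FakeLLM`, `TextParser`) and the canned model reply are assumptions made so the example runs without any model or framework.

```python
class Runnable:
    """Base class: subclasses implement invoke; `|` chains two runnables."""

    def invoke(self, value):
        raise NotImplementedError

    def __or__(self, other):
        return RunnableSequence(self, other)


class RunnableSequence(Runnable):
    """Feeds the output of `first` into `second`."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def invoke(self, value):
        return self.second.invoke(self.first.invoke(value))


class PromptTemplate(Runnable):
    def __init__(self, template):
        self.template = template

    def invoke(self, value):
        return self.template.format(**value)


class FakeLLM(Runnable):
    """Stand-in for a model call; returns a canned structured reply."""

    def invoke(self, value):
        return {"text": f"reply to: {value}"}


class TextParser(Runnable):
    def invoke(self, value):
        return value["text"]


chain = PromptTemplate("Tell me about {topic}") | FakeLLM() | TextParser()
answer = chain.invoke({"topic": "cats"})
```

Because `__or__` returns another `Runnable`, a chain of any length is itself a runnable and can be piped into further steps.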

Key Highlights

  • Runnable base class with invoke and pipe operator from scratch
  • Custom prompt template, LLM caller, and output parser runnables
  • Chaining via | operator — mirrors LangChain LCEL internals
  • Step-by-step tracing of data flow through the pipeline
  • Pure Python — zero framework dependencies

Why It Matters

Understanding frameworks at the implementation level makes you a better engineer, and this project demonstrates that curiosity.

13
Model Engineering / Inference

HuggingFace GPT Weight Mapping

Python, HuggingFace, PyTorch

What It Is

Loads pretrained GPT weights from HuggingFace into a locally defined custom architecture — solving the weight mapping problem manually — with temperature scaling and top-k sampling for inference control.

How It Works

The HuggingFace state dict is iterated and each weight is mapped by name and shape to the custom architecture. Mismatched layer names are resolved with a mapping dictionary. Temperature scaling and top-k sampling are added for flexible generation.
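
The sampling half of this description can be sketched in plain Python. The function names (`top_k_probs`, `sample_next`) are hypothetical and the logits are a plain list rather than a tensor; the logic is standard temperature scaling followed by a softmax over the k highest-scoring tokens.

```python
import math
import random


def top_k_probs(logits, k, temperature=1.0):
    """Return {token_id: probability} over the k highest-scoring tokens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the vocabulary size")
    scaled = [value / temperature for value in logits]
    keep = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    peak = max(scaled[i] for i in keep)  # subtract the max for numerical stability
    weights = {i: math.exp(scaled[i] - peak) for i in keep}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def sample_next(logits, k, temperature=1.0, rng=random):
    """Draw one token id from the truncated, temperature-scaled distribution."""
    probs = top_k_probs(logits, k, temperature)
    token_ids = list(probs)
    return rng.choices(token_ids, weights=[probs[i] for i in token_ids], k=1)[0]
```

Lower temperatures concentrate probability on the top token, while a smaller k removes the long tail of unlikely tokens before sampling.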

Key Highlights

  • Manual weight mapping from HuggingFace to custom PyTorch model
  • Shape verification to catch mismatches before loading
  • Temperature scaling for controlling output randomness
  • Top-k sampling for diverse but coherent text generation
  • Reusable pattern for loading any pretrained weights into custom architectures

Why It Matters

Essential knowledge for anyone customizing or extending pretrained models — a practical deep-dive into model internals.

14
Full-Stack / Real-Time Systems

Quicksync (Real-Time Collaboration App)

TypeScript, Next.js, Node.js

What It Is

A real-time collaboration tool where multiple users can work together with instant synchronization — Mohit's full-stack web project outside AI, showing range as a developer.

How It Works

Next.js + TypeScript frontend. Dedicated Node.js server handles live sync logic using real-time event broadcasting. State changes from any connected client propagate to all others instantly.

Key Highlights

  • Real-time multi-user synchronization
  • Dedicated Node.js server for live event handling
  • Next.js + TypeScript frontend for type-safe, fast UI
  • Event-driven architecture for instant state propagation
  • Clean separation: frontend logic vs sync server

Why It Matters

Shows versatility — Mohit isn't just an AI developer, he can build production full-stack web systems too.