Projects | Mohit

01

Core ML / Deep Learning

Coding LLM From Scratch

Python, PyTorch

What It Is

Built a complete large language model from the ground up, without relying on high-level library abstractions. Every component, from the tokenizer to the pretraining loop, was implemented independently.

How It Works

Custom BPE tokenizer, data loader, multi-head self-attention with causal masking, full transformer architecture (LayerNorm, FFN, residuals, positional embeddings), and a complete pretraining loop — all without high-level abstractions.
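
As an illustration of the tokenizer step, here is a minimal sketch of BPE training. It assumes a character-level starting vocabulary and a tiny in-memory word list; the function names (get_pair_counts, merge_pair, train_bpe) are hypothetical and are not taken from the project, which may well operate on bytes and a much larger corpus.

```python
from collections import Counter


def get_pair_counts(sequences):
    """Count adjacent symbol pairs across all token sequences."""
    counts = Counter()
    for seq in sequences:
        for left, right in zip(seq, seq[1:]):
            counts[(left, right)] += 1
    return counts


def merge_pair(seq, pair, new_symbol):
    """Replace each non-overlapping occurrence of `pair` in `seq` with `new_symbol`."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(new_symbol)
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    sequences = [list(word) for word in words]
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(sequences)
        if not counts:
            break
        # Most frequent pair; on a tie, the pair seen first wins.
        best = max(counts, key=counts.get)
        merges.append(best)
        sequences = [merge_pair(seq, best, best[0] + best[1]) for seq in sequences]
    return merges, sequences


merges, encoded = train_bpe(["low", "lower", "lowest", "low"], num_merges=2)
print(merges)   # learned merge rules, in order
print(encoded)  # the words re-segmented with those rules
```

Applying the learned merges in the same order to new text gives the encoding step; a production tokenizer would also need handling for unseen symbols and special tokens.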

Key Highlights

Custom BPE tokenizer built from scratch
Multi-head self-attention with causal masking
Full transformer block independently implemented
End-to-end pretraining loop with loss tracking
Zero reliance on HuggingFace Trainer or pre-built blocks

Why It Matters

If you can build it from scratch, you truly understand it. This proves depth of knowledge at the architecture level.

02

Production AI System

NeuroChat Engine

Python, LangGraph, SQLite, FAISS, Streamlit, MCP

What It Is

A production-grade modular chatbot with memory, RAG, streaming, tool-calling, and human-in-the-loop — all in one system. Feels less like a demo and more like a real product.

How It Works

LangGraph orchestrates the entire flow. SQLite handles persistent conversation threads. FAISS powers semantic document retrieval. MCP enables external tool-calling. Streamlit streams responses live. Human-in-the-loop checkpoints pause the agent for user approval before proceeding.
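
The thread persistence idea can be sketched with the standard sqlite3 module. This is an assumption-laden illustration: the table layout and the helper names (open_store, append_message, load_thread) are hypothetical, and the project may instead rely on LangGraph's own SQLite-backed checkpointing.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the message store; ':memory:' keeps this demo self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               thread_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def append_message(conn, thread_id, role, content):
    """Append one message to a thread inside a transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (thread_id, role, content) VALUES (?, ?, ?)",
            (thread_id, role, content),
        )


def load_thread(conn, thread_id):
    """Return a thread's messages in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE thread_id = ? ORDER BY id",
        (thread_id,),
    )
    return [{"role": role, "content": content} for role, content in rows]


conn = open_store()
append_message(conn, "thread-1", "user", "Hello")
append_message(conn, "thread-1", "assistant", "Hi, how can I help?")
append_message(conn, "thread-2", "user", "Unrelated question")
history = load_thread(conn, "thread-1")
print(history)
```

Passing a file path instead of ":memory:" is what would make threads survive across sessions.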

Key Highlights

Threaded conversations with SQLite persistence across sessions
Dual-layer memory: short-term context + long-term semantic store
FAISS RAG pipeline for document-grounded answers
Streaming responses via Streamlit for real-time UX
MCP tool-calling: invoke external APIs from within the agent loop
Human-in-the-loop (HITL) support

Why It Matters

Demonstrates the ability to architect a complete, production-style agentic system — not just a script, but a real modular application.

03

Agentic AI / Compiler Engineering

NLP to App Compiler

Python, LangGraph, FastAPI, React (Vite), Groq, Pydantic

What It Is

Applies compiler architecture to software generation. Give it a plain-English description of an app, and it runs the description through a fixed pipeline of stages, validates the output against a typed contract, and repairs any broken fields before returning the result. Nothing about it is one-shot.

How It Works

The pipeline is a LangGraph StateGraph with five registered nodes: intent → design → schema → validate → repair. Every node reads from and writes to a shared AppState TypedDict. A conditional edge, registered with add_conditional_edges, leaves the validate node: if Pydantic raises a ValidationError, the exact error is stored in state["errors"] and control is routed to the repair node. The repair → validate back-edge makes this a loop in which every repair is re-validated, rather than a blind retry. A clean_json_output() helper strips markdown fences before json.loads(). FastAPI exposes POST /api/generate; a React + Vite frontend renders results across four tabbed JSON views.
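
The fence-stripping and routing logic can be sketched in plain Python. This is not the project's code: only the name clean_json_output and the state keys "errors" and "validated" come from the description above. The body of the helper, the REQUIRED_KEYS check (a stand-in for the Pydantic model), and the validate and route functions are assumptions made for illustration.

```python
import json

FENCE = "`" * 3  # markdown code fence marker


def clean_json_output(raw):
    """Strip an optional markdown fence (with or without a language tag) around JSON."""
    text = raw.strip()
    if text.startswith(FENCE):
        text = text[len(FENCE):]
        first_line, _, rest = text.partition("\n")
        # Drop the rest of the opening fence line if it is empty or a tag like "json".
        text = rest if first_line.strip().isalpha() or not first_line.strip() else text
    if text.rstrip().endswith(FENCE):
        text = text.rstrip()[: -len(FENCE)]
    return text.strip()


REQUIRED_KEYS = ("ui", "api", "database", "auth")


def validate(state):
    """Stand-in for the Pydantic check: record precise errors in the state."""
    try:
        data = json.loads(clean_json_output(state["raw"]))
        errors = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in data]
    except json.JSONDecodeError as exc:
        errors = [f"invalid JSON: {exc}"]
    return {**state, "errors": errors, "validated": not errors}


def route(state):
    """Mirror of the conditional edge: repair on failure, otherwise finish."""
    return "end" if state["validated"] else "repair"


raw = FENCE + 'json\n{"ui": {}, "api": {}, "database": {}}\n' + FENCE
state = validate({"raw": raw})
print(state["errors"], route(state))
```

In a graph, the "repair" branch would pass state["errors"] back to the model together with its previous output, and the result would flow into validate again.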

Key Highlights

StateGraph with typed AppState TypedDict through all five nodes — no global state, no side effects between stages
AppSchema(BaseModel) enforces { ui, api, database, auth } — Pydantic catches field-level failures with exact error messages
add_conditional_edges routes to repair or END based on state["validated"] — graph controls flow, not app code
repair → validate back-edge: model sees its own broken output + exact ValidationError, not a generic retry
clean_json_output() handles the markdown fence problem that breaks json.loads() on raw LLM responses
ChatGroq at temperature 0.5 — consistent JSON structure without degenerate outputs

Why It Matters

Most LLM pipelines are glorified prompt chains — they call the model and hope the output is right. This one treats the model like an unreliable compiler pass: run it, validate against a contract, and feed failures back with surgical precision. The repair loop isn't a fallback — it's a first-class part of the architecture.

04

Agentic AI / RAG Systems

Cognitive Routing RAG

Python, LangGraph, FAISS, RAG

What It Is

A three-phase autonomous AI engine — the cognitive core of a multi-bot social platform. It decides which bots respond to a post, generates original content on a schedule, and defends its persona against prompt injection attacks in live arguments.

How It Works

Phase 1 uses FAISS + cosine similarity to route posts to relevant bots. Phase 2 runs a LangGraph state machine that searches the web and drafts opinionated posts. Phase 3 reconstructs the full argument thread as RAG context and fires back in-character.
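
The Phase 1 routing decision can be illustrated without FAISS. The sketch below assumes each bot persona and each post has already been embedded as a vector; the persona names, the toy 3-dimensional vectors, and the 0.5 threshold are all made up for the example. In a FAISS setup, the same ranking is commonly obtained by normalizing vectors and searching an inner-product index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route_post(post_vec, persona_vecs, threshold=0.5):
    """Return (bot, score) pairs at or above the threshold, best match first."""
    scored = [(name, cosine_similarity(post_vec, vec)) for name, vec in persona_vecs.items()]
    matches = [(name, score) for name, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical persona embeddings; real ones would come from an embedding model.
personas = {
    "tech_bot": [0.9, 0.1, 0.0],
    "finance_bot": [0.1, 0.9, 0.1],
    "sports_bot": [0.0, 0.1, 0.9],
}
matches = route_post([0.8, 0.3, 0.0], personas)
print(matches)
```

Only bots whose persona vector is close enough to the post are asked to respond, which is what keeps every bot from replying to every post.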

Key Highlights

FAISS persona router — cosine similarity decides bot engagement, not brute-force
LangGraph 3-node autonomous pipeline: decide → search → draft
Full-thread RAG engine for context-aware multi-turn replies
Prompt injection hardening — override attempts trigger intensified in-persona responses
Strict JSON output enforcement across all autonomous content

Why It Matters

This isn't a chatbot wrapper — it's a full agentic decision loop. It shows understanding of autonomous systems, vector-based routing, LangGraph orchestration, and production-level prompt security all in one project.

05

Agentic AI / Workflow Engineering

LangGraph Pipelines

Python, LangGraph, Streamlit, Jupyter Notebook

What It Is

A comprehensive reference library of LangGraph workflows covering every major pattern for real-world agent design — basic, sequential, conditional, parallel, subgraph, persistence, RAG, and tool-use pipelines, plus a Streamlit frontend.

How It Works

Each pipeline is a standalone runnable LangGraph graph with typed state management. Conditional edges handle branching. Parallel nodes handle concurrent tasks. Subgraphs enable modular composition. The Streamlit frontend ties them into an accessible UI.

Key Highlights

8+ pipeline patterns covering the full LangGraph design space
Fully typed state management across all graph flows
Subgraph composition for modular, reusable agent blocks
Streamlit frontend for visual interaction
A living reference for real-world agentic AI architecture

Why It Matters

Shows systematic mastery of LangGraph — not just one workflow, but the entire design space explored methodically.

06

Core ML / Architecture Research

Pretraining Gemma-3 SLM

Python, PyTorch

What It Is

Replicated Google's Gemma-3 small language model architecture from scratch in PyTorch to deeply understand its design decisions over earlier transformer models.

How It Works

Studied the Gemma-3 technical report and translated architectural specs into a clean PyTorch implementation — including Grouped-Query Attention (GQA), RoPE positional embeddings, GeGLU activations, and pre-normalization. Full pretraining loop included.
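
To make the RoPE component concrete, here is a small pure-Python sketch of rotary embeddings applied to a single vector. It is not the project's PyTorch code: the pairing of consecutive dimensions and the base of 10000 are assumptions (implementations differ on both), and the vectors are arbitrary. The example checks the property RoPE is designed around: the attention score between a rotated query and key depends only on their relative offset.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("RoPE needs an even embedding dimension")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

score_near = dot(rope(q, 5), rope(k, 3))     # positions 5 and 3, offset 2
score_far = dot(rope(q, 105), rope(k, 103))  # positions 105 and 103, offset 2
print(abs(score_near - score_far) < 1e-9)    # same offset, same score
```

Because each pair is only rotated, vector norms are preserved, so RoPE adds positional information without rescaling queries or keys.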

Key Highlights

Full Gemma-3 architecture reproduced from the technical paper
Grouped-Query Attention (GQA) implemented from scratch
RoPE (Rotary Positional Embeddings) for length generalization
GeGLU activations replacing standard FFN activations
Complete pretraining loop with loss monitoring

Why It Matters

Demonstrates the ability to read ML research papers and translate them into working implementations — a critical skill for frontier AI work.

07

LLM Fine-Tuning

Instruction Training Pipeline

Python, PyTorch

What It Is

An end-to-end pipeline for teaching a base language model to follow instructions using supervised fine-tuning, one of the core steps used to turn pretrained models into assistants such as ChatGPT and Claude.

How It Works

Formats datasets into instruction-response pairs. Loss is computed only on response tokens (not the prompt). Training loop handles gradient accumulation, learning rate scheduling, and checkpoint saving.
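
Loss masking on response tokens is usually done through the labels rather than the loss function. The sketch below shows one common way to build them, assuming already-tokenized prompts and responses. The token ids and the helper names (build_labels, pad_batch) are hypothetical; -100 is the default ignore_index of PyTorch's CrossEntropyLoss, so positions carrying it contribute nothing to the gradient.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, response_ids):
    """Return (input_ids, labels) with prompt positions excluded from the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def pad_batch(examples, pad_id=0):
    """Right-pad a batch to equal length; padded label positions are ignored too."""
    max_len = max(len(ids) for ids, _ in examples)
    inputs = [ids + [pad_id] * (max_len - len(ids)) for ids, _ in examples]
    labels = [lab + [IGNORE_INDEX] * (max_len - len(lab)) for _, lab in examples]
    return inputs, labels


# Made-up token ids standing in for a tokenized instruction and response.
example_a = build_labels(prompt_ids=[11, 12, 13], response_ids=[21, 22])
example_b = build_labels(prompt_ids=[11, 14], response_ids=[23])
batch_inputs, batch_labels = pad_batch([example_a, example_b])
print(batch_labels)
```

The one-position shift for next-token prediction is left to the training loop (or the model's forward pass), which compares the logits at position t with the label at position t + 1.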

Key Highlights

Instruction dataset formatting with proper prompt-completion separation
Loss masking — only response tokens contribute to gradient updates
Custom PyTorch training loop with learning rate scheduling
Evaluation metrics for instruction-following quality
End-to-end: raw dataset in, instruction-tuned model out

Why It Matters

Covers supervised instruction tuning, a core step in converting base LLMs into assistants like the models people use every day.

08

NLP / Transfer Learning

Text Classification Training Pipeline

Python, PyTorch

What It Is

Fine-tuned a pretrained transformer for binary text classification using transfer learning — fully custom PyTorch implementation, no Trainer APIs.

How It Works

Pretrained transformer weights are loaded, the LM head is replaced with a classification head, and the model is fine-tuned end-to-end. Custom PyTorch DataLoaders handle batching and tokenization. Training uses cross-entropy loss with per-epoch accuracy tracking.

Key Highlights

Transfer learning: pretrained transformer adapted for classification
Custom classification head replacing the LM head
Full end-to-end PyTorch training — no Trainer abstractions
Custom data loading and batching pipeline
Per-epoch accuracy and loss evaluation

Why It Matters

Showcases the fundamentals of NLP fine-tuning — the same technique powering most production text classifiers today.

09

Full-Stack AI Application

Ask YouTube (RAG Application)

Python, Next.js, React 19, LangChain, FAISS, Groq, HuggingFace

What It Is

A full-stack app that lets you ask natural language questions about any YouTube video. Point it at a video, ask a question, get a grounded answer — powered by transcript-based RAG.

How It Works

Fetches YouTube transcripts, splits them into chunks, embeds the chunks with BAAI/bge-small-en, and indexes them in FAISS. Questions are then answered by a LangChain RAG chain backed by Groq (Llama-3). The frontend uses Next.js 16 and React 19.

Key Highlights

YouTube transcript ingestion and semantic chunking
BAAI/bge-small-en embeddings for high-quality semantic search
FAISS vector store for fast similarity retrieval
LangChain RAG chain with Groq LLM for sub-second answers
Next.js 16 + React 19 modern frontend
Full-stack: Python backend + TypeScript/React frontend

Why It Matters

A complete RAG product covering the entire pipeline — from ingestion to retrieval to generation to UI.

10

Full-Stack AI Application

fxgenie (LLM-Powered Currency Converter)

Python, FastAPI, React (Vite), LangChain, Groq, ExchangeRate-API

What It Is

Type "10 USD to INR" or "convert 250 Swiss francs to yen" — the LLM parses your intent, fetches live rates, and returns the result. Natural language replaces dropdowns entirely.

How It Works

LangChain routes the user's query to Groq (Llama-3.3-70b), which extracts structured intent (source currency, target currency, amount). That output is used to query ExchangeRate-API for live rates, and the result is returned to the React frontend via FastAPI.
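
The step between the LLM and the rates API can be sketched as follows. Everything here is illustrative: the JSON field names, the ConversionIntent class, and the rate table are assumptions, and the rates are made-up numbers rather than live data. The sketch only shows how a structured intent could be validated and applied once the model has produced it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionIntent:
    amount: float
    source: str
    target: str


def parse_intent(llm_json):
    """Validate the model's JSON output before it is used to call any API."""
    data = json.loads(llm_json)
    amount = float(data["amount"])
    if amount < 0:
        raise ValueError("amount must be non-negative")
    source, target = data["source"].upper(), data["target"].upper()
    for code in (source, target):
        if len(code) != 3 or not code.isalpha():
            raise ValueError(f"not a 3-letter currency code: {code!r}")
    return ConversionIntent(amount, source, target)


def convert(intent, rates):
    """`rates` maps a currency code to units per one unit of a common base currency."""
    return intent.amount * rates[intent.target] / rates[intent.source]


# Made-up rates for illustration only; a real call would fetch current values.
rates = {"USD": 1.0, "INR": 80.0, "JPY": 150.0}
intent = parse_intent('{"amount": 10, "source": "usd", "target": "inr"}')
print(intent, convert(intent, rates))
```

Validating the intent before the API call keeps a malformed model response from turning into a malformed request.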

Key Highlights

Natural language interface — no dropdowns, just plain English
Groq Llama-3.3-70b for fast, accurate intent extraction
LangChain orchestration between LLM and API calls
Live exchange rates from ExchangeRate-API
FastAPI backend with clean REST endpoints
React + Vite frontend for a snappy UI

Why It Matters

Demonstrates LLM tool-use in a real product context — turning unstructured language into structured API calls.

11

AI Systems / Framework Engineering

GenAI Systems with LangChain

Python, LangChain

What It Is

A suite of modular, production-ready LLM systems built with LangChain — each component designed to be swappable and composable rather than one-off scripts.

How It Works

Built using LCEL (LangChain Expression Language) for composability. Prompts are templated and version-controlled. Output parsers enforce structured responses via Pydantic. Tools are registered for agent use, and conversation memory is injected into chains.

Key Highlights

LCEL-based composable chain design
Modular prompt templating and output parsing
Agent patterns with tool registration
Conversation memory integration
Structured output enforcement with Pydantic

Why It Matters

Shows LangChain expertise beyond tutorials — production patterns for real applications.

12

Framework Internals / Systems Understanding

Custom Runnable Pipeline

Python, Jupyter Notebook

What It Is

Rebuilt LangChain's Runnable abstraction from scratch — no LangChain — to understand exactly how LLM chains work under the hood. A pure Python framework that mirrors LCEL internals.

How It Works

A base Runnable class provides an invoke method and a pipe (|) operator for chaining. Concrete runnables (prompt templates, LLM callers, output parsers) are implemented as subclasses. The output of one step feeds the next as input, mirroring how LCEL composes steps.
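
A minimal sketch of this pattern is shown below. Only the names Runnable and invoke and the use of the | operator come from the description; RunnableSequence, PromptTemplate, FakeLLM, and PrefixParser are hypothetical classes written for this example, and FakeLLM is a stand-in that calls no model.

```python
class Runnable:
    """Base class: subclasses implement invoke; `|` chains two runnables."""

    def invoke(self, value):
        raise NotImplementedError

    def __or__(self, other):
        return RunnableSequence(self, other)


class RunnableSequence(Runnable):
    """Runs `first`, then feeds its output to `second`."""

    def __init__(self, first, second):
        self.first, self.second = first, second

    def invoke(self, value):
        return self.second.invoke(self.first.invoke(value))


class PromptTemplate(Runnable):
    def __init__(self, template):
        self.template = template

    def invoke(self, value):
        return self.template.format(**value)


class FakeLLM(Runnable):
    """Stand-in for a model call: echoes the prompt in upper case."""

    def invoke(self, value):
        return "ANSWER: " + value.upper()


class PrefixParser(Runnable):
    """Output parser: drops the 'ANSWER:' prefix and surrounding whitespace."""

    def invoke(self, value):
        _, _, body = value.partition("ANSWER:")
        return body.strip()


chain = PromptTemplate("Summarize {topic}") | FakeLLM() | PrefixParser()
result = chain.invoke({"topic": "attention"})
print(result)
```

Because a | b returns another Runnable, chains of any length nest into sequences of two, which is why a single __or__ method is enough.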

Key Highlights

Runnable base class with invoke and pipe operator from scratch
Custom prompt template, LLM caller, and output parser runnables
Chaining via | operator — mirrors LangChain LCEL internals
Step-by-step tracing of data flow through the pipeline
Pure Python — zero framework dependencies

Why It Matters

Understanding frameworks at the implementation level makes you a better engineer — this project proves that curiosity.

13

Model Engineering / Inference

HuggingFace GPT Weight Mapping

Python, HuggingFace, PyTorch

What It Is

Loads pretrained GPT weights from HuggingFace into a locally defined custom architecture — solving the weight mapping problem manually — with temperature scaling and top-k sampling for inference control.

How It Works

The HuggingFace state dict is iterated, and each weight is mapped by name and shape onto the custom architecture. Mismatched layer names are resolved with a mapping dictionary. Temperature scaling and top-k sampling are added for flexible generation.
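
The sampling half of this can be sketched in pure Python. The function below is an illustration, not the project's PyTorch code: sample_top_k and its parameters are hypothetical names, and the logits are a toy list rather than model output. It keeps the k highest logits, divides them by the temperature, applies a softmax, and draws one token index.

```python
import math
import random


def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample a token index from the k highest logits after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = min(k, len(logits))
    # Indices of the k largest logits; everything else gets zero probability.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, 3.0, -1.0]
token = sample_top_k(toy_logits, k=2, temperature=0.8)
print(token)  # either index 2 or index 0, the two highest logits
```

Lower temperatures sharpen the distribution toward the highest logit, and k = 1 reduces to greedy decoding.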

Key Highlights

Manual weight mapping from HuggingFace to custom PyTorch model
Shape verification to catch mismatches before loading
Temperature scaling for controlling output randomness
Top-k sampling for diverse but coherent text generation
Reusable pattern for loading any pretrained weights into custom architectures

Why It Matters

Essential knowledge for anyone customizing or extending pretrained models — a practical deep-dive into model internals.

14

Full-Stack / Real-Time Systems

Quicksync (Real-Time Collaboration App)

TypeScript, Next.js, Node.js

What It Is

A real-time collaboration tool where multiple users can work together with instant synchronization — Mohit's full-stack web project outside AI, showing range as a developer.

How It Works

The frontend is built with Next.js and TypeScript. A dedicated Node.js server handles live sync using real-time event broadcasting: a state change from any connected client is propagated to all the others.

Key Highlights

Real-time multi-user synchronization
Dedicated Node.js server for live event handling
Next.js + TypeScript frontend for type-safe, fast UI
Event-driven architecture for instant state propagation
Clean separation: frontend logic vs sync server

Why It Matters

Shows versatility — Mohit isn't just an AI developer, he can build production full-stack web systems too.
