itsjaya — AI Portfolio
AI-powered personal portfolio with a RAG chatbot (Avocado), MDX blog, engagement analytics, and a fully automated deploy pipeline. Every layer is production-grade — not a toy.
Overview
Most portfolios are static pages someone scrolls past in 30 seconds. This one starts a conversation.
Avocado is an AI assistant backed by a hybrid RAG pipeline. It answers questions about experience, projects, skills, and blog posts in real time — with tokens streaming directly to the browser. Ask it "what's your strongest AI project?" and it retrieves the most relevant knowledge chunks, reranks them with a cross-encoder, and streams a grounded answer through Gemini.
The portfolio half is a full static Next.js site — experience, education, projects, and a blog with per-post views and claps tracked in SQLite. The whole system auto-deploys on every git push with zero manual steps.
Why build it this way
A traditional portfolio is a one-way broadcast. You decide what to highlight, the visitor reads what you chose, end of story. The problem is that the actual question someone wants answered — "does this person have experience with distributed caching?" — almost never aligns with the section you happened to emphasise.
An AI-backed portfolio inverts this. The visitor asks in natural language, the system surfaces the most relevant evidence, and Gemini synthesises an answer grounded in real data. The net effect is that a 30-second scroll becomes a conversation that can go as deep as the person wants. The knowledge base is the resume — the chatbot is just the interface.
The engineering goal was to make that conversational layer production-quality, not a toy demo. Streaming tokens instead of waiting for a full response, hybrid retrieval instead of pure cosine similarity, a fallback chain for model unavailability — each of these matters when someone is actually using the thing.
System Architecture
┌──────────────────────────────────────────────────────────────────────────┐
│ CLIENT BROWSER │
│ │
│ ┌─────────────────────────────┐ ┌──────────────────────────────────┐ │
│ │ Avocado Chatbot │ │ Portfolio + Blog + Lab │ │
│ │ / (full-screen) │ │ /portfolio /experience │ │
│ │ /chat (nav-accessible) │ │ /education /projects │ │
│ │ │ │ /blog /blog/[slug] │ │
│ │ ChatInterface │ │ /lab /lab/[slug] │ │
│ │ ChatMessage (md renderer) │ │ │ │
│ │ Model badge · Stats │ │ BlogPostList BlogEngagement │ │
│ │ SSE ReadableStream │ │ BlogIndexStats BlogGuideDrawer │ │
│ └──────────────┬──────────────┘ └──────────────────┬───────────────┘ │
└─────────────────┼────────────────────────────────────────┼───────────────┘
│ HTTPS + SSE │ HTTPS REST
▼ ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ FastAPI Backend (Railway) │
│ │
│ POST /ai/chat/stream ──► RAG pipeline ──► Gemini SSE │
│ POST /ai/chat ──► RAG pipeline ──► Gemini sync │
│ POST /blog/{slug}/view ──► unique view per IP │
│ POST /blog/{slug}/clap ──► cumulative claps (max 50 / user / post) │
│ GET /blog/{slug}/stats │
│ GET /blog/stats/summary │
│ GET /stats total_responses · unique_visitors │
│ GET /stats/overview 7d / 30d / 1y / all-time for all metrics │
│ GET /health │
│ │
│ ┌────────────────────────────┐ ┌──────────────────────────────────┐ │
│ │ RAG Store │ │ SQLite analytics.db │ │
│ │ │ │ (Railway persistent volume) │ │
│ │ ChromaDB PersistentClient│ │ │ │
│ │ HNSW cosine similarity │ │ interactions │ │
│ │ all-MiniLM-L6-v2 embed │ │ ├── ip_hash TEXT │ │
│ │ LRU cache (256 entries) │ │ └── created_at TIMESTAMP │ │
│ │ │ │ │ │
│ │ BM25Okapi (rank_bm25) │ │ blog_views │ │
│ │ in-memory, rebuilt every │ │ ├── slug TEXT │ │
│ │ startup from docs │ │ ├── ip_hash TEXT │ │
│ │ │ │ └── created_at TIMESTAMP │ │
│ │ CrossEncoder │ │ UNIQUE(slug, ip_hash) │ │
│ │ ms-marco-MiniLM-L-6-v2 │ │ │ │
│ │ pre-warmed at startup │ │ blog_claps │ │
│ │ │ │ ├── slug TEXT │ │
│ │ RRF merge k=60 │ │ ├── ip_hash TEXT │ │
│ └────────────────────────────┘ │ ├── count INTEGER │ │
│ │ └── updated_at TIMESTAMP │ │
│ ┌──────────────────────────────────────────────────────────────┐ │ │
│ │ Knowledge Base backend/data/knowledge/ │ │ │
│ │ profile.json · experience.json · education.json │ │ │
│ │ projects.json · skills.json · testimonials.json │ │ │
│ │ blog.json (auto-generated from MDX on every push) │ │ │
│ └──────────────────────────────────────────────────────────────┘ │ │
└──────────────────────────────────────────────────────────────────────────┘
│
│ Google AI API (HTTPS)
▼
┌─────────────────────────────────────┐
│ Gemini 2.5 Flash (primary) │
│ Gemini 2.0 Flash (fallback 1) │
│ Gemini 2.0 Flash Lite (fallback 2)│
│ Gemini Flash Latest (fallback 3) │
│ auto-retry on 503 / 429 │
└─────────────────────────────────────┘
RAG Pipeline — Deep Dive
The retrieval pipeline runs before every Gemini call. Four stages in sequence:
User message: "what's your strongest AI project?"
│
▼
┌───────────────────────────────────────────────────────────┐
│ STAGE 1 — Query Expansion │
│ │
│ Goal: generate multiple angles so narrow phrasing │
│ does not miss relevant chunks. │
│ │
│ Query 1 (verbatim): │
│ "what's your strongest AI project?" │
│ │
│ Query 2 (name-anchored): │
│ "what's your strongest AI project? Jaya Sabarish │
│ Reddy Remala" │
│ │
│ Query 3 (topic keyword — detected: project/built): │
│ "projects built SnapLog CodeCollab Multi-Agent │
│ GeneCart" │
│ │
│ Query 4 (conversation context — last user turn): │
│ injected only if prior message exists and differs │
│ │
│ Result: up to 4 query strings │
└──────────────────────┬────────────────────────────────────┘
│ 4 queries
┌────────────┴─────────────┐
▼ ▼
┌──────────────────────┐ ┌──────────────────────────────┐
│ STAGE 2a — DENSE │ │ STAGE 2b — LEXICAL (BM25) │
│ │ │ │
│ One batched encode │ │ BM25Okapi scoring │
│ call for all 4 │ │ Tokenize: lowercase, │
│ queries — │ │ strip punctuation, │
│ single forward pass │ │ keep len > 1 tokens │
│ ~160ms vs ~400ms │ │ │
│ for serial calls │ │ Catches exact matches: │
│ │ │ "3000 RPS", "SnapLog", │
│ HNSW cosine search │ │ "Qualcomm", "78%", │
│ in ChromaDB │ │ "115 GB/day" │
│ │ │ │
│ top 6 per query │ │ top 15 results │
│ = up to 24 chunks │ │ │
│ (deduped by id) │ │ In-memory, rebuilt on │
└──────────┬───────────┘ │ every startup from docs │
└───────┬────────┘
▼
┌───────────────────────────────────────────────────────────┐
│ STAGE 3 — Reciprocal Rank Fusion │
│ │
│ score(doc) = sum of 1 / (k + rank_i) k = 60 │
│ Cormack, Clarke, Buettcher 2009 │
│ │
│ Chunks in both dense + BM25 results get boosted. │
│ Dense metadata takes precedence on conflicts. │
│ │
│ Output: up to 20 candidates ranked by RRF score │
└──────────────────────┬────────────────────────────────────┘
▼
┌───────────────────────────────────────────────────────────┐
│ STAGE 4 — Cross-Encoder Rerank │
│ │
│ Model: cross-encoder/ms-marco-MiniLM-L-6-v2 │
│ │
│ Scores (query, passage) pairs jointly — query and │
│ passage attend to each other. Far better relevance │
│ than bi-encoder cosine, but O(n) so only runs on │
│ the top 20 post-RRF candidates (~30ms). │
│ │
│ Falls back to RRF order instantly if not warmed up. │
│ │
│ Output: top 5 chunks injected into Gemini context │
└───────────────────────────────────────────────────────────┘
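Condensed into code, the four stages look roughly like this. It is a sketch, not the actual backend source: expand_queries, bm25_search, rrf_merge, rerank, doc_text, embedder, and collection stand in for the real helpers.

```python
def retrieve(message: str, history: list[str], top_k: int = 5) -> list[tuple[str, str]]:
    # Stage 1 - query expansion: up to 4 variants of the user message
    queries = expand_queries(message, history)

    # Stage 2a - dense: one batched encode call, then an HNSW search per query
    dense_rankings = []
    for emb in embedder.encode(queries):                        # single forward pass
        res = collection.query(query_embeddings=[emb.tolist()], n_results=6)
        dense_rankings.append(res["ids"][0])                    # ranked doc ids

    # Stage 2b - lexical: BM25 over the same corpus, top 15 ids
    bm25_ranking = bm25_search(message, top_n=15)

    # Stage 3 - Reciprocal Rank Fusion, keep the top 20 candidates
    candidate_ids = rrf_merge(dense_rankings + [bm25_ranking], k=60)[:20]

    # Stage 4 - cross-encoder rerank; the top 5 go into the Gemini context
    reranked = rerank(message, [(cid, doc_text(cid)) for cid in candidate_ids])
    return reranked[:top_k]
```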
Why this retrieval stack
The failure mode of pure dense search is subtle: it generalises. The embedding model learned that "high-throughput ingestion system" and "115 GB/day pipeline" are semantically similar concepts — but a recruiter who types "115 GB/day" is looking for that exact number, not a paraphrase of it. Dense search ranks semantic neighbours above the exact match. BM25 does the opposite: it rewards term frequency and document frequency, so specific numbers, project names, and company names surface immediately.
The hybrid catches both failure modes. Dense handles paraphrasing ("what's your strongest AI work?" → SnapLog, Multi-Agent Research Engine). BM25 handles exact recall ("Qualcomm hackathon", "3000 RPS", "NYU", "78% accuracy"). RRF merges the two ranked lists without requiring any tuning — k=60 is the proven default from the original Cormack, Clarke, Buettcher 2009 paper and it works well here.
The cross-encoder at the end addresses a different problem: cosine similarity in embedding space is a coarse relevance signal. Two chunks might score identically in cosine similarity but one actually answers the question and the other just happens to share vocabulary. The cross-encoder reads query and passage jointly — full self-attention between them — and produces a calibrated relevance score. The cost is O(n) inference per query, which is why it only runs on the 20 post-RRF candidates, not all 75+ documents. 20 pairs take about 30ms; 75 pairs would push past 100ms.
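The RRF merge itself is only a few lines. A minimal version of the scoring described above, working on document ids only (the real merge also reconciles metadata, with dense taking precedence):

```python
def rrf_merge(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """score(doc) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # ids present in both the dense and BM25 rankings accumulate score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```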
Knowledge Base Design
Atomic chunking: each bullet point, project, and skill category gets its own document. Large blobs hurt precision because one chunk about Shell PLC also contains irrelevant AWS details.
| Document type | Count | Source | Strategy |
|---|---|---|---|
| profile | 3 | profile.json | Overview, bio, contact — separate so contact queries don't surface bio text |
| experience | ~25 | experience.json | 1 overview per role + 1 doc per bullet point |
| education | ~10 | education.json | 1 overview per degree + 1 doc per highlight |
| project | ~14 | projects.json | 1 overview + 1 tech-stack doc per project |
| skills | 8 | skills.json | 1 per category + 1 aggregated |
| faq | 12 | Hard-coded | Pre-answers common questions about contact, hire, strengths, resume |
| blog | varies | blog.json (auto) | 1 per post — title + description + first 2,000 chars of body |
Total: ~70–80 documents. ChromaDB handles this trivially; the retrieval quality comes from chunking strategy and the hybrid pipeline, not scale.
Why atomic chunking
The earlier version chunked each work role into a single large document. A query about "Redis caching" would return the full NYU IT Systems role — which mentions Redis once buried among six unrelated bullets about Prometheus dashboards, CI/CD pipelines, and Docker migrations. The entire role document scored high on Redis similarity, but 80% of the returned text was noise that crowded out the Gemini context window.
Splitting each bullet point into its own document means a Redis query returns only the Redis-specific chunk. The context Gemini receives is dense signal, not diluted noise. The tradeoff is more ChromaDB documents (~75 vs ~15), but at this scale that's irrelevant — ChromaDB handles millions of documents and the retrieval cost difference is negligible.
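In code, atomic chunking is just how build_documents() flattens the JSON. A sketch for the experience file; the field names here are illustrative, not the actual schema:

```python
import json
from pathlib import Path

def build_experience_docs(path: Path) -> list[tuple[str, str, str]]:
    docs = []  # (id, text, type) tuples
    for role in json.loads(path.read_text()):
        rid = role["company"].lower().replace(" ", "-")
        # one overview document per role
        docs.append((f"exp-{rid}",
                     f"{role['title']} at {role['company']}: {role['summary']}",
                     "experience"))
        # plus one document per bullet, so a Redis query returns only the Redis bullet
        for i, bullet in enumerate(role["highlights"]):
            docs.append((f"exp-{rid}-{i}", f"At {role['company']}: {bullet}", "experience"))
    return docs
```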
FAQ documents and what they fix
RAG retrieval is only as good as what is retrievable. The raw experience and project documents are factual, but they are not shaped to answer the questions a visitor actually asks. "What makes Jaya stand out?" returns experience bullet points — true, but not a direct answer. "How do I hire him?" has no document that answers it at all.
The 12 hard-coded FAQ documents are pre-written answers to the highest-probability recruiter questions. They contain exact numbers, direct claims, and contact information. When a relevant FAQ is retrieved, Gemini gets a document that already says "his strongest project is SnapLog, which won the Qualcomm Edge AI Hackathon and runs at 15ms on an NPU" — not a list of bullets it has to synthesise from. These 12 documents have more impact on response quality than any retrieval pipeline improvement.
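Shape-wise these are ordinary knowledge documents with type faq; only the text is hand-written. An illustrative entry (ids and exact wording made up here, the claims are the ones quoted above):

```python
FAQ_DOCS = [
    ("faq-strongest-project",
     "Q: What is Jaya's strongest AI project? "
     "A: SnapLog. It won the Qualcomm Edge AI Hackathon and runs at 15ms on an NPU.",
     "faq"),
    ("faq-hire",
     "Q: How do I get in touch or hire Jaya? "
     "A: Use the contact details listed on the portfolio contact section.",
     "faq"),
]
```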
Hash-based re-ingest and what it protects
On startup, run_ingest() computes SHA-256 of all JSON files and compares against a stored hash file. The hash protects against two failure modes that would otherwise operate silently.
If you always re-ingest (no hash check): every Railway deploy — even a one-line CSS fix with no knowledge changes — triggers a 15–20s ChromaDB rebuild. On Railway's free plan, cold starts are already slow; an unnecessary re-ingest on every deploy would make the first response after a deploy take 30+ seconds.
If you skip ingest whenever the collection is non-empty: checking only collection.count() > 0 means knowledge updates never take effect. You could update profile.json with a new role, push, and let Railway redeploy — but the chatbot keeps answering with the old data because the collection is non-empty.
The SHA-256 fingerprint of the entire data/knowledge/ directory solves both: fast startup when nothing changed, automatic re-ingest when anything changed.
startup
│
├── compute SHA256 of all *.json in data/knowledge/
├── read stored hash from chroma_db/.ingest_hash
│
├── hash unchanged AND ChromaDB not empty
│ └── skip ingest, rebuild BM25 in-memory, done
│
└── hash changed OR DB empty
├── delete ChromaDB collection
├── build_documents() — ~75 (id, text, type) tuples
├── collection.upsert(ids, texts, metadatas)
├── build_bm25_index(docs)
└── write new hash to .ingest_hash
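A sketch of that startup check (helper names are hypothetical; the real run_ingest also drops and recreates the collection before re-ingesting):

```python
import hashlib
from pathlib import Path

KNOWLEDGE_DIR = Path("backend/data/knowledge")
HASH_FILE = Path("chroma_db/.ingest_hash")   # same persistent volume as ChromaDB

def knowledge_fingerprint() -> str:
    """SHA-256 over every *.json file, folded in a stable order."""
    h = hashlib.sha256()
    for f in sorted(KNOWLEDGE_DIR.glob("*.json")):
        h.update(f.read_bytes())
    return h.hexdigest()

def run_ingest(collection) -> None:
    current = knowledge_fingerprint()
    stored = HASH_FILE.read_text().strip() if HASH_FILE.exists() else ""
    if current == stored and collection.count() > 0:
        build_bm25_index()            # BM25 is in-memory, rebuilt on every startup
        return                        # nothing changed: skip the 15-20s rebuild
    ids, texts, types = zip(*build_documents())
    collection.upsert(ids=list(ids), documents=list(texts),
                      metadatas=[{"type": t} for t in types])
    build_bm25_index()
    HASH_FILE.write_text(current)
```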
Gemini Model Fallback Chain
gemini-2.5-flash hits capacity limits (503 / 429) at peak times. The backend retries through a chain without surfacing errors to the user.
_stream_tokens(prompt)
│
├── try gemini-2.5-flash
│ ├── success ──► stream tokens, record model name, return
│ └── 503 / 429 ──► log warning, try next
│
├── try gemini-2.0-flash
│ ├── success ──► stream tokens, log fallback, return
│ └── 503 / 429 ──► log warning, try next
│
├── try gemini-2.0-flash-lite
│ └── ...
│
└── try gemini-flash-latest
├── success ──► return
└── failure ──► raise last exception
The frontend shows which model answered via a green pill badge. The chain is fully configurable via the GEMINI_MODEL and GEMINI_FALLBACK_MODELS env vars — no code change needed to swap models.
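A sketch of the chain, assuming the google-generativeai SDK (the actual backend code and error handling may differ):

```python
import google.generativeai as genai
from google.api_core import exceptions as gexc

# in reality these come from the GEMINI_MODEL / GEMINI_FALLBACK_MODELS env vars;
# assumes genai.configure(api_key=...) has already run at startup
MODEL_CHAIN = ["gemini-2.5-flash", "gemini-2.0-flash",
               "gemini-2.0-flash-lite", "gemini-flash-latest"]

def stream_with_fallback(prompt: str):
    """Yield (model_name, token) pairs, falling through the chain on 429 / 503."""
    last_error = None
    for name in MODEL_CHAIN:
        try:
            model = genai.GenerativeModel(name)
            for chunk in model.generate_content(prompt, stream=True):
                yield name, chunk.text
            return                                    # streamed successfully
        except (gexc.ResourceExhausted, gexc.ServiceUnavailable) as err:
            last_error = err                          # capacity limit: try the next model
    raise last_error
```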
Analytics Architecture
All engagement data lives in a single SQLite file. IPs are SHA-256 hashed before storage — never stored raw. Period-based queries use SQLite's datetime('now', '-N days') function.
analytics.db
│
├── interactions ← chat analytics
│ ├── ip_hash TEXT SHA-256 of visitor IP
│ └── created_at TIMESTAMP
│ Indexed: idx_ip ON ip_hash
│ powers: unique_visitors, total_responses
│ period filter: WHERE created_at >= datetime('now', '-7 days')
│
├── blog_views ← unique views per post per IP
│ ├── slug TEXT
│ ├── ip_hash TEXT
│ └── created_at TIMESTAMP
│ UNIQUE(slug, ip_hash) idempotent — INSERT OR IGNORE
│ Indexed: idx_view_slug ON slug
│
└── blog_claps ← cumulative claps per post per IP
├── slug TEXT
├── ip_hash TEXT
├── count INTEGER capped at 50 per user per post
└── updated_at TIMESTAMP
UNIQUE(slug, ip_hash)
ON CONFLICT: count = count + excluded.count
Indexed: idx_clap_slug ON slug
GET /stats/overview returns every metric for every period (one DB round-trip per period) and is fetched once when the stats drawer opens — not on every page load.
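A sketch of what those period queries look like against the schema above (the real endpoint aggregates more metrics and all four periods):

```python
import hashlib
import sqlite3

def ip_hash(ip: str) -> str:
    return hashlib.sha256(ip.encode()).hexdigest()    # raw IPs are never stored

def overview(db_path: str, days: int | None = 7) -> dict:
    where = "WHERE created_at >= datetime('now', ?)" if days else ""
    params = (f"-{days} days",) if days else ()
    with sqlite3.connect(db_path) as con:
        responses = con.execute(f"SELECT COUNT(*) FROM interactions {where}", params).fetchone()[0]
        visitors = con.execute(f"SELECT COUNT(DISTINCT ip_hash) FROM interactions {where}", params).fetchone()[0]
        views = con.execute(f"SELECT COUNT(*) FROM blog_views {where}", params).fetchone()[0]
    return {"total_responses": responses, "unique_visitors": visitors, "blog_views": views}
```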
Why track engagement at all
Blog views and claps are not vanity metrics here. The view count tells you whether a post was read. The clap count tells you whether it resonated. The combination is useful signal for deciding what to write next — a post with high views and zero claps was probably clicked but abandoned; a post with low views and high claps per reader was deeply engaging to a small audience.
The period breakdown (7d / 30d / 1y / all-time) in the stats dashboard answers different questions. The 7-day window shows whether a recent share drove traffic. The all-time total shows which posts have lasting relevance. The analytics cost nothing to run, are privacy-preserving by design (SHA-256 hashed IPs, never stored raw), and give enough signal to be actionable.
Clap batching on the frontend
Rapid clap button clicks are batched client-side with a 1.5s debounce before a single API call is sent. The backend caps the cumulative total at 50 per user per post.
click ──► increment local count
click ──► increment local count debounce timer resets
click ──► increment local count debounce timer resets
(1.5s silence)
│
└──► POST /blog/{slug}/clap body: count=3
└──► backend: min(3, 50 - current_total) ──► upsert
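On the backend, the cap and the upsert are a few lines of SQLite. A sketch against the blog_claps schema (function name illustrative; simplified relative to the real handler):

```python
import sqlite3

MAX_CLAPS = 50

def add_claps(db_path: str, slug: str, ip_hash: str, count: int) -> int:
    with sqlite3.connect(db_path) as con:
        row = con.execute("SELECT count FROM blog_claps WHERE slug = ? AND ip_hash = ?",
                          (slug, ip_hash)).fetchone()
        current = row[0] if row else 0
        allowed = max(0, min(count, MAX_CLAPS - current))     # cap at 50 per user per post
        if allowed:
            con.execute(
                """INSERT INTO blog_claps (slug, ip_hash, count, updated_at)
                   VALUES (?, ?, ?, datetime('now'))
                   ON CONFLICT(slug, ip_hash)
                   DO UPDATE SET count = count + excluded.count,
                                 updated_at = excluded.updated_at""",
                (slug, ip_hash, allowed))
        return current + allowed                              # new running total for this user
```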
Deploy Pipeline
developer: git push origin main
│
▼
GitHub Actions (.github/workflows/deploy.yml)
│
├── actions/checkout@v4
├── actions/setup-node@v4 (Node 20)
├── npm install (frontend/)
├── npm run build
│ └── prebuild: node ../scripts/sync-knowledge.mjs
│ ├── reads frontend/src/content/blog/*.mdx
│ ├── parses frontmatter + strips MDX to plain text
│ ├── writes backend/data/knowledge/blog.json
│ └── copies backend/data/knowledge/*.json
│ ──► frontend/src/data/knowledge/
│
│ └── next build ──► static export ──► frontend/out/
│
├── git add backend/data/knowledge/blog.json
│ frontend/src/data/knowledge/
│ git diff --staged --quiet ||
│ git commit -m "chore: sync knowledge base [skip ci]"
│ git push
│ [skip ci] prevents infinite workflow loop
│
├── upload-pages-artifact (path: frontend/out)
└── deploy-pages ──► GitHub Pages live
│
▼
Railway (auto-detects new commit on main)
│
├── builds Docker image (python:3.11-slim)
├── pip install -e backend[dev]
└── uvicorn app.main:app --app-dir backend/src
│
└── FastAPI lifespan startup:
├── analytics.init_db() CREATE TABLE IF NOT EXISTS
├── blog_stats.init_db() CREATE TABLE IF NOT EXISTS
├── run_ingest() hash check ──► skip or re-ingest
└── warmup()
├── pre-load all-MiniLM-L6-v2
└── pre-load cross-encoder (downloads ~25MB if needed)
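The startup sequence at the bottom of that diagram maps onto a FastAPI lifespan hook. Roughly, with the module names shown in the diagram and the imports omitted:

```python
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # runs once per container start, before the first request is served
    analytics.init_db()        # CREATE TABLE IF NOT EXISTS ...
    blog_stats.init_db()
    run_ingest()               # SHA-256 hash check -> skip or rebuild ChromaDB
    warmup()                   # pre-load all-MiniLM-L6-v2 + cross-encoder
    yield                      # app serves requests
    # nothing to tear down: SQLite and ChromaDB live on the persistent volume

app = FastAPI(lifespan=lifespan)
```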
Single source of truth
EDIT                               SYNC                          CONSUMED BY
──────────────────────────────────────────────────────────────────────────────
backend/data/knowledge/*.json  ──► sync-knowledge.mjs ─────────► frontend UI (typed imports)
                                                       └───────► RAG pipeline (ChromaDB docs)

frontend/src/content/blog/*.mdx ─► sync-knowledge.mjs ─────────► backend/data/knowledge/blog.json
                                   (generates)         └───────► RAG pipeline (blog documents)
The TypeScript data files (frontend/src/data/*.ts) are thin typed re-exports from the synced JSON copies. They never hardcode values — if profile.json changes, the UI picks it up after the next sync.
Frontend Architecture
Routing
/ Avocado chatbot — full-screen, no nav/footer
/chat Same chatbot, accessible from portfolio nav
/portfolio Hero + domain chips + featured projects + skills
+ testimonials carousel + contact
/experience Work history timeline
/education Education cards
/projects Projects grid — source link pill tag buttons
/blog Index sorted by publishedAt (immutable sort key)
/blog/[slug] Post in Source Serif 4 font + engagement
/lab Build log index (this page)
/lab/[slug] System design entry
All portfolio routes share a layout via the (portfolio) route group — adds no URL segment. The chatbot lives outside this group — no nav, no footer, full screen.
Static export + basePath
The frontend is built with output: "export" — fully static HTML/CSS/JS deployed to GitHub Pages. In production, basePath: "/jayaremala" is set in next.config.ts. Next.js Link prepends this automatically. Plain anchor tags break in production — always use Link for internal navigation.
Blog engagement components
BlogPostList (client)
├── fetches /blog/stats/summary on mount
└── renders post cards with per-post views + claps
BlogIndexStats (client)
└── shows total claps + views in blog header
BlogEngagement (client, on each post page)
├── POST /blog/{slug}/view on mount (idempotent)
├── clap button: float-up +1 animation, burst scale
├── 1.5s debounce ──► POST /blog/{slug}/clap
└── shows "You clapped Nx" running total
BlogGuideDrawer (client, floating button above mobile FAB)
├── MDX syntax reference for writing posts
├── live stats dashboard ──► fetches /stats/overview
│ ├── summary table: 7d / 30d / 1y / all-time
│ └── per-post breakdown sorted by views
└── project maintenance appendix
Chat streaming
ChatInterface (client)
│
├── POST /ai/chat/stream
│ ReadableStream reads SSE events:
│ token: "..." streamed text chunks
│ done: true,
│ model: "gemini-2.5-flash" which model answered
│ sources: [...]
│
├── activeModel state
│ shown as green pill badge after first response
│ updates if fallback model was used
│
├── stats state
│ fetches /stats on mount
│ shows "N responses · N visitors" in footer
│
└── ChatMessage
full block + inline markdown renderer
headings · bullets · numbered lists
bold · italic · inline code · links · dividers
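On the server side, those SSE events come out of a FastAPI StreamingResponse. A sketch reusing the hypothetical retrieve, stream_with_fallback, and build_prompt helpers from earlier sections (app is the FastAPI instance):

```python
import json
from fastapi.responses import StreamingResponse

@app.post("/ai/chat/stream")
async def chat_stream(payload: dict):
    chunks = retrieve(payload["message"], payload.get("history", []))   # top-5 (id, text) pairs

    async def events():
        model_used = None
        for model_used, token in stream_with_fallback(build_prompt(payload["message"], chunks)):
            yield f"data: {json.dumps({'token': token})}\n\n"
        yield "data: " + json.dumps({"done": True, "model": model_used,
                                     "sources": [cid for cid, _ in chunks]}) + "\n\n"

    return StreamingResponse(events(), media_type="text/event-stream")
```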
Key Decisions
The standard portfolio write-up is a polished post-hoc rationalisation. You ship the thing, then write the explanation that makes it sound planned. The real decisions — the dead-ends, the alternatives you considered, the constraints that forced your hand — disappear.
A living MDX page that gets amended as the system evolves preserves the actual reasoning. The Decision and Update timeline components make it natural to add entries in-place instead of rewriting history. The constraint that forced MDX over a database-backed CMS is meaningful: the whole frontend is a static export. There is no database write path. MDX files committed to the repo are the only durable storage available at build time.
What this means for the system: the lab section is the closest thing this project has to an architecture decision record. Future work on Avocado or the RAG pipeline should be documented here — not in a separate document that will never be found.
Two simpler alternatives to the hash-based re-ingest exist and both are wrong. Always re-ingest: adds 15–20s to every Railway deploy regardless of whether the knowledge base changed. On a free-plan Railway service, the startup time is already visible to a recruiter opening the chatbot cold — an unnecessary 20s tax is unacceptable. Skip if ChromaDB non-empty: knowledge updates never propagate. You update experience.json with a new role, push, Railway redeploys — but the chatbot keeps answering with stale data because the collection count is above zero.
SHA-256 of the entire data/knowledge/ directory solves both failure modes with a single file read at startup cost. The hash file lives at chroma_db/.ingest_hash on the same persistent volume as ChromaDB — if the volume is wiped, the missing hash file triggers a fresh ingest, which is correct.
What this means for the system: portfolio updates — new job, new project, published blog post — automatically propagate to the chatbot's knowledge on the next Railway deploy without any manual step.
When the dense retrieval results were first audited, a pattern emerged: specific identifiers scored poorly. "Qualcomm" wasn't in the vocabulary the embedding model learned to weight. "115 GB/day" gets embedded near other volume metrics but not near the exact text. "3000 RPS" retrieves documents about high-throughput systems in general, not the specific project that hit 3000 RPS.
These failures matter for a portfolio specifically because the most important queries people ask are about specifics. "Did he work at Shell?" is more common than "Tell me about his distributed systems experience." BM25 handles exact term recall with no model overhead — it's a pure in-memory frequency calculation rebuilt from the document list on every startup (~5ms).
What this means for the system: the hybrid pipeline changes what kinds of questions Avocado can answer reliably. Questions with proper nouns, numbers, company names, and specific project names now return the right chunk first. This is not a marginal improvement — it's the difference between "I don't have that information" and the correct answer.
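The lexical half is small enough to show almost in full. A sketch with rank_bm25, using the tokenizer rules from the pipeline diagram (build_documents is the same hypothetical helper as before):

```python
import re
from rank_bm25 import BM25Okapi

def tokenize(text: str) -> list[str]:
    # lowercase, strip punctuation, keep tokens longer than one character
    return [t for t in re.split(r"\W+", text.lower()) if len(t) > 1]

# in-memory index, rebuilt from the document list on every startup (~5ms)
doc_ids, texts, _types = zip(*build_documents())
bm25 = BM25Okapi([tokenize(t) for t in texts])

def bm25_search(query: str, top_n: int = 15) -> list[str]:
    scores = bm25.get_scores(tokenize(query))                 # one score per document
    ranked = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
    return [doc_ids[i] for i in ranked[:top_n]]
```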
The bi-encoder architecture that powers ChromaDB has a structural limitation: query and passage are encoded independently, then compared by cosine similarity. The model never sees query and passage together. This means it can only retrieve documents that share embedding-space proximity with the query — it cannot reason about whether a passage actually answers the specific question asked.
The cross-encoder runs full transformer attention across the concatenated query-passage pair. It can read "does this text about Redis caching specifically answer the question about what caching strategies Jaya used?" — a richer relevance signal. The cost is O(n) per query. Running it on all 75+ documents would add 150–200ms per request. Running it on 20 post-RRF candidates adds about 30ms.
What this means for the system: the top-5 chunks fed to Gemini are the most relevant 5, not just the most similar 5. This matters most for questions where several plausible documents exist but only one actually answers the question. The reranker picks it.
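With sentence-transformers the reranker itself is only a few lines; the work is in pre-warming it at startup and keeping the candidate set to 20. A sketch:

```python
from sentence_transformers import CrossEncoder

# pre-warmed at startup so the first visitor does not pay the model-load penalty
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # scores each (query, passage) pair jointly; O(n) in the number of candidates
    scores = reranker.predict([(query, text) for _, text in candidates])
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order]
```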
The analytics workload is narrow: INSERT a view, INSERT OR UPDATE clap count, SELECT COUNT with a WHERE on timestamp. No joins. No concurrent writers (Railway is a single container). No schema migrations after initial creation. A managed Postgres would cost ~$5/month, add 1–3ms of network latency to every write, and bring a connection pool that is entirely unnecessary for a service handling at most a few hundred requests per day.
SQLite on a Railway persistent volume is zero-cost, same-process (no network), and the entire database is one file. The file can be downloaded for inspection at any time. INSERT OR IGNORE and ON CONFLICT DO UPDATE cover all idempotency requirements without transactions.
What this means for the system: analytics are fast, cost nothing to operate, and survive Railway deploys as long as the volume is mounted. The one operational risk is forgetting to set ANALYTICS_DB_PATH to the mounted volume path — if left at the default relative path, data is lost on every deploy. This is documented in the env vars section.
The portfolio UI has no per-request server-side needs. Everything dynamic — chat responses, blog stats, clap counts — is client-side JavaScript calling the Railway API. Server-side rendering would add complexity and cost with no benefit here.
A fully static export (output: "export" in Next.js) generates HTML, CSS, and JS at build time. GitHub Pages serves it from a CDN for free with no cold starts. The one constraint this introduces is that Link from next/link must be used for all internal navigation because Next.js automatically prepends the basePath (/jayaremala in production). Plain anchor tags break production navigation silently — this has caused bugs and is worth knowing.
What this means for the system: the frontend is fast, free, and has no operational surface area. The only moving part is the Railway backend. All scaling concerns, uptime concerns, and cost concerns live there.
Before this refactor, portfolio data lived in two places: TypeScript files in frontend/src/data/ hardcoded the UI data, and JSON files in backend/data/knowledge/ powered the RAG knowledge base. They were maintained independently.
The drift was invisible. A new project was added to projects.ts for the UI but not to projects.json for the chatbot. Avocado answered "what are your projects?" with a list that was two projects behind the website. Anyone who looked at both and noticed would correctly conclude that the system was not being maintained.
The sync script makes the backend JSON canonical and generates everything else from it. The TypeScript data files are now thin typed wrappers over the synced JSON copies. The sync runs before every build and on every push via GitHub Actions — there is no path where the UI and chatbot knowledge are out of sync for more than one deploy.
What this means for the system: portfolio maintenance is a single edit to one JSON file. The UI, the chatbot, and the deploy pipeline all stay coherent automatically.
Gemini 2.5 Flash is the best available model for this use case — fast, capable, cheap. It also hits 503 and 429 capacity limits at peak times, particularly in the evenings. A portfolio chatbot that returns an error message on the first message is worse than no chatbot.
The fallback chain retries through gemini-2.0-flash, gemini-2.0-flash-lite, and gemini-flash-latest automatically. The frontend shows which model answered via a green pill badge — this keeps the fallback transparent without it reading as a failure. The visitor sees "gemini-2.0-flash" instead of "Service Unavailable".
What this means for the system: Avocado is available almost all of the time even when the primary model is under load. The quality difference between models is visible in the badge but not in the UX. The chain is fully configurable via env vars — no code change needed to reorder the chain or swap in a new model.
After the first week of real queries, the pattern was clear: the questions that matter most in the first 30 seconds are not well-served by retrieval from raw experience data. "What makes Jaya stand out?" returns experience bullet points — factually correct, but not a direct answer. "How do I hire him?" has no matching document in the knowledge base at all.
The FAQ documents are not retrieved from structured data — they are handwritten answers to the 12 highest-probability questions, loaded into ChromaDB as their own document type. They contain direct claims with exact numbers: "won the Qualcomm Edge AI Hackathon", "SnapLog runs at 15ms on an NPU", "gradeVITian has 17,000+ active users". When one is retrieved, Gemini gets the answer already formed — it just needs to surface it.
What this means for the system: the perceived quality of Avocado on high-value queries is determined more by the quality of these 12 documents than by any pipeline improvement. Keeping them up to date as achievements accumulate — new award, new scale milestone, new contact info — is the highest-leverage maintenance task in the entire knowledge base.
Progress Log
Built /lab section. MDX-based living system design docs with custom components: Status, Arch, Decision, Update, Stack, Metric. Added Lab to nav. First entry: itsjaya itself.
Blog guide drawer now shows live stats dashboard — unique visitors, Avocado responses, blog views, claps with 7d / 30d / 1y / all-time breakdown. New GET /stats/overview endpoint returns all periods in one API call.
Blog engagement fully live: views (unique per IP per post), claps (max 50/user, 1.5s debounced batching), per-post stats on index cards, totals in blog header. Backed by SQLite on Railway volume.
Analytics DB path made configurable via ANALYTICS_DB_PATH env var so it resolves to the Railway persistent volume regardless of working directory changes between deploys.
BM25 hybrid retrieval added. GET /stats/overview with period filtering (7d / 30d / 1y / all-time) added to both analytics and blog stats modules.
Cross-encoder reranker added. Pre-warmed at startup so the first user request does not pay the model-load penalty. Graceful fallback to RRF order if warmup has not completed.
Chat markdown rendering rewritten — handles headings, bullets, numbered lists, bold, italic, inline code, links, dividers. Previously raw stars appeared in responses.
Gemini model fallback chain implemented. Model indicator badge added to chatbot.
Blog deployed with MDX, Source Serif 4 reading font, publishedAt-based sort. Sync script auto-generates blog.json so Avocado can answer questions about published posts.
Single source of truth refactor complete. Backend JSON is canonical — TypeScript files are typed re-exports. sync-knowledge.mjs runs before every build.
Project started. Basic FastAPI + ChromaDB + Next.js skeleton. First working Avocado response.
What's next
- Railway persistent volume for ChromaDB + analytics (currently re-ingests on every free-plan redeploy)
- Reading time estimate on blog cards
- Search within blog posts
- Avocado voice input (Web Speech API — already prototyped)
- A/B test shorter vs longer system prompt to measure response quality