Vision Doc — Agent J v1.0 · April 2026

Multi-agent architecture vision, design principles & architectural assessment

Contents
  1. Current Status — Live Tracker
  2. Agent J — Vision Overview
  3. History Agent — Scope, Use Cases, System Prompt, Dynasty Coverage & Tagging Convention
  4. Architecture Assessment — Q&A
  5. Design Principles & Decisions
  6. Key Decisions Made
  7. Deployment Roadmap
  8. Phase 5 — Automated Knowledge Validation Loop
Current Status — Live Tracker
Last updated: April 2026  ·  Current phase: Phase 2 — Personal Agent + Gmail  ·  Active sprint: Sprint 2 — Personal Agent + Gmail OAuth

Phase Progress

Phase | Status | Summary
Phase 1 — Local build | Complete ✓ | Agent J skeleton + History Agent tool wiring + Claude Code MCP interface + basic web UI. Verified end-to-end.
Phase 2 — 70% corpus + Personal Agent | In progress | Personal Agent locally. Gmail OAuth integration. Full Pallava corpus. VectorStoreFactory abstraction.
Phase 3 — AWS migration + mobile web | Not started | EC2 t3.medium + EBS. Mobile web app. Docker deploy. LibreOffice PDF conversion.
Phase 4 — Public launch | Not started | History Agent public. Security Agent. Auth layer. CDN.
Phase 5 — Validation loop | Not started | Automated Query + Review Agent loop. 50-question test set. Self-improving corpus.

Sprint 1 — Agent J Skeleton ✓ Complete

Task | Status | Notes
Monorepo structure — agent-j/ project scaffold | Done ✓ | Alongside pallava-translator/ and personal-agent/. All paths via env vars.
FastAPI backend — POST /agent/run agentic loop | Done ✓ | Claude tool-use routing. 5-step max loop. Runs on port 8001.
History Agent tool definitions (3 tools) | Done ✓ | rag_query, semantic_search, research_paper
Agent J memory — SQLite | Done ✓ | sessions, messages, routing_log, preferences tables
MCP tool definition for Claude Code | Done ✓ | agent_j_mcp.py + .claude/settings.json
Basic web UI — /agent-ui | Done ✓ | Chat UI with markdown rendering + tool call chips. Verified working.
Docker-ready layout | Done ✓ | Dockerfile per agent. docker-compose.yml in monorepo root.

Sprint 2 — Personal Agent + Gmail (Current)

Task | Status | Notes
personal-agent/ project scaffold | Done ✓ | Port 8002. Same Docker-ready, env-var structure as agent-j/.
Gmail OAuth flow | Done ✓ | Google Cloud project created. OAuth credentials configured. /auth/gmail/start → callback → token saved.
Gmail tool implementations | Done ✓ | gmail_list_emails, gmail_read_email, gmail_send_email
Personal corpus — SQLite | Done ✓ | facts, email_cache, tasks, preferences tables in memory/personal.db
Personal Agent tool definitions in Agent J | Done ✓ | 6 tools: gmail (3) + tasks (2) + remember_fact (1). 9 total tools in Agent J.
Agent J routing updated | Done ✓ | History tools → history_agent_client.py. Personal tools → personal_agent_client.py.
Gmail OAuth authorisation | Pending | Start personal-agent → visit http://localhost:8002/auth/gmail/start → grant access
VectorStoreFactory abstraction | Pending | Deferred — required before EC2 migration (Phase 3)

Decisions Log — April 2026

Date | Decision | Rationale
Apr 2026 | Monorepo with 3 separate projects (agent-j, personal-agent, pallava-translator) | Clean security boundary. History Agent stays public-facing. Personal Agent corpus fully isolated.
Apr 2026 | Mobile web app instead of native iPhone app | Cost saving. Responsive web UI on AWS covers the use case. Native app deferred indefinitely.
Apr 2026 | Claude Code MCP as sole interface for Phase 1 | No infrastructure needed. Natural language → Agent J → sub-agents from Claude Code desktop.
Apr 2026 | Gmail OAuth deferred to Sprint 2 | Not needed for Agent J skeleton. Reduces Sprint 1 scope.
Apr 2026 | Apple Calendar integration skipped for now | Requires macOS or iCloud API. Revisit when Personal Agent is in active sprint.
Apr 2026 | AWS account with dedicated email (or +aws alias) | Security hygiene. AWS root email should be separate from personal email. Enable MFA immediately.
Apr 2026 | Notes page in Pallava project is History Agent specific | Personal Agent will have its own separate notes/corpus. No migration needed.

Pallava Corpus Status (History Agent)

Metric | Value
Sources ingested | 42 (38 PDFs, 2 images, 1 video, 1 URL)
Knowledge chunks | ~4,000+
Anthropic monthly spend limit | Raised to $200 (April 2026)
Pending ingests | ~6 PDFs failed due to spend limit — retry after limit raised
Phase 2 gate | 70% corpus — not yet reached
Section 1 Agent J — Vision Overview

Agent J is a personal multi-agent AI system with a single natural language entry point. The Master Agent accepts any instruction and routes it to the appropriate sub-agent using Claude's native tool-use (function-calling) as the routing mechanism — no custom keyword matching. Sub-agents are thin wrappers around existing or new capabilities; they can be added incrementally without redesigning the core.

Master — Agent J (Build first): Single entry point. Accepts natural language instructions. Routes to sub-agents via tool-use. Coordinates end-to-end delivery (PDF, Word, PowerPoint → Gmail, WhatsApp).

Sub-Agent 1 — History Agent (Phase 1 MVP): Indian history research — Pallava, Chalukya, and all dynasties. RAG queries, research paper generation, semantic search. Wraps existing KB infrastructure.

Sub-Agent 2 — Personal Agent (Phase 2 — Local): Personal life management — Hindu calendar reminders (Ekadasi, Amavasya, festivals), Gmail, RSU vesting, travel + temple corpus, weekly digest, AI learning roadmap, personal corpus. Private always — never exposed externally. Requires SQLite state store.

Sub-Agent 3 — Security Agent (Phase 4): Ongoing architecture audit. Independently reviews all system configurations and setups for vulnerabilities. Stateless audit loop.
Routing mechanism: Claude's native tool-use IS the routing logic. No if/elif, no regex, no keyword matching. Tool descriptions tell Claude what each sub-agent does — Claude picks the correct tool automatically. The agentic loop runs up to 5 steps; Claude can chain tools sequentially within a single instruction.
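
A minimal sketch of that loop, assuming the Anthropic Python SDK; the tool schema shown is illustrative, and dispatch_to_sub_agent is a hypothetical stand-in for the HTTP sub-agent clients (history_agent_client.py / personal_agent_client.py):

```python
# Sketch only; assumes the Anthropic Python SDK. dispatch_to_sub_agent is a
# hypothetical stand-in for the sub-agent HTTP clients.
import os
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "history_rag_query",
    "description": "Answer questions about Indian dynastic history from the "
                   "Pallava Knowledge Base, with citations.",
    "input_schema": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}]  # ...remaining history/personal tools registered the same way

def run_agent(instruction: str, max_steps: int = 5) -> str:
    """Tool-use routing loop: Claude picks tools; we execute and feed back."""
    messages = [{"role": "user", "content": instruction}]
    for _ in range(max_steps):
        response = client.messages.create(
            model=os.environ.get("MODEL_SMART", "claude-opus-4-6"),
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":  # no more tools: final answer
            return "".join(b.text for b in response.content if b.type == "text")
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": dispatch_to_sub_agent(b.name, b.input)}  # hypothetical
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "Step limit reached; summarising partial findings."
```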

Interfaces

Interface | How | Status
Claude Code (MCP tool) | MCP tool definition calls POST /agent/run. Natural language in Claude Code → routes to Agent J backend. | Phase 1 — primary interface
Web UI (/agent-ui) | Simple chat-style text input → fetch → rendered markdown result. Matches existing design system. | Phase 1 — build alongside backend
Mobile web app | Responsive web UI hosted on AWS EC2. Works in Safari/Chrome on iPhone. Add to Home Screen for app-like experience. Same backend — no App Store required. | Phase 3 — post AWS migration
Native iPhone app | SwiftUI app calling POST /agent/run. Requires $99/year Apple Developer account + significant build effort. | Deferred indefinitely — mobile web app covers the use case at zero extra cost

Output Delivery

Channel | Format | Status
Gmail — Send (SMTP) | Plain text, Word, PDF attachments outbound | Phase 1 — smtplib stdlib, no new packages
Gmail — Read (OAuth) | Inbox reading, prioritisation, action surfacing | Sprint 2 — Personal Agent. Requires Google Cloud OAuth credentials.
Word (.docx) | Research papers, reports | Already built — export_documents.py
PDF | All documents | Already built — docx2pdf (local). LibreOffice fallback on EC2.
PowerPoint (.pptx) | Presentations | Already built — routers/pptx.py
WhatsApp | Text messages via Twilio API | Phase 4+ — not confirmed. Under review.
Zoom | Summaries, meeting prep via Zoom OAuth | Phase 4+ — not confirmed. Under review.
Section 2 History Agent — Scope & Capabilities
Capability | Detail | Feasibility
Dynasty coverage | Start with Pallava + Chalukya; expand to all major Indian empires and dynasties | Now — just ingest PDFs. No code changes.
Corpus scale | 1,000+ PDFs · 10,000+ images · 1,000+ web/research documents | ChromaDB handles this scale easily
South Indian languages | Telugu, Kannada, Tamil, Pallava Grantha, Grantha | Already deployed — multilingual-e5-large
Other Indian languages | Hindi, Sanskrit | Same multilingual model covers both
Cross-dynasty relationships | Accurate cross-dynasty details without hallucination | Phase 3 — needs corpus completeness + source weighting
Architectural style recognition | Identify dynasty/period/style from uploaded images | Partial now — POST /identify-temple exists. Full upgrade in Phase 2.

Why You Need the History Agent — Use Cases

The History Agent is the bridge between the knowledge corpus and natural language delivery. Without it, the KB answers one question at a time through a fixed RAG pipeline. With it, Claude reasons across multiple queries, resolves contradictions, and delivers finished output.

UC1 Multi-step research question

"What architectural features distinguish early Pallava temples from imperial Pallava temples, and which scholars agree vs disagree on the transition period?"

Without agent: One RAG query, top 5 chunks, one answer. Likely incomplete — a single query cannot capture both periods and the scholarly debate simultaneously.

With agent: Claude queries "early Pallava architecture", then "imperial Pallava architecture", then "Pallava temple transition period", compares Minakshi vs Sastri vs Longhurst on the dating, surfaces the disagreement, and synthesises a complete answer — all in one instruction.

Prerequisite: Tag filter + source weighting (Phase 2) must be in place for the agent to isolate and compare sources meaningfully.

UC2 Inscription translation with missing characters

"Translate this Pallava Grantha inscription fragment: śrī-nara…"

Without agent: One-shot prompt with whatever context is retrieved. If a character is ambiguous, the translation is a guess with no transparency.

With agent: Claude attempts the translation, identifies the ambiguous character, queries the script training registry for that specific glyph, retrieves the confidence score and Unicode mapping, retries the translation with that grounding, and explicitly flags any remaining uncertainty.

Prerequisite: Script registry API exposed as an agent tool (GET /training/image/file/{filename} already exists — needs a tool definition wrapper).

UC3 Cross-dynasty comparison

"How did Pallava temple iconography influence early Chola temples at Thanjavur?"

Without agent: Searches one collection, returns Pallava chunks OR Chola chunks — rarely both in the right proportion for a meaningful comparison.

With agent: Queries Pallava sources first, then Chola sources, identifies overlapping iconographic elements mentioned in both, and constructs a comparative answer with citations from each tradition.

Prerequisite: Consistent tagging convention across all ingested sources. If Chola sources are not tagged chola, the agent cannot isolate them. Tag discipline during ingestion is critical.

UC4 Research paper generation

"Write a 2,000-word research note on Mahendravarman I's contribution to rock-cut architecture"

Without agent: Single RAG call, limited context, generic structure. Cannot accumulate evidence across multiple retrieval steps.

With agent: Queries the KB section by section (early reign → cave temples → inscriptions → scholarly dating debates), accumulates evidence across multiple retrieval steps, resolves conflicts between sources, then generates a structured paper with inline citations — and delivers to Word/PDF when done.

Prerequisite: A structured system prompt defining output format, section headings, citation style, and the ideological guardrail. This is the most critical single instruction for the agent.

UC5 Gap detection — knowing what you don't know

"What do we know about Pallava naval history?"

Without agent: Returns whatever 5 chunks match, even if corpus coverage is thin. No signal that the answer is incomplete or unreliable.

With agent: Queries the topic, counts the evidence, and explicitly responds: "Only 2 chunks found across 3 sources — corpus coverage on Pallava naval history is limited. The available evidence suggests… but this should be treated as preliminary until more sources are ingested."

Prerequisite: Confidence reporting instruction in the system prompt — Claude must be told to always state source count and flag low-coverage topics explicitly.

History Agent — System Prompt Specification

The system prompt is the single most important setup item for the History Agent. Every use case above depends on specific instructions being present in it. The following seven components are required.

# | Instruction | What it enables | Required for
1 | Identity & corpus declaration — "You are the History Agent for the Pallava Knowledge Base. You have access to a scholarly corpus of [N] sources covering Pallava, Chalukya, and South Indian history." | Sets scope — Claude knows what it has access to and what is out of scope | All use cases
2 | Tool definitions — explicit descriptions of each tool: history_rag_query, history_semantic_search, history_research_paper, script_registry_lookup. Claude picks tools automatically based on descriptions. | Enables multi-step reasoning — Claude decides which tool to call next based on what it found | UC1, UC2, UC3, UC4
3 | Grounding rule — "Every factual claim must be traceable to a retrieved chunk. If you cannot ground a claim in the corpus, state explicitly that this is your assessment and not sourced." | Prevents hallucination — all answers are citation-backed | All use cases, especially UC4
4 | Ideological guardrail — "Responses must be strictly fact-based and sourced from the scholarly corpus only. Do not favour left-leaning ideologies, Christian missionary perspectives, or western-biased narratives on Indian Hindu history." | Ensures scholarly neutrality and source fidelity across all outputs | All use cases, especially UC4
5 | Confidence reporting — "Always state how many chunks were retrieved and from how many distinct sources. If fewer than 3 chunks found, explicitly flag the answer as low-confidence and recommend ingesting additional sources." | Enables gap detection — sparse corpus topics are flagged rather than answered confidently | UC5, and a quality signal on all use cases
6 | Output format rules — research notes: section headings, inline citations (Author, Year), word count target. Translations: original script → IAST → meaning → confidence score. Comparisons: side-by-side structure with explicit agreement/disagreement flags. | Consistent, stakeholder-ready output without additional formatting prompts | UC2, UC3, UC4
7 | Loop limit — "Maximum 5 retrieval steps per instruction. If sufficient evidence is not found in 5 queries, summarise what was found and state the coverage gap." | Prevents infinite loops and runaway API costs on broad or ambiguous questions | UC1, UC3, UC4
Ideological Guardrail: The History Agent system prompt must explicitly state that responses are strictly fact-based and sourced only from the ingested scholarly corpus. The model must not favour left-leaning ideologies, Christian missionary perspectives, or western-biased narratives on Indian Hindu history. This guardrail will be added to: (1) Master Agent system prompt, (2) RAG answerer system prompt in rag/answerer.py, (3) research paper generation prompt.

History Agent — Dynasty Coverage

The History Agent is not limited to Pallava history. It queries the knowledge base using semantic search — dynasty names are not hardcoded anywhere in the agent logic. Any dynasty whose sources have been ingested and correctly tagged is automatically searchable, in any supported language. Two prerequisites must be met before multi-dynasty queries work reliably: corpus ingested per dynasty, and tags applied consistently at upload time using the convention below.

Dynasty | Period | Corpus Status | Agent Readiness
Pallava | 275–897 CE | 42 sources ingested — actively building | Partial — Phase 2 gate at 70%
Chalukya | 543–753 CE | Not started | Blocked on corpus
Chola | 300–1279 CE | Not started | Blocked on corpus
Rashtrakuta | 753–982 CE | Not started | Blocked on corpus
Vijayanagara | 1336–1646 CE | Not started | Blocked on corpus
Maurya / Gupta | 322 BCE–550 CE | Not started | Blocked on corpus
Hoysala | 1026–1343 CE | Not started | Blocked on corpus
Multilingual support is already active. The embedding model (intfloat/multilingual-e5-large) handles Tamil, Telugu, Kannada, Sanskrit, Hindi, and Pallava Grantha out of the box — no configuration needed per dynasty. Sources in regional languages are fully searchable today.
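
For reference, a minimal sketch of loading this model via sentence-transformers. Note that the e5 family's documentation calls for "query:" / "passage:" prefixes; how the existing retriever applies them is not shown here, and the sample texts are illustrative:

```python
# Sketch; assumes the sentence-transformers package. Sample texts illustrative.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large")  # 1024-dim output

passages = [
    "passage: Mahendravarman I excavated cave temples at Mandagapattu.",
    "passage: மாமல்லபுரத்தில் பல்லவர் கற்கோயில்கள் உள்ளன.",  # Tamil chunk
]
query = "query: Who built the Mandagapattu cave temple?"

passage_vecs = model.encode(passages, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)
scores = passage_vecs @ query_vec  # cosine similarity (vectors are normalised)
print(scores)
```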

Tagging Convention v1.0

Tags are the agent's primary filtering mechanism. Without consistent tags, cross-dynasty comparisons (UC3) and topic-scoped queries fail silently — the agent finds results but cannot isolate the right subset. Every source must follow this convention at upload time. Tags cannot be automatically inferred after ingestion — retroactive patching via POST /admin/patch-tags is available but expensive at scale.

Every source should carry tags across 4 dimensions. Dimension 1 (Dynasty) and Dimension 2 (Source Type) are mandatory. Dimension 3 (Period) and Dimension 4 (Topic) are strongly recommended.

Dimension 1 — Dynasty (Mandatory)

One tag per primary dynasty. Add all that apply for multi-dynasty sources.

Tag | Dynasty | Period
pallava | Pallava | 275–897 CE
chola | Chola | 300–1279 CE
chalukya | Chalukya | 543–753 CE
rashtrakuta | Rashtrakuta | 753–982 CE
vijayanagara | Vijayanagara | 1336–1646 CE
maurya | Maurya | 322–185 BCE
gupta | Gupta | 320–550 CE
hoysala | Hoysala | 1026–1343 CE
general-indian | Multi-dynasty / pan-Indian scope | —

Dimension 2 — Source Type (Mandatory)

Drives source weighting in Phase 2. Every source must declare exactly one.

Tag | Meaning
primary-source | Inscriptions, coins, contemporary records created during the period
excavation-report | ASI reports, field surveys, archaeological publications
scholarly | Peer-reviewed academic books and papers by recognised historians
reference | Dictionaries, encyclopaedias, atlases, chronological tables
web | Wikipedia, blogs, online articles
video | YouTube lectures, documentaries

Dimension 3 — Period (Recommended)

Sub-period within a dynasty. Omit if uncertain — a missing tag is better than a wrong one.

Tag | Covers
early-pallava | 275–575 CE — Simhavishnu and before
imperial-pallava | 575–750 CE — Mahendravarman I to Narasimhavarman II
late-pallava | 750–897 CE — Nandivarman II onwards
early-chola | 300–850 CE
imperial-chola | 850–1200 CE
late-chola | 1200–1279 CE
Add equivalent period tags for other dynasties as their corpus is built.

Dimension 4 — Topic (Recommended)

Subject matter. Multiple topic tags allowed and encouraged per source.

Tag | Covers
architecture | Temple design, structural features, building techniques
inscription | Epigraphic records, stone and copper plate inscriptions
sculpture | Iconography, relief panels, bronze casting
painting | Cave paintings, murals, manuscript illustration
numismatics | Coins, seals, medallions
literature | Poetry, Puranas, Sangam literature
history | Political history, genealogy, chronology, warfare
religion | Shaivism, Vaishnavism, Jainism, Buddhism in context
geography | Trade routes, ports, territorial extent
language | Script, grammar, phonology, epigraphy methodology

Examples — how real sources are tagged

Source | Tags (in order)
Minakshi — Administration and Social Life under the Pallavas | pallava scholarly imperial-pallava history
ASI Report — Mahabalipuram excavation | pallava excavation-report imperial-pallava architecture sculpture
Shore Temple copper plate inscription | pallava primary-source imperial-pallava inscription
Longhurst — Pallava Architecture | pallava scholarly architecture
YouTube — Kailasanathar Temple lecture | pallava video imperial-pallava architecture
Wikipedia — Pallava dynasty | pallava web history
Sastri — A History of South India | pallava chola chalukya scholarly history general-indian

Rules

  1. Dynasty is always first in the tag list — makes the Library visually scannable
  2. Source type is always second — drives Phase 2 source weighting
  3. Never use spaces — use hyphens (early-pallava not early pallava)
  4. Use existing tags before creating new ones — check all four dimensions above first
  5. Multi-dynasty sources get all applicable dynasty tags — don't pick just one
  6. When in doubt on period, omit it — a missing tag is better than a wrong one
  7. Apply tags at upload time — retroactive patching via POST /admin/patch-tags works but is tedious at scale
Gate for new dynasties: Before ingesting the first source from any new dynasty (Chola, Chalukya, etc.), verify the tagging convention covers that dynasty's period tags. Add them to this document first, then begin ingestion. Do not ingest first and tag later.

History Agent Tools (Phase 1)

Tool Name | When Claude picks it | Wraps
history_rag_query | "Who built Shore Temple?" · "What language did Pallavas use?" | rag/answerer.py:answer()
history_research_paper | "Generate a paper on Pallava art" · "Summarise Pallava warfare" | rag/retriever.py:retrieve() + Claude synthesis
history_semantic_search | "Show me passages about" · "Find sources on" · "Browse" | rag/retriever.py:retrieve()
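
A hypothetical sketch of the dispatch layer behind these three tools. The answer()/retrieve() signatures below are assumptions for illustration, not the actual code:

```python
# Hypothetical dispatch from tool name to the wrapped RAG functions listed
# above; the answer()/retrieve() signatures are assumed.
from rag.answerer import answer
from rag.retriever import retrieve

def run_history_tool(name: str, args: dict) -> str:
    if name == "history_rag_query":
        return answer(args["question"])            # grounded, cited answer
    if name == "history_semantic_search":
        chunks = retrieve(args["query"], top_k=5)  # raw passages for browsing
        return "\n\n".join(c["text"] for c in chunks)
    if name == "history_research_paper":
        # Retrieve section by section, then hand the accumulated evidence to
        # Claude for synthesis (see UC4). Omitted here.
        raise NotImplementedError
    raise ValueError(f"Unknown History Agent tool: {name}")
```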
Section 3 Architecture Assessment — Q&A

Seven architectural questions assessed against the current codebase (April 2026). Full analysis of feasibility, gaps, and recommended approach for each.

Q1 Is the current design flexible for multi-agent architecture in the future?
✓ Yes — with one caveat

What works in your favour: The pluggable ingester pattern (BaseIngester → PdfIngester, DocxIngester, etc.) is clean and extensible. The RAG pipeline (rag/answerer.py + rag/retriever.py) consists of two separate, independently callable functions — a History Agent can wrap them without modification. Claude's native tool-use is the correct routing mechanism. The FastAPI router pattern (app/routers/) means adding an agent.py router is a natural extension.

The caveat — client coupling: app/dependencies.py returns a hardcoded anthropic.Anthropic() client. When the Master Agent needs to call sub-agents with different models (Haiku for fast routing, Opus for deep research), there is no way to configure this without touching multiple files. Fix: replace hardcoded model strings with os.environ.get("MODEL_SMART", "claude-opus-4-6") before wiring the agent loop.
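
The fix is small. A sketch, with the fast-tier model id left as a deployment choice rather than hardcoded:

```python
# Sketch of the decoupling: model ids come from env vars, the client is built
# on demand. MODEL_FAST's value is a deployment choice; none is hardcoded here.
import os
import anthropic

MODEL_SMART = os.environ.get("MODEL_SMART", "claude-opus-4-6")  # deep research
MODEL_FAST = os.environ["MODEL_FAST"]                           # routing tier

def get_client() -> anthropic.Anthropic:
    # Key comes from ANTHROPIC_API_KEY in the environment, never hardcoded.
    return anthropic.Anthropic()
```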

The bigger risk is state: Background jobs (_jobs dict in admin.py) are in-memory. When a Master Agent starts a long-running research paper task, the result disappears on server restart. For the History Agent alone this is acceptable. Personal Agent (reminders, goals, Hindu calendar, Gmail across sessions) requires a persistent SQLite store from day one.
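
A sketch of that minimal persistent store, reusing the sessions/messages split from the Sprint 1 memory tables; the exact columns are assumptions:

```python
# Sketch; table names follow the Sprint 1 memory schema (sessions, messages),
# the columns are illustrative.
import sqlite3

def init_memory(db_path: str = "memory/agent_j.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS sessions (
            session_id TEXT PRIMARY KEY,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS messages (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT REFERENCES sessions(session_id),
            role       TEXT NOT NULL,   -- 'user' | 'assistant' | 'tool'
            content    TEXT NOT NULL,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
    """)
    conn.commit()
    return conn
```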

Summary: Flexible for History Agent now. Personal Agent needs SQLite state persistence first. Security Agent can run as a stateless audit loop.

Q2 Does it really work to use Claude Code or an Apple app as the main interface?
✓ Yes — both work, different timelines

Claude Code: Yes, immediately, no code changes needed. Claude Code's MCP tool interface can call any HTTP endpoint. You already have POST /agent/run planned. Write an MCP tool definition that calls this endpoint and Claude Code becomes a natural language front-end to your entire knowledge base today. This is the fastest path to a working agent interface.
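
A sketch of what agent_j_mcp.py can look like, assuming the official MCP Python SDK's FastMCP helper; the JSON field names ("instruction", "result") are assumptions about the /agent/run contract:

```python
# Sketch; assumes the MCP Python SDK (FastMCP) and httpx. The request/response
# field names are assumptions about the POST /agent/run contract.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("agent-j")

@mcp.tool()
def agent_j(instruction: str) -> str:
    """Send a natural-language instruction to Agent J and return its answer."""
    resp = httpx.post("http://localhost:8001/agent/run",
                      json={"instruction": instruction}, timeout=120.0)
    resp.raise_for_status()
    return resp.json()["result"]

if __name__ == "__main__":
    mcp.run()  # stdio transport, registered via .claude/settings.json
```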

Apple app (iPhone): Yes, but it is a separate project. Your FastAPI backend is REST-over-HTTP. Any iOS app can call it. The architecture is correct — app/main.py is already described as "the backend for a future iOS/Android app." What you need to build: a SwiftUI app with a chat-style UI, auth (a fixed API key in the app is sufficient for personal use), and the /agent/run endpoint on the backend. The iOS app does not change any backend design decisions. Build the backend first; the app is a client.

One practical issue with both: Long-running requests. Research paper generation takes 15–30 seconds. The fix is the same for both interfaces: convert long-running agent tasks to background jobs with polling (POST /agent/run returns a job_id → GET /agent/job/{id} polls for the result). This pattern already exists in admin.py for PDF ingestion — apply it to the agent loop in the same way.
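
A sketch of that conversion, assuming FastAPI's BackgroundTasks and an in-memory jobs dict like the one in admin.py (moved to SQLite later for durability); run_agent stands in for the tool-use loop:

```python
# Sketch; mirrors the admin.py background-job pattern. run_agent is the
# tool-use loop, imported from wherever it lives (assumption).
import uuid
from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel

app = FastAPI()
_jobs: dict[str, dict] = {}  # in-memory like admin.py; SQLite later

class RunRequest(BaseModel):
    instruction: str

def _run_job(job_id: str, instruction: str) -> None:
    _jobs[job_id]["status"] = "running"
    _jobs[job_id]["result"] = run_agent(instruction)  # may take 15-60 seconds
    _jobs[job_id]["status"] = "done"

@app.post("/agent/run")
def start_agent(req: RunRequest, background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    _jobs[job_id] = {"status": "queued", "result": None}
    background_tasks.add_task(_run_job, job_id, req.instruction)
    return {"job_id": job_id}  # client polls GET /agent/job/{job_id}

@app.get("/agent/job/{job_id}")
def poll_job(job_id: str):
    return _jobs.get(job_id, {"status": "unknown"})
```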

Q3 Do you see any issues with the current design for EC2 migration in the future?
⚠ Five concrete issues — one blocking

Issue 1 — Local file paths (most significant): source_library.py copies files to C:\Users\siddi\OneDrive\Personal\History\Pallavas\source_library\. On EC2 there is no OneDrive. Since this is already controlled by the PALLAVA_LIBRARY_DIR env var, this is a configuration change not a code change — but you need to plan where files live on EC2 (e.g. an EBS volume at /data/source_library) before migrating.

Issue 2 — ChromaDB is local (manageable): ChromaDB's PersistentClient writes to a local directory. On EC2 this directory must be on a persistent EBS volume, not the ephemeral root volume. Standard deployment practice — no code change required.

Issue 3 — docx2pdf is Windows-only (blocking): On Windows, docx2pdf uses Microsoft Word via COM automation. On Linux EC2 there is no Word. Replace with LibreOffice (libreoffice --headless --convert-to pdf) or Gotenberg. This is a 10-line code change but it will break on first deploy to Linux EC2 if not addressed.
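
The replacement is roughly this, a sketch assuming the libreoffice binary is installed and on PATH (see the deployment workflow's system-dependencies step):

```python
# Sketch of the docx2pdf replacement for Linux EC2. Assumes the libreoffice
# binary is installed and on PATH.
import subprocess
from pathlib import Path

def docx_to_pdf(docx_path: str, out_dir: str = ".") -> Path:
    subprocess.run(
        ["libreoffice", "--headless", "--convert-to", "pdf",
         "--outdir", out_dir, docx_path],
        check=True,
        timeout=120,  # fail loudly rather than hang the request
    )
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```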

Issue 4 — py -3 command: On EC2 (Linux), the Python command is python3, not py -3. Any shell scripts or subprocess calls using py -3 will fail. Minor — worth noting for deployment scripts.

Issue 5 — Unicode fonts for Word export (minor): The .docx research paper export uses Noto Serif to render IAST diacritical characters (ā, ī, ū, ṭ, ḍ etc.). On Windows this font must be installed manually. On EC2 (Ubuntu/Debian), run: sudo apt-get install fonts-noto-serif as part of the server setup script — one line, resolves all Unicode rendering issues in exported Word documents.

No structural issues. FastAPI + uvicorn is production-grade and runs identically on EC2. Tighten CORS (currently allow_origins=["*"]) before going public.

Q4 Is ChromaDB sufficient if I scale to multiple dynasties with thousands of PDF files?
✓ Yes — with one design decision to make

The numbers: 1,000 PDFs × ~100 chunks per PDF = ~100,000 chunks. 10,000 images × 1 description = 10,000 vectors. Total: ~110,000 vectors at 1024 dimensions. ChromaDB's HNSW index handles 10M+ vectors — 110K is trivial. At 100K vectors, HNSW queries return in <100ms locally. On EC2 with an EBS volume expect 200–400ms. Fully acceptable.

The real scaling risk is not ChromaDB — it is knowledge.json: knowledge_store.py loads the full knowledge.json into memory on every ingest operation. At 22.7 MB today (~5,000 chunks) this is fine. At 100,000 chunks this file will be ~450 MB and loading it on every ingest will be slow. The fix: stop using knowledge.json as a query store and query ChromaDB directly for everything. knowledge.json becomes an append-only audit log, not a lookup table. No re-ingestion needed — this is a code change to the ingest path only.

Multi-dynasty collection design — recommended approach: Rather than one collection per dynasty (chalukya_knowledge, chola_knowledge), use a single indian_knowledge collection with a dynasty metadata tag. All dynasties in one collection, filter by tag at query time. This makes cross-dynasty queries simple and natural — exactly what is needed for "cross-dynasty relationship details without hallucination." Migrate to this model when Chalukya ingestion begins — it is a reindex operation, not a code change.
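
In ChromaDB terms the single-collection design looks like this. A sketch using the standard client API; the dynasty metadata key follows the tagging convention above:

```python
# Sketch; standard chromadb API. Collection and metadata names follow the
# single-collection design described above.
import chromadb

client = chromadb.PersistentClient(path="corpus/chroma_db")
col = client.get_or_create_collection("indian_knowledge")

# UC3-style comparison: one scoped query per tradition, same collection.
pallava_hits = col.query(
    query_texts=["temple iconography at Thanjavur"],
    n_results=5,
    where={"dynasty": "pallava"},
)
chola_hits = col.query(
    query_texts=["temple iconography at Thanjavur"],
    n_results=5,
    where={"dynasty": "chola"},
)
```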

Q5 How efficient is Claude OCR compared to competitors? How effective is it in this project?
✓ Claude is the right choice for this use case
Scenario | Claude Vision | Amazon Textract | Google Document AI
Modern printed English/IAST PDFs | Excellent | Excellent | Excellent
Pre-2000 Tamil PDFs (TSCII/legacy encoding) | Fix #2 handles this in preprocessing | Poor — no legacy encoding awareness | Poor
Sanskrit/Tamil Unicode in printed books | Good | Limited | Good (dedicated Indic models)
Stone inscription photographs (epigraphic) | Excellent — contextual reasoning | Poor — pixel-level only | Fair
Grantha / Vatteluttu script | Unique advantage — reasoning about rare scripts | Cannot handle | Cannot handle
Cost per page | ~$0.003–0.015 (Haiku/Sonnet/Opus) | ~$0.0015 | ~$0.0015
Structured extraction (tables, forms) | Good | Excellent | Excellent

Why Claude wins for this project specifically: The critical advantage is contextual understanding. When Claude sees a damaged inscription photograph with partial characters, it can reason: "This is a Grantha script Pallava copper plate, therefore this partially visible character is likely..." — Textract and Document AI do pixel-level pattern matching, not contextual reasoning. For scholarly epigraphic work this is not a small difference — it is the entire problem domain.

Where competitors are cheaper: For structured documents (invoices, forms, tables), Textract's AnalyzeDocument API is faster and cheaper. For high-volume modern printed text, Google Document AI is cheaper at scale. Neither of these scenarios applies to the primary use case.

Recommendation: Keep Claude for OCR. The OCR evaluation suggested in the requirements is worth doing for modern Indic printed text where Google's dedicated Indic models might reduce cost at scale. For inscription photographs and rare scripts, Claude is in a category of its own.

Q6 How difficult is it to switch to another vector database when I go live?
⚠ Moderately difficult — ~105 lines in 4 files

ChromaDB API calls appear in four files. The APIs between providers are not compatible — Pinecone, Weaviate, Qdrant, pgvector all have different client libraries and query syntax.

File | What uses ChromaDB | Lines
knowledge_store.py | get_knowledge_collection(), upsert(), delete(), count() | ~30
rag/retriever.py | collection.query() — the most critical path | ~20
app/routers/gallery.py | get_gallery_collection(), upsert(), query() | ~25
vector_store.py | pallava_inscriptions collection management | ~30

Pragmatic approach (recommended): Before going live on EC2, create a vector_store_factory.py with a thin VectorStore wrapper class that encapsulates the 4 operations actually used (upsert, query, delete, count). All 4 files import from this factory. When migrating, change one file. This is 2–3 hours of refactoring that buys clean migration later.
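
A sketch of that wrapper, nothing more than the four operations behind one class, with ChromaDB underneath for now:

```python
# vector_store_factory.py sketch. Wraps only the four operations the four
# files actually use; swap the internals (Qdrant, pgvector) in this one file.
import chromadb

class VectorStore:
    def __init__(self, path: str, collection: str):
        client = chromadb.PersistentClient(path=path)
        self._col = client.get_or_create_collection(collection)

    def upsert(self, ids, documents, metadatas):
        self._col.upsert(ids=ids, documents=documents, metadatas=metadatas)

    def query(self, text: str, top_k: int = 5, where: dict | None = None):
        return self._col.query(query_texts=[text], n_results=top_k, where=where)

    def delete(self, ids):
        self._col.delete(ids=ids)

    def count(self) -> int:
        return self._col.count()

def get_vector_store(collection: str = "indian_knowledge") -> VectorStore:
    return VectorStore(path="corpus/chroma_db", collection=collection)
```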

Hosted alternatives for Phase 4: Qdrant Cloud — closest API to ChromaDB, easiest migration. pgvector (PostgreSQL extension) — if one database for everything (agent state + vectors) is desired, compelling for EC2. Pinecone — managed, no EC2 required, but vendor lock-in and higher cost at scale.

Q7 Do you see any bottlenecks with the current design when scaling to multi-agent architecture?
⚠ Four bottlenecks identified
Bottleneck | When it matters | Fix
Synchronous blocking requests | Now — agent tasks take 15–60 seconds. With one uvicorn worker, no other request is served during this time. | Convert long-running agent tasks to background jobs with polling. Pattern already exists in admin.py.
knowledge.json full load on every ingest | Phase 2 — at ~50K+ chunks, loading 200+ MB into memory on every ingest becomes slow. | Remove knowledge.json as a runtime lookup; query ChromaDB directly. Keep the .json as an append-only audit log.
No session state for multi-turn conversations | Personal Agent from day one. History Agent is stateless — acceptable. Personal Agent (Hindu calendar reminders, Gmail, goals) is not. | SQLite sessions table — store messages[] per session_id, load on each /agent/run call.
ChromaDB single-writer constraint | Phase 3 — multi-user beta. Concurrent writes risk index corruption. | ChromaDB Server mode or migrate to a hosted vector DB (Qdrant Cloud). Phase 3 concern only.

Priority: Fix the synchronous blocking issue before launching the agent loop — a 60-second HTTP request will time out on mobile and feels broken. Background jobs + polling solves this and is already a proven pattern in the codebase.

Section 4 Design Principles & Decisions
🔌 AI Provider Agnostic — Design must work with Claude, ChatGPT, Gemini, or any future provider without a full redesign. Approach now: replace hardcoded "claude-opus-4-6" strings with MODEL_SMART / MODEL_FAST env vars. Full abstraction layer deferred to Phase 3.

🗄️ Vector DB Agnostic — No hard lock-in to ChromaDB. Must be portable to any vector database at EC2 migration time. Approach: create vector_store_factory.py wrapping the 4 operations used. Single file to change at migration.

📈 Scalable — Handle 1,000+ PDFs, 10,000+ images, multiple dynasties. Grow from personal → select users → public. ChromaDB is sufficient through Phase 3; migrate to a hosted DB at the Phase 4 public launch.

🔮 Future-Proof — Architecture must accommodate AI advancements without requiring full redesigns. Flexibility is the key design constraint. Pluggable ingesters, separate RAG functions, tool-use routing — all extensible without touching the core.

🔒 Confidential — Project data must never be used to train external models. Anthropic API does not train on API data by default; confirm API tier. All data stays local or on OneDrive.

⚖️ Fact-Based — History Agent must never favour left-leaning, missionary, or western-biased narratives on Indian history. Explicit guardrail in the Master Agent, RAG answerer, and research paper prompts. All answers grounded in the corpus only.
Section 5 Key Decisions Made (recorded in the Decisions Log, April 2026, under the Live Tracker above)
Section 6 Deployment Roadmap

Agent Architecture

Agent | Project | Visibility | Calls
Agent J (Master) | C:/Personal/AI Project/agent-j/ | Personal only | History Agent + Personal Agent via HTTP
History Agent | C:/Personal/AI Project/pallava-translator/ | Public-facing (Phase 4) | No access to Personal Agent or Agent J internals
Personal Agent | C:/Personal/AI Project/personal-agent/ | Private only — never exposed | Can call History Agent via Master. Private corpus stays isolated.
Security Agent | TBD (Phase 4) | Internal only | Stateless audit loop across all agents

Interfaces

Interface | Phase | How | Cost
Claude Code (MCP) | Phase 1 — Now | MCP tool definition calls POST /agent/run on the local machine. Natural language in Claude Code → Agent J routes to sub-agents. | $0
Web UI (/agent-ui) | Phase 1 — Now | Simple chat-style text input → fetch → rendered markdown. Built alongside the Agent J backend. | $0
Mobile web app | Phase 3 — Post AWS | Responsive web UI hosted on AWS EC2. Works in Safari/Chrome on iPhone. Can be added to Home Screen — looks and feels like a native app. Same backend, no App Store required. | Included in EC2 cost (~$35/month)
Native iPhone app (SwiftUI) | Phase 4+ — Optional | Deferred indefinitely. Mobile web app covers the use case at zero additional cost. Native app only if specific iOS features (push notifications, offline mode) are needed. | $99/year Apple Developer + build effort. Deferred.

Development Workflow (Local → AWS)

Step | Detail
1. Develop locally | Write and test on Antigravity at localhost:8000. All three agents run via docker compose up in agent-j/.
2. Push to GitHub | git push origin main — private repo. Corpus data (PDFs, ChromaDB, SQLite) never goes to GitHub — it stays on local disk and the EBS volume.
3. Install system dependencies | sudo apt-get install fonts-noto-serif libreoffice — Noto Serif for Unicode IAST characters in Word exports; LibreOffice as the docx2pdf replacement for PDF conversion on Linux.
4. Copy corpus data to EBS | Corpus files are excluded from git (too large). What to copy and what to skip:

Item | Copy? | Impact if missing | Action
corpus/chroma_db/ | Via Git LFS | No RAG/search/research papers | Set up Git LFS — then git clone handles it automatically
source_library/ PDFs | Phase 3: scp | Downloads fail; research paper images missing; can't re-ingest | scp -r "C:/Users/siddi/OneDrive/.../source_library/" ec2-user@<IP>:/data/source_library/ then set PALLAVA_LIBRARY_DIR=/data/source_library in Parameter Store. ~15 min, zero code changes.
corpus/pdf_pages/ | ❌ Skip | Research paper images missing until regenerated | Auto-regenerated on next ingest. Do not copy — 7.5GB of disposable cache.
gmail_token.json | scp only | Gmail won't work — must re-authorise OAuth | Never in git (secret). scp personal-agent/auth/gmail_token.json ec2-user@<IP>:/data/auth/
Phase 4 — Move source_library to S3 (~half day, 3 files, ~50 lines):
  • source_library.py — upload to S3 instead of local disk (~20 lines)
  • app/routers/library.py — serve downloads as S3 presigned URLs (~10 lines)
  • app/routers/research.py + ingest/pdf_ingester.py — download from S3 to temp file before rendering (~20 lines)
No DB schema changes, no ChromaDB changes, no frontend changes.

⚠ Known risks with corpus copy:
  • File paths will break (most important): source_library.py stores absolute Windows paths (C:\Users\siddi\OneDrive\...). Run a path migration script after copying to rewrite all stored paths to the EBS mount (e.g. /data/source_library/); a sketch of such a script follows the workflow table below.
  • ChromaDB copy risk — stop all agents before copying chroma_db/. Safer: skip copy, run POST /admin/reindex on EC2 (~5 min) to rebuild fresh from knowledge.json (already in git).
  • OneDrive sync conflict — pause OneDrive sync before running scp to avoid partial/locked files.
  • File permissions — run chmod -R 644 /data/ on EC2 after copy.
5. Deploy to EC2 | SSH into EC2 → git clone https://github.com/Sid00009/Project_OM.git && docker compose up -d --build. Later: automate with GitHub Actions (optional).
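
The path migration mentioned in the risks above could look like this. Purely illustrative: it assumes the library index is a JSON file with a "path" field per entry, which may not match source_library.py's real storage format:

```python
# Illustrative only; assumes a JSON index with a "path" field per entry.
# source_library.py's actual storage format may differ.
import json
from pathlib import PurePosixPath, PureWindowsPath

OLD = PureWindowsPath(r"C:\Users\siddi\OneDrive\Personal\History\Pallavas\source_library")
NEW = PurePosixPath("/data/source_library")

def migrate_paths(index_file: str) -> None:
    with open(index_file, encoding="utf-8") as f:
        index = json.load(f)
    for entry in index:
        rel = PureWindowsPath(entry["path"]).relative_to(OLD)
        entry["path"] = str(NEW.joinpath(*rel.parts))
    with open(index_file, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False, indent=2)
```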

AWS Setup (Recommended)

Resource | Spec | Cost | Notes
EC2 | t3.medium (2 vCPU, 4GB RAM) | ~$30/month | Upgrade to t3.large if embedding gets slow. 2 min, no downtime.
EBS Volume | 50GB gp3 | ~$5/month | Stores corpus, ChromaDB, SQLite, uploaded files. Persists across EC2 restarts.
Route 53 (optional) | Custom domain | ~$0.50/month | e.g. agent-j.yourdomain.com. Not required for personal use.
Total Phase 3 | — | ~$35–40/month | Covers all three agents + mobile web app + corpus storage.
AWS account: No connection to Claude/Anthropic account. Use a dedicated email or siddijagadeesh+aws@gmail.com. Enable MFA on root account immediately after creation.

Secrets & Credentials Management

The same os.environ.get() pattern is used in all three agents — credentials are never hardcoded. The source of those environment variables changes per environment, but the application code does not.

Environment | Where secrets live | How the app reads them | Notes
Local (now) | .env file in each project root — loaded by python-dotenv at startup | os.environ.get("KEY") — same in all app/config.py files | .env is in .gitignore — never committed to GitHub
AWS EC2 (Phase 3) | AWS Systems Manager Parameter Store (free tier) or Secrets Manager (~$0.40/secret/month) | Startup script exports parameters as env vars before uvicorn starts. App code unchanged. | No .env file on server — AWS injects values at boot
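
A sketch of an environment-aware loader, assuming boto3 for Parameter Store; the USE_PARAMETER_STORE flag and /agent-j/ parameter prefix are illustrative choices, not the actual setup:

```python
# Sketch: python-dotenv locally, Parameter Store (boto3) on EC2. The
# USE_PARAMETER_STORE flag and /agent-j/ prefix are illustrative choices.
import os

def load_secrets() -> None:
    if os.environ.get("USE_PARAMETER_STORE") == "1":   # set on EC2 only
        import boto3
        ssm = boto3.client("ssm")  # EC2 IAM role grants read access, no keys
        for name in ("ANTHROPIC_API_KEY", "GMAIL_CLIENT_ID", "GMAIL_CLIENT_SECRET"):
            param = ssm.get_parameter(Name=f"/agent-j/{name}", WithDecryption=True)
            os.environ[name] = param["Parameter"]["Value"]
    else:
        from dotenv import load_dotenv
        load_dotenv()  # reads .env in the project root
```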

Secrets per Agent

Agent | Secret | Local | AWS
All agents | ANTHROPIC_API_KEY | .env file | Parameter Store → env var
Personal Agent | GMAIL_CLIENT_ID, GMAIL_CLIENT_SECRET | personal-agent/.env | Parameter Store → env var
Personal Agent | Gmail refresh token (auth/gmail_token.json) | Local file — in .gitignore | EBS volume mounted at /app/auth/ — persists across restarts
Agent J | SQLite memory DB (memory/agent_j.db) | Local file | EBS volume mounted at /app/memory/

Gmail OAuth Setup (one-time, per machine)

Step | Detail
1. Google Cloud project | Create project agent-j-personal at console.cloud.google.com
2. Enable Gmail API | APIs & Services → Library → Gmail API → Enable
3. OAuth consent screen | External · App name: Agent J · Support email: your Gmail · Add yourself as test user
4. Create OAuth credentials | Credentials → Create OAuth 2.0 Client ID → Web application · Redirect URI: http://localhost:8002/auth/gmail/callback
5. Save credentials | Copy Client ID + Client Secret into personal-agent/.env
6. Authorise | Start Personal Agent → visit http://localhost:8002/auth/gmail/start → grant access → token saved to auth/gmail_token.json
7. On AWS (Phase 3) | Complete OAuth once locally → copy gmail_token.json to the EBS volume on EC2 → token auto-refreshes, no re-authorisation needed
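
Step 7 works because google-auth refreshes the saved token silently. A sketch using the standard google-auth API; the scope shown is an assumption:

```python
# Sketch: standard google-auth refresh flow. The scope is an assumption; use
# whatever scopes the OAuth consent screen actually granted.
from google.oauth2.credentials import Credentials
from google.auth.transport.requests import Request

SCOPES = ["https://www.googleapis.com/auth/gmail.modify"]

def load_gmail_credentials(token_path: str = "auth/gmail_token.json") -> Credentials:
    creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if creds.expired and creds.refresh_token:
        creds.refresh(Request())          # silent refresh, no browser needed
        with open(token_path, "w") as f:
            f.write(creds.to_json())      # persist the rotated token
    return creds
```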
Security rules: Never commit .env or gmail_token.json to GitHub — both are in .gitignore. If either is accidentally exposed, rotate immediately: Anthropic dashboard for API keys, Google Cloud Console → revoke token for Gmail. On AWS, the EC2 instance uses an IAM role to read Parameter Store — no credentials are stored in EC2 config files.
⚠ Gmail re-authorisation required after password change: If you change your Google account password, the Gmail refresh token is immediately invalidated. You must re-run the OAuth flow: start Personal Agent → visit http://localhost:8002/auth/gmail/start → grant access again. The same applies if you revoke access manually in Google Account → Security → Third-party access, or if the token has been unused for 6 months.
⚠ API key in start_all.bat — screen sharing risk: The ANTHROPIC_API_KEY is stored in plain text inside start_all.bat for convenience. This file is not committed to GitHub, but the key will be visible if you share your screen, share the file, or if someone accesses your laptop. Before any screen share or demo session, close the file in your editor and do not open it. If the key is ever exposed, rotate it immediately at console.anthropic.com → API Keys.

Phase Roadmap

Phase | Scope | Agent J Status | Infrastructure
Phase 1 (Now · Local) | Agent J skeleton. History Agent wired as first tool. Claude Code MCP interface. Basic web UI. Docker-ready layout from day one. | Master Agent + History Agent only | Local Windows (Antigravity). All paths via env vars — AWS-portable.
Phase 2 (70% corpus · Local) | Full Pallava + Chalukya corpus. Personal Agent live locally. Gmail integration. VectorStoreFactory abstraction. | History + Personal Agent live. Security Agent design begins. | Still local. Docker compose runs all three agents.
Phase 3 (90% corpus · AWS) | Migrate to EC2. Mobile web app live. Select user beta for History Agent. Multi-dynasty queries. docx2pdf → LibreOffice. | All agents live on EC2. Mobile web app as primary interface. | EC2 t3.medium + EBS 50GB. ~$35–40/month.
Phase 4 (100% · Public) | History Agent public. Full Security + Doc Review Agent. Copyright-resolved images. Full delivery pipeline. | All agents live. Security Agent audit loop running. | EC2 t3.large+. Hosted vector DB. Auth layer. CDN. ~$80–120/month.
Phase 5 (Post-launch) | Automated knowledge validation loop. Query Agent + Review Agent. 50-question test set. Self-improving corpus. | Validation loop running. Human sign-off required per round. | Same EC2. No additional infrastructure.
Section 7 Phase 5 — Automated Knowledge Validation Loop

Once the History Agent and multi-agent architecture are complete and the corpus is fully built, Phase 5 introduces a self-improving validation loop. Rather than manually testing whether the system answers questions correctly, a dedicated Query Agent and Review Agent work together to surface gaps, hallucinations, and weak citations — automatically. This phase is not about adding new data. It is about verifying that what is already in the corpus is being retrieved, synthesised, and cited correctly.

Agent Architecture

Agent | Role
Query Agent | Fires a curated test set of natural language questions across all dynasties, inscription types, topics, and time periods
Multi-Agent System | Receives each question, retrieves relevant chunks, generates a cited answer via the History Agent
Review Agent | Independently evaluates each answer against the corpus — checks citation accuracy, factual consistency, and hallucination. It does not share retrieval context with the answering pipeline — it evaluates the final output only
Human (you) | Reviews the final consolidated report, provides sign-off or targeted feedback to continue

Test Set Design

Minimum 50 questions curated by you — the quality of the test set determines the quality of the validation. Questions must span 5 categories:

# | Category | Example question
1 | Dynasty-specific | "Who were the Pallava kings during the Imperial period and what did each build?"
2 | Inscription-specific | "What does the Velurpalaiyam copper plate grant say about land ownership?"
3 | Cross-dynasty | "How did Pallava temple architecture influence the Early Chola style?"
4 | Corpus gap probe | "What is known about Pallava naval activity?" (intentionally thin corpus area)
5 | Contradiction probe | "What date is assigned to the Shore Temple construction?" (tests conflict surfacing)

Loop Protocol

Step | What happens
Round 1 | Query Agent fires all N questions → Multi-Agent System answers each → Review Agent evaluates all answers → produces a per-question scorecard (pass / flag / fail + reason)
Between rounds | You review flagged/failed questions. Option A: fill corpus gaps (ingest missing sources). Option B: refine system prompt / RAG retrieval rules. Option C: accept known limitation (mark as out-of-scope)
Rounds 2–3 | Re-run only the flagged/failed questions from the prior round → Review Agent re-evaluates
Exit condition | ✅ Review Agent passes ≥90% of test questions, OR ✅ max 3 rounds reached → force exit with gap report, OR ✅ you provide final sign-off after reviewing the consolidated report
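
In outline, the loop is simple. A sketch with hypothetical helpers standing in for the three agents and the curated test set:

```python
# Sketch: query_multi_agent, review_answer and load_test_set are hypothetical
# stand-ins for the Query Agent, Review Agent and curated test set.
def validation_round(questions: list[str]) -> dict[str, str]:
    scorecard = {}
    for q in questions:
        answer = query_multi_agent(q)            # cited answer via History Agent
        scorecard[q] = review_answer(q, answer)  # "pass" | "flag" | "fail"
    return scorecard

questions = load_test_set()                      # the 50-question curated set
total = len(questions)
for round_no in range(1, 4):                     # max 3 rounds
    scorecard = validation_round(questions)
    # Only flagged/failed questions carry over to the next round.
    questions = [q for q, verdict in scorecard.items() if verdict != "pass"]
    if (total - len(questions)) / total >= 0.90:  # cumulative pass rate >= 90%
        break
# Whatever remains in `questions` feeds the gap report and human sign-off.
```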

Review Agent Evaluation Criteria

Check | Pass condition
Citation present | Every factual claim has ≥1 corpus chunk cited
Citation accurate | The cited chunk actually supports the claim made
No hallucination | No king names, dates, or temple attributions absent from the corpus
No bias | Answer is factually grounded — not editorially skewed, not a missionary or western-biased narrative

Exit Deliverable

At the end of Phase 5, regardless of how many rounds ran, the system produces a consolidated report covering:

Document | Contents
Validation report | Per-question pass/fail/flag with Review Agent reasoning
Gap inventory | Topics where corpus coverage was insufficient for a confident answer
Prompt improvement log | What was changed between rounds and why
Known limitations | Topics intentionally marked out-of-scope with justification
Sign-off record | Your final approval with date and notes — feeds Phase 6 / public launch preparation

Prerequisites

Prerequisite | Status
Phase 3 complete — corpus ≥90%, star weights assigned | Phase 3
Phase 4 complete — History Agent + multi-agent architecture stable | Phase 4
50-question test set curated by you (1–2 hours) | Manual
ChromaDB reindex completed with final corpus state | Pre-run
Key Guardrail: The Review Agent operates on the same ideological guardrail as the History Agent. Any answer flagged for left-leaning, missionary, or western-biased narrative on Indian Hindu history is automatically failed — regardless of citation accuracy. Bias failure triggers a system prompt refinement, not a corpus gap-fill.
Vision Doc v1.0 · April 2026 · Pallava Knowledge Base Project · Compiler: Sidda Jagadeesh Donthi Siddappa