Vision Doc — Agent J v1.0 · April 2026

Multi-agent architecture vision, design principles & architectural assessment

Contents
  1. Current Status — Live Tracker
  2. Agent J — Vision Overview
  3. History Agent — Scope, Use Cases, System Prompt, Dynasty Coverage & Tagging Convention
  4. Architecture Assessment — Q&A
  5. Design Principles & Decisions
  6. Key Decisions Made
  7. Deployment Roadmap
  8. Phase 5 — Automated Knowledge Validation Loop
Current Status — Live Tracker
Last updated: April 2026  ·  Current phase: Phase 2 — Personal Agent + Gmail  ·  Active sprint: Sprint 2 — Personal Agent + Gmail OAuth

Phase Progress

Phase | Status | Summary
Phase 1 — Local build | Complete ✓ | Agent J skeleton + History Agent tool wiring + Claude Code MCP interface + basic web UI. Verified end-to-end.
Phase 2 — 70% corpus + Personal Agent | In progress | Personal Agent locally. Gmail OAuth integration. Full Pallava corpus. VectorStoreFactory abstraction.
Phase 3 — AWS migration + mobile web | Not started | EC2 t3.medium + EBS. Mobile web app. Docker deploy. LibreOffice PDF conversion.
Phase 4 — Public launch | Not started | History Agent public. Security Agent. Auth layer. CDN.
Phase 5 — Validation loop | Not started | Automated Query + Review Agent loop. 50-question test set. Self-improving corpus.

Sprint 1 — Agent J Skeleton ✓ Complete

Task | Status | Notes
Monorepo structure — agent-j/ project scaffold | Done ✓ | Alongside pallava-translator/ and personal-agent/. All paths via env vars.
FastAPI backend — POST /agent/run agentic loop | Done ✓ | Claude tool-use routing. 5-step max loop. Runs on port 8001.
History Agent tool definitions (3 tools) | Done ✓ | rag_query, semantic_search, research_paper
Agent J memory — SQLite | Done ✓ | sessions, messages, routing_log, preferences tables
MCP tool definition for Claude Code | Done ✓ | agent_j_mcp.py + .claude/settings.json
Basic web UI — /agent-ui | Done ✓ | Chat UI with markdown rendering + tool call chips. Verified working.
Docker-ready layout | Done ✓ | Dockerfile per agent. docker-compose.yml in monorepo root.

Sprint 2 — Personal Agent + Gmail (Current)

Task | Status | Notes
personal-agent/ project scaffold | Done ✓ | Port 8002. Same Docker-ready, env-var structure as agent-j/.
Gmail OAuth flow | Done ✓ | Google Cloud project created. OAuth credentials configured. /auth/gmail/start → callback → token saved.
Gmail tool implementations | Done ✓ | gmail_list_emails, gmail_read_email, gmail_send_email
Personal corpus — SQLite | Done ✓ | facts, email_cache, tasks, preferences tables in memory/personal.db
Personal Agent tool definitions in Agent J | Done ✓ | 6 tools: gmail (3) + tasks (2) + remember_fact (1). 9 total tools in Agent J.
Agent J routing updated | Done ✓ | History tools → history_agent_client.py. Personal tools → personal_agent_client.py.
Gmail OAuth authorisation | Pending | Start personal-agent → visit http://localhost:8002/auth/gmail/start → grant access
VectorStoreFactory abstraction | Pending | Deferred — required before EC2 migration (Phase 3)

Decisions Log — April 2026

Date | Decision | Rationale
Apr 2026 | Monorepo with 3 separate projects (agent-j, personal-agent, pallava-translator) | Clean security boundary. History Agent stays public-facing. Personal Agent corpus fully isolated.
Apr 2026 | Mobile web app instead of native iPhone app | Cost saving. Responsive web UI on AWS covers the use case. Native app deferred indefinitely.
Apr 2026 | Claude Code MCP as sole interface for Phase 1 | No infrastructure needed. Natural language → Agent J → sub-agents from Claude Code desktop.
Apr 2026 | Gmail OAuth deferred to Sprint 2 | Not needed for Agent J skeleton. Reduces Sprint 1 scope.
Apr 2026 | Apple Calendar integration skipped for now | Requires macOS or iCloud API. Revisit when Personal Agent is in active sprint.
Apr 2026 | AWS account with dedicated email (or +aws alias) | Security hygiene. AWS root email should be separate from personal email. Enable MFA immediately.
Apr 2026 | Notes page in Pallava project is History Agent specific | Personal Agent will have its own separate notes/corpus. No migration needed.

Pallava Corpus Status (History Agent)

Metric | Value
Sources ingested | 42 (38 PDFs, 2 images, 1 video, 1 URL)
Knowledge chunks | ~4,000+
Anthropic monthly spend limit | Raised to $200 (April 2026)
Pending ingests | ~6 PDFs failed due to spend limit — retry after limit raised
Phase 2 gate | 70% corpus — not yet reached
Section 1 Agent J — Vision Overview

Agent J is a personal multi-agent AI system with a single natural language entry point. The Master Agent accepts any instruction and routes it to the appropriate sub-agent using Claude's native tool-use (function-calling) as the routing mechanism — no custom keyword matching. Sub-agents are thin wrappers around existing or new capabilities; they can be added incrementally without redesigning the core.

Master — Agent J (Build first): Single entry point. Accepts natural language instructions. Routes to sub-agents via tool-use. Coordinates end-to-end delivery (PDF, Word, PowerPoint → Gmail, WhatsApp).

Sub-Agent 1 — History Agent (Phase 1 MVP): Indian history research — Pallava, Chalukya, and all dynasties. RAG queries, research paper generation, semantic search. Wraps existing KB infrastructure.

Sub-Agent 2 — Personal Agent (Phase 2 — Local): Personal life management — Hindu calendar reminders (Ekadasi, Amavasya, festivals), Gmail, RSU vesting, travel + temple corpus, weekly digest, AI learning roadmap, personal corpus. Private always — never exposed externally. Requires SQLite state store.

Sub-Agent 3 — Security Agent (Phase 4): Ongoing architecture audit. Independently reviews all system configurations and setups for vulnerabilities. Stateless audit loop.
Routing mechanism: Claude's native tool-use IS the routing logic. No if/elif, no regex, no keyword matching. Tool descriptions tell Claude what each sub-agent does — Claude picks the correct tool automatically. The agentic loop runs up to 5 steps; Claude can chain tools sequentially within a single instruction.
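
A minimal sketch of that loop, assuming the Anthropic Python SDK; the tool schema shown is illustrative, and dispatch_to_sub_agent is a hypothetical stand-in for the HTTP sub-agent clients (history_agent_client.py / personal_agent_client.py):

```python
# Sketch only; assumes the Anthropic Python SDK. dispatch_to_sub_agent is a
# hypothetical stand-in for the sub-agent HTTP clients.
import os
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "history_rag_query",
    "description": "Answer questions about Indian dynastic history from the "
                   "Pallava Knowledge Base, with citations.",
    "input_schema": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}]  # ...remaining history/personal tools registered the same way

def run_agent(instruction: str, max_steps: int = 5) -> str:
    """Tool-use routing loop: Claude picks tools; we execute and feed back."""
    messages = [{"role": "user", "content": instruction}]
    for _ in range(max_steps):
        response = client.messages.create(
            model=os.environ.get("MODEL_SMART", "claude-opus-4-6"),
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":  # no more tools: final answer
            return "".join(b.text for b in response.content if b.type == "text")
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": dispatch_to_sub_agent(b.name, b.input)}  # hypothetical
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "Step limit reached; summarising partial findings."
```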

Interfaces

Interface | How | Status
Claude Code (MCP tool) | MCP tool definition calls POST /agent/run. Natural language in Claude Code → routes to Agent J backend. | Phase 1 — primary interface
Web UI (/agent-ui) | Simple chat-style text input → fetch → rendered markdown result. Matches existing design system. | Phase 1 — build alongside backend
Mobile web app | Responsive web UI hosted on AWS EC2. Works in Safari/Chrome on iPhone. Add to Home Screen for app-like experience. Same backend — no App Store required. | Phase 3 — post AWS migration
Native iPhone app | SwiftUI app calling POST /agent/run. Requires $99/year Apple Developer account + significant build effort. | Deferred indefinitely — mobile web app covers the use case at zero extra cost

Output Delivery

Channel | Format | Status
Gmail — Send (SMTP) | Plain text, Word, PDF attachments outbound | Phase 1 — smtplib stdlib, no new packages
Gmail — Read (OAuth) | Inbox reading, prioritisation, action surfacing | Sprint 2 — Personal Agent. Requires Google Cloud OAuth credentials.
Word (.docx) | Research papers, reports | Already built — export_documents.py
PDF | All documents | Already built — docx2pdf (local). LibreOffice fallback on EC2.
PowerPoint (.pptx) | Presentations | Already built — routers/pptx.py
WhatsApp | Text messages via Twilio API | Phase 4+ — not confirmed. Under review.
Zoom | Summaries, meeting prep via Zoom OAuth | Phase 4+ — not confirmed. Under review.
Section 2 History Agent — Scope & Capabilities
Capability | Detail | Feasibility
Dynasty coverage | Start with Pallava + Chalukya; expand to all major Indian empires and dynasties | Now — just ingest PDFs. No code changes.
Corpus scale | 1,000+ PDFs · 10,000+ images · 1,000+ web/research documents | ChromaDB handles this scale easily
South Indian languages | Telugu, Kannada, Tamil, Pallava Grantha, Grantha | Already deployed — multilingual-e5-large
Other Indian languages | Hindi, Sanskrit | Same multilingual model covers both
Cross-dynasty relationships | Accurate cross-dynasty details without hallucination | Phase 3 — needs corpus completeness + source weighting
Architectural style recognition | Identify dynasty/period/style from uploaded images | Partial now — POST /identify-temple exists. Full upgrade in Phase 2.

Why You Need the History Agent — Use Cases

The History Agent is the bridge between the knowledge corpus and natural language delivery. Without it, the KB answers one question at a time through a fixed RAG pipeline. With it, Claude reasons across multiple queries, resolves contradictions, and delivers finished output.

UC1 Multi-step research question

"What architectural features distinguish early Pallava temples from imperial Pallava temples, and which scholars agree vs disagree on the transition period?"

Without agent: One RAG query, top 5 chunks, one answer. Likely incomplete — a single query cannot capture both periods and the scholarly debate simultaneously.

With agent: Claude queries "early Pallava architecture", then "imperial Pallava architecture", then "Pallava temple transition period", compares Minakshi vs Sastri vs Longhurst on the dating, surfaces the disagreement, and synthesises a complete answer — all in one instruction.

Prerequisite: Tag filter + source weighting (Phase 2) must be in place for the agent to isolate and compare sources meaningfully.

UC2 Inscription translation with missing characters

"Translate this Pallava Grantha inscription fragment: śrī-nara…"

Without agent: One-shot prompt with whatever context is retrieved. If a character is ambiguous, the translation is a guess with no transparency.

With agent: Claude attempts the translation, identifies the ambiguous character, queries the script training registry for that specific glyph, retrieves the confidence score and Unicode mapping, retries the translation with that grounding, and explicitly flags any remaining uncertainty.

Prerequisite: Script registry API exposed as an agent tool (GET /training/image/file/{filename} already exists — needs a tool definition wrapper).

UC3 Cross-dynasty comparison

"How did Pallava temple iconography influence early Chola temples at Thanjavur?"

Without agent: Searches one collection, returns Pallava chunks OR Chola chunks — rarely both in the right proportion for a meaningful comparison.

With agent: Queries Pallava sources first, then Chola sources, identifies overlapping iconographic elements mentioned in both, and constructs a comparative answer with citations from each tradition.

Prerequisite: Consistent tagging convention across all ingested sources. If Chola sources are not tagged chola, the agent cannot isolate them. Tag discipline during ingestion is critical.

UC4 Research paper generation

"Write a 2,000-word research note on Mahendravarman I's contribution to rock-cut architecture"

Without agent: Single RAG call, limited context, generic structure. Cannot accumulate evidence across multiple retrieval steps.

With agent: Queries the KB section by section (early reign → cave temples → inscriptions → scholarly dating debates), accumulates evidence across multiple retrieval steps, resolves conflicts between sources, then generates a structured paper with inline citations — and delivers to Word/PDF when done.

Prerequisite: A structured system prompt defining output format, section headings, citation style, and the ideological guardrail. This is the most critical single instruction for the agent.

UC5 Gap detection — knowing what you don't know

"What do we know about Pallava naval history?"

Without agent: Returns whatever 5 chunks match, even if corpus coverage is thin. No signal that the answer is incomplete or unreliable.

With agent: Queries the topic, counts the evidence, and explicitly responds: "Only 2 chunks found across 3 sources — corpus coverage on Pallava naval history is limited. The available evidence suggests… but this should be treated as preliminary until more sources are ingested."

Prerequisite: Confidence reporting instruction in the system prompt — Claude must be told to always state source count and flag low-coverage topics explicitly.

History Agent — System Prompt Specification

The system prompt is the single most important setup item for the History Agent. Every use case above depends on specific instructions being present in it. The following seven components are required.

# | Instruction | What it enables | Required for
1 | Identity & corpus declaration — "You are the History Agent for the Pallava Knowledge Base. You have access to a scholarly corpus of [N] sources covering Pallava, Chalukya, and South Indian history." | Sets scope — Claude knows what it has access to and what is out of scope | All use cases
2 | Tool definitions — explicit descriptions of each tool: history_rag_query, history_semantic_search, history_research_paper, script_registry_lookup. Claude picks tools automatically based on descriptions. | Enables multi-step reasoning — Claude decides which tool to call next based on what it found | UC1, UC2, UC3, UC4
3 | Grounding rule — "Every factual claim must be traceable to a retrieved chunk. If you cannot ground a claim in the corpus, state explicitly that this is your assessment and not sourced." | Prevents hallucination — all answers are citation-backed | All use cases, especially UC4
4 | Ideological guardrail — "Responses must be strictly fact-based and sourced from the scholarly corpus only. Do not favour left-leaning ideologies, Christian missionary perspectives, or western-biased narratives on Indian Hindu history." | Ensures scholarly neutrality and source fidelity across all outputs | All use cases, especially UC4
5 | Confidence reporting — "Always state how many chunks were retrieved and from how many distinct sources. If fewer than 3 chunks found, explicitly flag the answer as low-confidence and recommend ingesting additional sources." | Enables gap detection — sparse corpus topics are flagged rather than answered confidently | UC5, and a quality signal on all use cases
6 | Output format rules — research notes: section headings, inline citations (Author, Year), word count target. Translations: original script → IAST → meaning → confidence score. Comparisons: side-by-side structure with explicit agreement/disagreement flags. | Consistent, stakeholder-ready output without additional formatting prompts | UC2, UC3, UC4
7 | Loop limit — "Maximum 5 retrieval steps per instruction. If sufficient evidence is not found in 5 queries, summarise what was found and state the coverage gap." | Prevents infinite loops and runaway API costs on broad or ambiguous questions | UC1, UC3, UC4
Ideological Guardrail: The History Agent system prompt must explicitly state that responses are strictly fact-based and sourced only from the ingested scholarly corpus. The model must not favour left-leaning ideologies, Christian missionary perspectives, or western-biased narratives on Indian Hindu history. This guardrail will be added to: (1) Master Agent system prompt, (2) RAG answerer system prompt in rag/answerer.py, (3) research paper generation prompt.

History Agent — Dynasty Coverage

The History Agent is not limited to Pallava history. It queries the knowledge base using semantic search — dynasty names are not hardcoded anywhere in the agent logic. Any dynasty whose sources have been ingested and correctly tagged is automatically searchable, in any supported language. Two prerequisites must be met before multi-dynasty queries work reliably: corpus ingested per dynasty, and tags applied consistently at upload time using the convention below.

Dynasty | Period | Corpus Status | Agent Readiness
Pallava | 275–897 CE | 42 sources ingested — actively building | Partial — Phase 2 gate at 70%
Chalukya | 543–753 CE | Not started | Blocked on corpus
Chola | 300–1279 CE | Not started | Blocked on corpus
Rashtrakuta | 753–982 CE | Not started | Blocked on corpus
Vijayanagara | 1336–1646 CE | Not started | Blocked on corpus
Maurya / Gupta | 322 BCE–550 CE | Not started | Blocked on corpus
Hoysala | 1026–1343 CE | Not started | Blocked on corpus
Multilingual support is already active. The embedding model (intfloat/multilingual-e5-large) handles Tamil, Telugu, Kannada, Sanskrit, Hindi, and Pallava Grantha out of the box — no configuration needed per dynasty. Sources in regional languages are fully searchable today.
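
For reference, a minimal sketch of loading this model via sentence-transformers. Note that the e5 family's documentation calls for "query:" / "passage:" prefixes; how the existing retriever applies them is not shown here, and the sample texts are illustrative:

```python
# Sketch; assumes the sentence-transformers package. Sample texts illustrative.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large")  # 1024-dim output

passages = [
    "passage: Mahendravarman I excavated cave temples at Mandagapattu.",
    "passage: மாமல்லபுரத்தில் பல்லவர் கற்கோயில்கள் உள்ளன.",  # Tamil chunk
]
query = "query: Who built the Mandagapattu cave temple?"

passage_vecs = model.encode(passages, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)
scores = passage_vecs @ query_vec  # cosine similarity (vectors are normalised)
print(scores)
```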

Tagging Convention v1.0

Tags are the agent's primary filtering mechanism. Without consistent tags, cross-dynasty comparisons (UC3) and topic-scoped queries fail silently — the agent finds results but cannot isolate the right subset. Every source must follow this convention at upload time. Tags cannot be automatically inferred after ingestion — retroactive patching via POST /admin/patch-tags is available but expensive at scale.

Every source should carry tags across 4 dimensions. Dimension 1 (Dynasty) and Dimension 2 (Source Type) are mandatory. Dimension 3 (Period) and Dimension 4 (Topic) are strongly recommended.

Dimension 1 — Dynasty (Mandatory)

One tag per primary dynasty. Add all that apply for multi-dynasty sources.

Tag | Dynasty | Period
pallava | Pallava | 275–897 CE
chola | Chola | 300–1279 CE
chalukya | Chalukya | 543–753 CE
rashtrakuta | Rashtrakuta | 753–982 CE
vijayanagara | Vijayanagara | 1336–1646 CE
maurya | Maurya | 322–185 BCE
gupta | Gupta | 320–550 CE
hoysala | Hoysala | 1026–1343 CE
general-indian | Multi-dynasty / pan-Indian scope | —

Dimension 2 — Source Type (Mandatory)

Drives source weighting in Phase 2. Every source must declare exactly one.

Tag | Meaning
primary-source | Inscriptions, coins, contemporary records created during the period
excavation-report | ASI reports, field surveys, archaeological publications
scholarly | Peer-reviewed academic books and papers by recognised historians
reference | Dictionaries, encyclopaedias, atlases, chronological tables
web | Wikipedia, blogs, online articles
video | YouTube lectures, documentaries

Dimension 3 — Period (Recommended)

Sub-period within a dynasty. Omit if uncertain — a missing tag is better than a wrong one.

Tag | Covers
early-pallava | 275–575 CE — Simhavishnu and before
imperial-pallava | 575–750 CE — Mahendravarman I to Narasimhavarman II
late-pallava | 750–897 CE — Nandivarman II onwards
early-chola | 300–850 CE
imperial-chola | 850–1200 CE
late-chola | 1200–1279 CE
Add equivalent period tags for other dynasties as their corpus is built.

Dimension 4 — Topic (Recommended)

Subject matter. Multiple topic tags allowed and encouraged per source.

Tag | Covers
architecture | Temple design, structural features, building techniques
inscription | Epigraphic records, stone and copper plate inscriptions
sculpture | Iconography, relief panels, bronze casting
painting | Cave paintings, murals, manuscript illustration
numismatics | Coins, seals, medallions
literature | Poetry, Puranas, Sangam literature
history | Political history, genealogy, chronology, warfare
religion | Shaivism, Vaishnavism, Jainism, Buddhism in context
geography | Trade routes, ports, territorial extent
language | Script, grammar, phonology, epigraphy methodology

Examples — how real sources are tagged

Source | Tags (in order)
Minakshi — Administration and Social Life under the Pallavas | pallava scholarly imperial-pallava history
ASI Report — Mahabalipuram excavation | pallava excavation-report imperial-pallava architecture sculpture
Shore Temple copper plate inscription | pallava primary-source imperial-pallava inscription
Longhurst — Pallava Architecture | pallava scholarly architecture
YouTube — Kailasanathar Temple lecture | pallava video imperial-pallava architecture
Wikipedia — Pallava dynasty | pallava web history
Sastri — A History of South India | pallava chola chalukya scholarly history general-indian

Rules

  1. Dynasty is always first in the tag list — makes the Library visually scannable
  2. Source type is always second — drives Phase 2 source weighting
  3. Never use spaces — use hyphens (early-pallava not early pallava)
  4. Use existing tags before creating new ones — check all four dimensions above first
  5. Multi-dynasty sources get all applicable dynasty tags — don't pick just one
  6. When in doubt on period, omit it — a missing tag is better than a wrong one
  7. Apply tags at upload time — retroactive patching via POST /admin/patch-tags works but is tedious at scale
Gate for new dynasties: Before ingesting the first source from any new dynasty (Chola, Chalukya, etc.), verify the tagging convention covers that dynasty's period tags. Add them to this document first, then begin ingestion. Do not ingest first and tag later.

History Agent Tools (Phase 1)

Tool Name | When Claude picks it | Wraps
history_rag_query | "Who built Shore Temple?" · "What language did Pallavas use?" | rag/answerer.py:answer()
history_research_paper | "Generate a paper on Pallava art" · "Summarise Pallava warfare" | rag/retriever.py:retrieve() + Claude synthesis
history_semantic_search | "Show me passages about" · "Find sources on" · "Browse" | rag/retriever.py:retrieve()
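
A hypothetical sketch of the dispatch layer behind these three tools. The answer()/retrieve() signatures below are assumptions for illustration, not the actual code:

```python
# Hypothetical dispatch from tool name to the wrapped RAG functions listed
# above; the answer()/retrieve() signatures are assumed.
from rag.answerer import answer
from rag.retriever import retrieve

def run_history_tool(name: str, args: dict) -> str:
    if name == "history_rag_query":
        return answer(args["question"])            # grounded, cited answer
    if name == "history_semantic_search":
        chunks = retrieve(args["query"], top_k=5)  # raw passages for browsing
        return "\n\n".join(c["text"] for c in chunks)
    if name == "history_research_paper":
        # Retrieve section by section, then hand the accumulated evidence to
        # Claude for synthesis (see UC4). Omitted here.
        raise NotImplementedError
    raise ValueError(f"Unknown History Agent tool: {name}")
```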
Section 3 Architecture Assessment — Q&A

Seven architectural questions assessed against the current codebase (April 2026). Full analysis of feasibility, gaps, and recommended approach for each.

Q1 Is the current design flexible for multi-agent architecture in the future?
✓ Yes — with one caveat

What works in your favour: The pluggable ingester pattern (BaseIngester → PdfIngester, DocxIngester, etc.) is clean and extensible. The RAG pipeline (rag/answerer.py + rag/retriever.py) consists of two separate, independently callable functions — a History Agent can wrap them without modification. Claude's native tool-use is the correct routing mechanism. The FastAPI router pattern (app/routers/) means adding an agent.py router is a natural extension.

The caveat — client coupling: app/dependencies.py returns a hardcoded anthropic.Anthropic() client. When the Master Agent needs to call sub-agents with different models (Haiku for fast routing, Opus for deep research), there is no way to configure this without touching multiple files. Fix: replace hardcoded model strings with os.environ.get("MODEL_SMART", "claude-opus-4-6") before wiring the agent loop.
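
The fix is small. A sketch, with the fast-tier model id left as a deployment choice rather than hardcoded:

```python
# Sketch of the decoupling: model ids come from env vars, the client is built
# on demand. MODEL_FAST's value is a deployment choice; none is hardcoded here.
import os
import anthropic

MODEL_SMART = os.environ.get("MODEL_SMART", "claude-opus-4-6")  # deep research
MODEL_FAST = os.environ["MODEL_FAST"]                           # routing tier

def get_client() -> anthropic.Anthropic:
    # Key comes from ANTHROPIC_API_KEY in the environment, never hardcoded.
    return anthropic.Anthropic()
```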

The bigger risk is state: Background jobs (_jobs dict in admin.py) are in-memory. When a Master Agent starts a long-running research paper task, the result disappears on server restart. For the History Agent alone this is acceptable. Personal Agent (reminders, goals, Hindu calendar, Gmail across sessions) requires a persistent SQLite store from day one.
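
A sketch of that minimal persistent store, reusing the sessions/messages split from the Sprint 1 memory tables; the exact columns are assumptions:

```python
# Sketch; table names follow the Sprint 1 memory schema (sessions, messages),
# the columns are illustrative.
import sqlite3

def init_memory(db_path: str = "memory/agent_j.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS sessions (
            session_id TEXT PRIMARY KEY,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS messages (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT REFERENCES sessions(session_id),
            role       TEXT NOT NULL,   -- 'user' | 'assistant' | 'tool'
            content    TEXT NOT NULL,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
    """)
    conn.commit()
    return conn
```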

Summary: Flexible for History Agent now. Personal Agent needs SQLite state persistence first. Security Agent can run as a stateless audit loop.

Q2 Does it really work to use Claude Code or an Apple app as the main interface?
✓ Yes — both work, different timelines

Claude Code: Yes, immediately, no code changes needed. Claude Code's MCP tool interface can call any HTTP endpoint. You already have POST /agent/run planned. Write an MCP tool definition that calls this endpoint and Claude Code becomes a natural language front-end to your entire knowledge base today. This is the fastest path to a working agent interface.
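
A sketch of what agent_j_mcp.py can look like, assuming the official MCP Python SDK's FastMCP helper; the JSON field names ("instruction", "result") are assumptions about the /agent/run contract:

```python
# Sketch; assumes the MCP Python SDK (FastMCP) and httpx. The request/response
# field names are assumptions about the POST /agent/run contract.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("agent-j")

@mcp.tool()
def agent_j(instruction: str) -> str:
    """Send a natural-language instruction to Agent J and return its answer."""
    resp = httpx.post("http://localhost:8001/agent/run",
                      json={"instruction": instruction}, timeout=120.0)
    resp.raise_for_status()
    return resp.json()["result"]

if __name__ == "__main__":
    mcp.run()  # stdio transport, registered via .claude/settings.json
```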

Apple app (iPhone): Yes, but it is a separate project. Your FastAPI backend is REST-over-HTTP. Any iOS app can call it. The architecture is correct — app/main.py is already described as "the backend for a future iOS/Android app." What you need to build: a SwiftUI app with a chat-style UI, auth (a fixed API key in the app is sufficient for personal use), and the /agent/run endpoint on the backend. The iOS app does not change any backend design decisions. Build the backend first; the app is a client.

One practical issue with both: Long-running requests. Research paper generation takes 15–30 seconds. The fix is the same for both interfaces: convert long-running agent tasks to background jobs with polling (POST /agent/run returns a job_id → GET /agent/job/{id} polls for the result). This pattern already exists in admin.py for PDF ingestion — apply it to the agent loop in the same way.
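
A sketch of that conversion, assuming FastAPI's BackgroundTasks and an in-memory jobs dict like the one in admin.py (moved to SQLite later for durability); run_agent stands in for the tool-use loop:

```python
# Sketch; mirrors the admin.py background-job pattern. run_agent is the
# tool-use loop, imported from wherever it lives (assumption).
import uuid
from fastapi import BackgroundTasks, FastAPI
from pydantic import BaseModel

app = FastAPI()
_jobs: dict[str, dict] = {}  # in-memory like admin.py; SQLite later

class RunRequest(BaseModel):
    instruction: str

def _run_job(job_id: str, instruction: str) -> None:
    _jobs[job_id]["status"] = "running"
    _jobs[job_id]["result"] = run_agent(instruction)  # may take 15-60 seconds
    _jobs[job_id]["status"] = "done"

@app.post("/agent/run")
def start_agent(req: RunRequest, background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    _jobs[job_id] = {"status": "queued", "result": None}
    background_tasks.add_task(_run_job, job_id, req.instruction)
    return {"job_id": job_id}  # client polls GET /agent/job/{job_id}

@app.get("/agent/job/{job_id}")
def poll_job(job_id: str):
    return _jobs.get(job_id, {"status": "unknown"})
```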

Q3 Do you see any issues with the current design for EC2 migration in the future?
⚠ Five concrete issues — one blocking

Issue 1 — Local file paths (most significant): source_library.py copies files to C:\Users\siddi\OneDrive\Personal\History\Pallavas\source_library\. On EC2 there is no OneDrive. Since this is already controlled by the PALLAVA_LIBRARY_DIR env var, this is a configuration change not a code change — but you need to plan where files live on EC2 (e.g. an EBS volume at /data/source_library) before migrating.

Issue 2 — ChromaDB is local (manageable): ChromaDB's PersistentClient writes to a local directory. On EC2 this directory must be on a persistent EBS volume, not the ephemeral root volume. Standard deployment practice — no code change required.

Issue 3 — docx2pdf is Windows-only (blocking): On Windows, docx2pdf uses Microsoft Word via COM automation. On Linux EC2 there is no Word. Replace with LibreOffice (libreoffice --headless --convert-to pdf) or Gotenberg. This is a 10-line code change but it will break on first deploy to Linux EC2 if not addressed.
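
The replacement is roughly this, a sketch assuming the libreoffice binary is installed and on PATH (see the deployment workflow's system-dependencies step):

```python
# Sketch of the docx2pdf replacement for Linux EC2. Assumes the libreoffice
# binary is installed and on PATH.
import subprocess
from pathlib import Path

def docx_to_pdf(docx_path: str, out_dir: str = ".") -> Path:
    subprocess.run(
        ["libreoffice", "--headless", "--convert-to", "pdf",
         "--outdir", out_dir, docx_path],
        check=True,
        timeout=120,  # fail loudly rather than hang the request
    )
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```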

Issue 4 — py -3 command: On EC2 (Linux), the Python command is python3, not py -3. Any shell scripts or subprocess calls using py -3 will fail. Minor — worth noting for deployment scripts.

Issue 5 — Unicode fonts for Word export (minor): The .docx research paper export uses Noto Serif to render IAST diacritical characters (ā, ī, ū, ṭ, ḍ etc.). On Windows this font must be installed manually. On EC2 (Ubuntu/Debian), run: sudo apt-get install fonts-noto-serif as part of the server setup script — one line, resolves all Unicode rendering issues in exported Word documents.

No structural issues. FastAPI + uvicorn is production-grade and runs identically on EC2. Tighten CORS (currently allow_origins=["*"]) before going public.

Q4 Is ChromaDB sufficient if I scale to multiple dynasties with thousands of PDF files?
✓ Yes — with one design decision to make

The numbers: 1,000 PDFs × ~100 chunks per PDF = ~100,000 chunks. 10,000 images × 1 description = 10,000 vectors. Total: ~110,000 vectors at 1024 dimensions. ChromaDB's HNSW index handles 10M+ vectors — 110K is trivial. At 100K vectors, HNSW queries return in <100ms locally. On EC2 with an EBS volume expect 200–400ms. Fully acceptable.

The real scaling risk is not ChromaDB — it is knowledge.json: knowledge_store.py loads the full knowledge.json into memory on every ingest operation. At 22.7 MB today (~5,000 chunks) this is fine. At 100,000 chunks this file will be ~450 MB and loading it on every ingest will be slow. The fix: stop using knowledge.json as a query store and query ChromaDB directly for everything. knowledge.json becomes an append-only audit log, not a lookup table. No re-ingestion needed — this is a code change to the ingest path only.

Multi-dynasty collection design — recommended approach: Rather than one collection per dynasty (chalukya_knowledge, chola_knowledge), use a single indian_knowledge collection with a dynasty metadata tag. All dynasties in one collection, filter by tag at query time. This makes cross-dynasty queries simple and natural — exactly what is needed for "cross-dynasty relationship details without hallucination." Migrate to this model when Chalukya ingestion begins — it is a reindex operation, not a code change.
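
In ChromaDB terms the single-collection design looks like this. A sketch using the standard client API; the dynasty metadata key follows the tagging convention above:

```python
# Sketch; standard chromadb API. Collection and metadata names follow the
# single-collection design described above.
import chromadb

client = chromadb.PersistentClient(path="corpus/chroma_db")
col = client.get_or_create_collection("indian_knowledge")

# UC3-style comparison: one scoped query per tradition, same collection.
pallava_hits = col.query(
    query_texts=["temple iconography at Thanjavur"],
    n_results=5,
    where={"dynasty": "pallava"},
)
chola_hits = col.query(
    query_texts=["temple iconography at Thanjavur"],
    n_results=5,
    where={"dynasty": "chola"},
)
```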

Q5 How efficient is Claude OCR compared to competitors? How effective is it in this project?
✓ Claude is the right choice for this use case
Scenario | Claude Vision | Amazon Textract | Google Document AI
Modern printed English/IAST PDFs | Excellent | Excellent | Excellent
Pre-2000 Tamil PDFs (TSCII/legacy encoding) | Fix #2 handles this in preprocessing | Poor — no legacy encoding awareness | Poor
Sanskrit/Tamil Unicode in printed books | Good | Limited | Good (dedicated Indic models)
Stone inscription photographs (epigraphic) | Excellent — contextual reasoning | Poor — pixel-level only | Fair
Grantha / Vatteluttu script | Unique advantage — reasoning about rare scripts | Cannot handle | Cannot handle
Cost per page | ~$0.003–0.015 (Haiku/Sonnet/Opus) | ~$0.0015 | ~$0.0015
Structured extraction (tables, forms) | Good | Excellent | Excellent

Why Claude wins for this project specifically: The critical advantage is contextual understanding. When Claude sees a damaged inscription photograph with partial characters, it can reason: "This is a Grantha script Pallava copper plate, therefore this partially visible character is likely..." — Textract and Document AI do pixel-level pattern matching, not contextual reasoning. For scholarly epigraphic work this is not a small difference — it is the entire problem domain.

Where competitors are cheaper: For structured documents (invoices, forms, tables), Textract's AnalyzeDocument API is faster and cheaper. For high-volume modern printed text, Google Document AI is cheaper at scale. Neither of these scenarios applies to the primary use case.

Recommendation: Keep Claude for OCR. The OCR evaluation suggested in the requirements is worth doing for modern Indic printed text where Google's dedicated Indic models might reduce cost at scale. For inscription photographs and rare scripts, Claude is in a category of its own.

Q6 How difficult is it to switch to another vector database when I go live?
⚠ Moderately difficult — ~105 lines in 4 files

ChromaDB API calls appear in four files. The APIs between providers are not compatible — Pinecone, Weaviate, Qdrant, pgvector all have different client libraries and query syntax.

File | What uses ChromaDB | Lines
knowledge_store.py | get_knowledge_collection(), upsert(), delete(), count() | ~30
rag/retriever.py | collection.query() — the most critical path | ~20
app/routers/gallery.py | get_gallery_collection(), upsert(), query() | ~25
vector_store.py | pallava_inscriptions collection management | ~30

Pragmatic approach (recommended): Before going live on EC2, create a vector_store_factory.py with a thin VectorStore wrapper class that encapsulates the 4 operations actually used (upsert, query, delete, count). All 4 files import from this factory. When migrating, change one file. This is 2–3 hours of refactoring that buys clean migration later.
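
A sketch of that wrapper, nothing more than the four operations behind one class, with ChromaDB underneath for now:

```python
# vector_store_factory.py sketch. Wraps only the four operations the four
# files actually use; swap the internals (Qdrant, pgvector) in this one file.
import chromadb

class VectorStore:
    def __init__(self, path: str, collection: str):
        client = chromadb.PersistentClient(path=path)
        self._col = client.get_or_create_collection(collection)

    def upsert(self, ids, documents, metadatas):
        self._col.upsert(ids=ids, documents=documents, metadatas=metadatas)

    def query(self, text: str, top_k: int = 5, where: dict | None = None):
        return self._col.query(query_texts=[text], n_results=top_k, where=where)

    def delete(self, ids):
        self._col.delete(ids=ids)

    def count(self) -> int:
        return self._col.count()

def get_vector_store(collection: str = "indian_knowledge") -> VectorStore:
    return VectorStore(path="corpus/chroma_db", collection=collection)
```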

Hosted alternatives for Phase 4: Qdrant Cloud — closest API to ChromaDB, easiest migration. pgvector (PostgreSQL extension) — if one database for everything (agent state + vectors) is desired, compelling for EC2. Pinecone — managed, no EC2 required, but vendor lock-in and higher cost at scale.

Q7 Do you see any bottlenecks with the current design when scaling to multi-agent architecture?
⚠ Four bottlenecks identified
Bottleneck | When it matters | Fix
Synchronous blocking requests | Now — agent tasks take 15–60 seconds. With one uvicorn worker, no other request is served during this time. | Convert long-running agent tasks to background jobs with polling. Pattern already exists in admin.py.
knowledge.json full load on every ingest | Phase 2 — at ~50K+ chunks, loading 200+ MB into memory on every ingest becomes slow. | Remove knowledge.json as a runtime lookup; query ChromaDB directly. Keep the .json as an append-only audit log.
No session state for multi-turn conversations | Personal Agent from day one. History Agent is stateless — acceptable. Personal Agent (Hindu calendar reminders, Gmail, goals) is not. | SQLite sessions table — store messages[] per session_id, load on each /agent/run call.
ChromaDB single-writer constraint | Phase 3 — multi-user beta. Concurrent writes risk index corruption. | ChromaDB Server mode or migrate to a hosted vector DB (Qdrant Cloud). Phase 3 concern only.

Priority: Fix the synchronous blocking issue before launching the agent loop — a 60-second HTTP request will time out on mobile and feels broken. Background jobs + polling solves this and is already a proven pattern in the codebase.

Section 4 Design Principles & Decisions
🔌 AI Provider Agnostic — Design must work with Claude, ChatGPT, Gemini, or any future provider without a full redesign. Approach now: replace hardcoded "claude-opus-4-6" strings with MODEL_SMART / MODEL_FAST env vars. Full abstraction layer deferred to Phase 3.

🗄️ Vector DB Agnostic — No hard lock-in to ChromaDB. Must be portable to any vector database at EC2 migration time. Approach: create vector_store_factory.py wrapping the 4 operations used. Single file to change at migration.

📈 Scalable — Handle 1,000+ PDFs, 10,000+ images, multiple dynasties. Grow from personal → select users → public. ChromaDB is sufficient through Phase 3; migrate to a hosted DB at the Phase 4 public launch.

🔮 Future-Proof — Architecture must accommodate AI advancements without requiring full redesigns. Flexibility is the key design constraint. Pluggable ingesters, separate RAG functions, tool-use routing — all extensible without touching the core.

🔒 Confidential — Project data must never be used to train external models. Anthropic API does not train on API data by default; confirm API tier. All data stays local or on OneDrive.

⚖️ Fact-Based — History Agent must never favour left-leaning, missionary, or western-biased narratives on Indian history. Explicit guardrail in the Master Agent, RAG answerer, and research paper prompts. All answers grounded in the corpus only.
Section 5 Key Decisions Made (recorded in the Decisions Log, April 2026, under the Live Tracker above)
Section 6 Deployment Roadmap

Agent Architecture

Agent | Project | Visibility | Calls
Agent J (Master) | C:/Personal/AI Project/agent-j/ | Personal only | History Agent + Personal Agent via HTTP
History Agent | C:/Personal/AI Project/pallava-translator/ | Public-facing (Phase 4) | No access to Personal Agent or Agent J internals
Personal Agent | C:/Personal/AI Project/personal-agent/ | Private only — never exposed | Can call History Agent via Master. Private corpus stays isolated.
Security Agent | TBD (Phase 4) | Internal only | Stateless audit loop across all agents

Interfaces

Interface | Phase | How | Cost
Claude Code (MCP) | Phase 1 — Now | MCP tool definition calls POST /agent/run on the local machine. Natural language in Claude Code → Agent J routes to sub-agents. | $0
Web UI (/agent-ui) | Phase 1 — Now | Simple chat-style text input → fetch → rendered markdown. Built alongside the Agent J backend. | $0
Mobile web app | Phase 3 — Post AWS | Responsive web UI hosted on AWS EC2. Works in Safari/Chrome on iPhone. Can be added to Home Screen — looks and feels like a native app. Same backend, no App Store required. | Included in EC2 cost (~$35/month)
Native iPhone app (SwiftUI) | Phase 4+ — Optional | Deferred indefinitely. Mobile web app covers the use case at zero additional cost. Native app only if specific iOS features (push notifications, offline mode) are needed. | $99/year Apple Developer + build effort. Deferred.

Development Workflow (Local → AWS)

Step | Detail
1. Develop locally | Write and test on Antigravity at localhost:8000. All three agents run via docker compose up in agent-j/.
2. Push to GitHub | git push origin main — private repo. Corpus data (PDFs, ChromaDB, SQLite) never goes to GitHub — it stays on local disk and the EBS volume.
3. Install system dependencies | sudo apt-get install fonts-noto-serif libreoffice — Noto Serif for Unicode IAST characters in Word exports; LibreOffice as the docx2pdf replacement for PDF conversion on Linux.
4. Copy corpus data to EBS | Corpus files are excluded from git (too large). What to copy and what to skip:

Item | Copy? | Impact if missing | Action
corpus/chroma_db/ | Via Git LFS | No RAG/search/research papers | Set up Git LFS — then git clone handles it automatically
source_library/ PDFs | Phase 3: scp | Downloads fail; research paper images missing; can't re-ingest | scp -r "C:/Users/siddi/OneDrive/.../source_library/" ec2-user@<IP>:/data/source_library/ then set PALLAVA_LIBRARY_DIR=/data/source_library in Parameter Store. ~15 min, zero code changes.
corpus/pdf_pages/ | ❌ Skip | Research paper images missing until regenerated | Auto-regenerated on next ingest. Do not copy — 7.5GB of disposable cache.
gmail_token.json | scp only | Gmail won't work — must re-authorise OAuth | Never in git (secret). scp personal-agent/auth/gmail_token.json ec2-user@<IP>:/data/auth/
Phase 4 — Move source_library to S3 (~half day, 3 files, ~50 lines):
  • source_library.py — upload to S3 instead of local disk (~20 lines)
  • app/routers/library.py — serve downloads as S3 presigned URLs (~10 lines)
  • app/routers/research.py + ingest/pdf_ingester.py — download from S3 to temp file before rendering (~20 lines)
No DB schema changes, no ChromaDB changes, no frontend changes.

⚠ Known risks with corpus copy:
  • File paths will break (most important): source_library.py stores absolute Windows paths (C:\Users\siddi\OneDrive\...). Run a path migration script after copying to rewrite all stored paths to the EBS mount (e.g. /data/source_library/); a sketch of such a script follows the workflow table below.
  • ChromaDB copy risk — stop all agents before copying chroma_db/. Safer: skip copy, run POST /admin/reindex on EC2 (~5 min) to rebuild fresh from knowledge.json (already in git).
  • OneDrive sync conflict — pause OneDrive sync before running scp to avoid partial/locked files.
  • File permissions — run chmod -R 644 /data/ on EC2 after copy.
5. Deploy to EC2 | SSH into EC2 → git clone https://github.com/Sid00009/Project_OM.git && docker compose up -d --build. Later: automate with GitHub Actions (optional).
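
The path migration mentioned in the risks above could look like this. Purely illustrative: it assumes the library index is a JSON file with a "path" field per entry, which may not match source_library.py's real storage format:

```python
# Illustrative only; assumes a JSON index with a "path" field per entry.
# source_library.py's actual storage format may differ.
import json
from pathlib import PurePosixPath, PureWindowsPath

OLD = PureWindowsPath(r"C:\Users\siddi\OneDrive\Personal\History\Pallavas\source_library")
NEW = PurePosixPath("/data/source_library")

def migrate_paths(index_file: str) -> None:
    with open(index_file, encoding="utf-8") as f:
        index = json.load(f)
    for entry in index:
        rel = PureWindowsPath(entry["path"]).relative_to(OLD)
        entry["path"] = str(NEW.joinpath(*rel.parts))
    with open(index_file, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False, indent=2)
```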

AWS Setup (Recommended)

Resource | Spec | Cost | Notes
EC2 | t3.medium (2 vCPU, 4GB RAM) | ~$30/month | Upgrade to t3.large if embedding gets slow. 2 min, no downtime.
EBS Volume | 50GB gp3 | ~$5/month | Stores corpus, ChromaDB, SQLite, uploaded files. Persists across EC2 restarts.
Route 53 (optional) | Custom domain | ~$0.50/month | e.g. agent-j.yourdomain.com. Not required for personal use.
Total Phase 3 | — | ~$35–40/month | Covers all three agents + mobile web app + corpus storage.
AWS account: No connection to Claude/Anthropic account. Use a dedicated email or siddijagadeesh+aws@gmail.com. Enable MFA on root account immediately after creation.

Secrets & Credentials Management

The same os.environ.get() pattern is used in all three agents — credentials are never hardcoded. The source of those environment variables changes per environment, but the application code does not.

Environment | Where secrets live | How the app reads them | Notes
Local (now) | .env file in each project root — loaded by python-dotenv at startup | os.environ.get("KEY") — same in all app/config.py files | .env is in .gitignore — never committed to GitHub
AWS EC2 (Phase 3) | AWS Systems Manager Parameter Store (free tier) or Secrets Manager (~$0.40/secret/month) | Startup script exports parameters as env vars before uvicorn starts. App code unchanged. | No .env file on server — AWS injects values at boot
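
A sketch of an environment-aware loader, assuming boto3 for Parameter Store; the USE_PARAMETER_STORE flag and /agent-j/ parameter prefix are illustrative choices, not the actual setup:

```python
# Sketch: python-dotenv locally, Parameter Store (boto3) on EC2. The
# USE_PARAMETER_STORE flag and /agent-j/ prefix are illustrative choices.
import os

def load_secrets() -> None:
    if os.environ.get("USE_PARAMETER_STORE") == "1":   # set on EC2 only
        import boto3
        ssm = boto3.client("ssm")  # EC2 IAM role grants read access, no keys
        for name in ("ANTHROPIC_API_KEY", "GMAIL_CLIENT_ID", "GMAIL_CLIENT_SECRET"):
            param = ssm.get_parameter(Name=f"/agent-j/{name}", WithDecryption=True)
            os.environ[name] = param["Parameter"]["Value"]
    else:
        from dotenv import load_dotenv
        load_dotenv()  # reads .env in the project root
```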

Secrets per Agent

Agent | Secret | Local | AWS
All agents | ANTHROPIC_API_KEY | .env file | Parameter Store → env var
Personal Agent | GMAIL_CLIENT_ID, GMAIL_CLIENT_SECRET | personal-agent/.env | Parameter Store → env var
Personal Agent | Gmail refresh token (auth/gmail_token.json) | Local file — in .gitignore | EBS volume mounted at /app/auth/ — persists across restarts
Agent J | SQLite memory DB (memory/agent_j.db) | Local file | EBS volume mounted at /app/memory/

Gmail OAuth Setup (one-time, per machine)

Step | Detail
1. Google Cloud project | Create project agent-j-personal at console.cloud.google.com
2. Enable Gmail API | APIs & Services → Library → Gmail API → Enable
3. OAuth consent screen | External · App name: Agent J · Support email: your Gmail · Add yourself as test user
4. Create OAuth credentials | Credentials → Create OAuth 2.0 Client ID → Web application · Redirect URI: http://localhost:8002/auth/gmail/callback
5. Save credentials | Copy Client ID + Client Secret into personal-agent/.env
6. Authorise | Start Personal Agent → visit http://localhost:8002/auth/gmail/start → grant access → token saved to auth/gmail_token.json
7. On AWS (Phase 3) | Complete OAuth once locally → copy gmail_token.json to the EBS volume on EC2 → token auto-refreshes, no re-authorisation needed
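
Step 7 works because google-auth refreshes the saved token silently. A sketch using the standard google-auth API; the scope shown is an assumption:

```python
# Sketch: standard google-auth refresh flow. The scope is an assumption; use
# whatever scopes the OAuth consent screen actually granted.
from google.oauth2.credentials import Credentials
from google.auth.transport.requests import Request

SCOPES = ["https://www.googleapis.com/auth/gmail.modify"]

def load_gmail_credentials(token_path: str = "auth/gmail_token.json") -> Credentials:
    creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if creds.expired and creds.refresh_token:
        creds.refresh(Request())          # silent refresh, no browser needed
        with open(token_path, "w") as f:
            f.write(creds.to_json())      # persist the rotated token
    return creds
```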
Security rules: Never commit .env or gmail_token.json to GitHub — both are in .gitignore. If either is accidentally exposed, rotate immediately: Anthropic dashboard for API keys, Google Cloud Console → revoke token for Gmail. On AWS, the EC2 instance uses an IAM role to read Parameter Store — no credentials are stored in EC2 config files.
⚠ Gmail re-authorisation required after password change: If you change your Google account password, the Gmail refresh token is immediately invalidated. You must re-run the OAuth flow: start Personal Agent → visit http://localhost:8002/auth/gmail/start → grant access again. The same applies if you revoke access manually in Google Account → Security → Third-party access, or if the token has been unused for 6 months.
⚠ API key in start_all.bat — screen sharing risk: The ANTHROPIC_API_KEY is stored in plain text inside start_all.bat for convenience. This file is not committed to GitHub, but the key will be visible if you share your screen, share the file, or if someone accesses your laptop. Before any screen share or demo session, close the file in your editor and do not open it. If the key is ever exposed, rotate it immediately at console.anthropic.com → API Keys.

Phase Roadmap

Phase | Scope | Agent J Status | Infrastructure
Phase 1 (Now · Local) | Agent J skeleton. History Agent wired as first tool. Claude Code MCP interface. Basic web UI. Docker-ready layout from day one. | Master Agent + History Agent only | Local Windows (Antigravity). All paths via env vars — AWS-portable.
Phase 2 (70% corpus · Local) | Full Pallava + Chalukya corpus. Personal Agent live locally. Gmail integration. VectorStoreFactory abstraction. | History + Personal Agent live. Security Agent design begins. | Still local. Docker compose runs all three agents.
Phase 3 (90% corpus · AWS) | Migrate to EC2. Mobile web app live. Select user beta for History Agent. Multi-dynasty queries. docx2pdf → LibreOffice. | All agents live on EC2. Mobile web app as primary interface. | EC2 t3.medium + EBS 50GB. ~$35–40/month.
Phase 4 (100% · Public) | History Agent public. Full Security + Doc Review Agent. Copyright-resolved images. Full delivery pipeline. | All agents live. Security Agent audit loop running. | EC2 t3.large+. Hosted vector DB. Auth layer. CDN. ~$80–120/month.
Phase 5 (Post-launch) | Automated knowledge validation loop. Query Agent + Review Agent. 50-question test set. Self-improving corpus. | Validation loop running. Human sign-off required per round. | Same EC2. No additional infrastructure.
Section 7 Phase 5 — Automated Knowledge Validation Loop

Once the History Agent and multi-agent architecture are complete and the corpus is fully built, Phase 5 introduces a self-improving validation loop. Rather than manually testing whether the system answers questions correctly, a dedicated Query Agent and Review Agent work together to surface gaps, hallucinations, and weak citations — automatically. This phase is not about adding new data. It is about verifying that what is already in the corpus is being retrieved, synthesised, and cited correctly.

Agent Architecture

Agent | Role
Query Agent | Fires a curated test set of natural language questions across all dynasties, inscription types, topics, and time periods
Multi-Agent System | Receives each question, retrieves relevant chunks, generates a cited answer via the History Agent
Review Agent | Independently evaluates each answer against the corpus — checks citation accuracy, factual consistency, and hallucination. It does not share retrieval context with the answering pipeline — it evaluates the final output only
Human (you) | Reviews the final consolidated report, provides sign-off or targeted feedback to continue

Test Set Design

Minimum 50 questions curated by you — the quality of the test set determines the quality of the validation. Questions must span 5 categories:

# | Category | Example question
1 | Dynasty-specific | "Who were the Pallava kings during the Imperial period and what did each build?"
2 | Inscription-specific | "What does the Velurpalaiyam copper plate grant say about land ownership?"
3 | Cross-dynasty | "How did Pallava temple architecture influence the Early Chola style?"
4 | Corpus gap probe | "What is known about Pallava naval activity?" (intentionally thin corpus area)
5 | Contradiction probe | "What date is assigned to the Shore Temple construction?" (tests conflict surfacing)

Loop Protocol

Step | What happens
Round 1 | Query Agent fires all N questions → Multi-Agent System answers each → Review Agent evaluates all answers → produces a per-question scorecard (pass / flag / fail + reason)
Between rounds | You review flagged/failed questions. Option A: fill corpus gaps (ingest missing sources). Option B: refine system prompt / RAG retrieval rules. Option C: accept known limitation (mark as out-of-scope)
Rounds 2–3 | Re-run only the flagged/failed questions from the prior round → Review Agent re-evaluates
Exit condition | ✅ Review Agent passes ≥90% of test questions, OR ✅ max 3 rounds reached → force exit with gap report, OR ✅ you provide final sign-off after reviewing the consolidated report
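
In outline, the loop is simple. A sketch with hypothetical helpers standing in for the three agents and the curated test set:

```python
# Sketch: query_multi_agent, review_answer and load_test_set are hypothetical
# stand-ins for the Query Agent, Review Agent and curated test set.
def validation_round(questions: list[str]) -> dict[str, str]:
    scorecard = {}
    for q in questions:
        answer = query_multi_agent(q)            # cited answer via History Agent
        scorecard[q] = review_answer(q, answer)  # "pass" | "flag" | "fail"
    return scorecard

questions = load_test_set()                      # the 50-question curated set
total = len(questions)
for round_no in range(1, 4):                     # max 3 rounds
    scorecard = validation_round(questions)
    # Only flagged/failed questions carry over to the next round.
    questions = [q for q, verdict in scorecard.items() if verdict != "pass"]
    if (total - len(questions)) / total >= 0.90:  # cumulative pass rate >= 90%
        break
# Whatever remains in `questions` feeds the gap report and human sign-off.
```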

Review Agent Evaluation Criteria

Check | Pass condition
Citation present | Every factual claim has ≥1 corpus chunk cited
Citation accurate | The cited chunk actually supports the claim made
No hallucination | No king names, dates, or temple attributions absent from the corpus
No bias | Answer is factually grounded — not editorially skewed, not a missionary or western-biased narrative

Exit Deliverable

At the end of Phase 5, regardless of how many rounds ran, the system produces a consolidated report covering:

Document | Contents
Validation report | Per-question pass/fail/flag with Review Agent reasoning
Gap inventory | Topics where corpus coverage was insufficient for a confident answer
Prompt improvement log | What was changed between rounds and why
Known limitations | Topics intentionally marked out-of-scope with justification
Sign-off record | Your final approval with date and notes — feeds Phase 6 / public launch preparation

Prerequisites

Prerequisite | Status
Phase 3 complete — corpus ≥90%, star weights assigned | Phase 3
Phase 4 complete — History Agent + multi-agent architecture stable | Phase 4
50-question test set curated by you (1–2 hours) | Manual
ChromaDB reindex completed with final corpus state | Pre-run
Key Guardrail: The Review Agent operates on the same ideological guardrail as the History Agent. Any answer flagged for left-leaning, missionary, or western-biased narrative on Indian Hindu history is automatically failed — regardless of citation accuracy. Bias failure triggers a system prompt refinement, not a corpus gap-fill.
Vision Doc v1.0 · April 2026 · Pallava Knowledge Base Project · Compiler: Sidda Jagadeesh Donthi Siddappa