Multi-agent architecture vision, design principles & architectural assessment
| Phase | Status | Summary |
|---|---|---|
| Phase 1 — Local build | Complete ✓ | Agent J skeleton + History Agent tool wiring + Claude Code MCP interface + basic web UI. Verified end-to-end. |
| Phase 2 — 70% corpus + Personal Agent | In Progress | Personal Agent locally. Gmail OAuth integration. Full Pallava corpus. VectorStoreFactory abstraction. |
| Phase 3 — AWS migration + mobile web | Not started | EC2 t3.medium + EBS. Mobile web app. Docker deploy. LibreOffice PDF conversion. |
| Phase 4 — Public launch | Not started | History Agent public. Security Agent. Auth layer. CDN. |
| Phase 5 — Validation loop | Not started | Automated Query + Review Agent loop. 50-question test set. Self-improving corpus. |
| Task | Status | Notes |
|---|---|---|
Monorepo structure — agent-j/ project scaffold |
Done ✓ | Alongside pallava-translator/ and personal-agent/. All paths via env vars. |
FastAPI backend — POST /agent/run agentic loop |
Done ✓ | Claude tool-use routing. 5-step max loop. Runs on port 8001. |
| History Agent tool definitions (3 tools) | Done ✓ | rag_query, semantic_search, research_paper |
| Agent J memory — SQLite | Done ✓ | sessions, messages, routing_log, preferences tables |
| MCP tool definition for Claude Code | Done ✓ | agent_j_mcp.py + .claude/settings.json |
Basic web UI — /agent-ui |
Done ✓ | Chat UI with markdown rendering + tool call chips. Verified working. |
| Docker-ready layout | Done ✓ | Dockerfile per agent. docker-compose.yml in monorepo root. |
| Task | Status | Notes |
|---|---|---|
personal-agent/ project scaffold |
Done ✓ | Port 8002. Same Docker-ready, env-var structure as agent-j/. |
| Gmail OAuth flow | Done ✓ | Google Cloud project created. OAuth credentials configured. /auth/gmail/start → callback → token saved. |
| Gmail tool implementations | Done ✓ | gmail_list_emails, gmail_read_email, gmail_send_email |
| Personal corpus — SQLite | Done ✓ | facts, email_cache, tasks, preferences tables in memory/personal.db |
| Personal Agent tool definitions in Agent J | Done ✓ | 6 tools: gmail (3) + tasks (2) + remember_fact (1). 9 total tools in Agent J. |
| Agent J routing updated | Done ✓ | History tools → history_agent_client.py. Personal tools → personal_agent_client.py. |
| Gmail OAuth authorisation | Pending | Start personal-agent → visit http://localhost:8002/auth/gmail/start → grant access |
| VectorStoreFactory abstraction | Pending | Deferred — required before EC2 migration (Phase 3) |
| Date | Decision | Rationale |
|---|---|---|
| Apr 2026 | Monorepo with 3 separate projects (agent-j, personal-agent, pallava-translator) |
Clean security boundary. History Agent stays public-facing. Personal Agent corpus fully isolated. |
| Apr 2026 | Mobile web app instead of native iPhone app | Cost saving. Responsive web UI on AWS covers the use case. Native app deferred indefinitely. |
| Apr 2026 | Claude Code MCP as sole interface for Phase 1 | No infrastructure needed. Natural language → Agent J → sub-agents from Claude Code desktop. |
| Apr 2026 | Gmail OAuth deferred to Sprint 2 | Not needed for Agent J skeleton. Reduces Sprint 1 scope. |
| Apr 2026 | Apple Calendar integration skipped for now | Requires macOS or iCloud API. Revisit when Personal Agent is in active sprint. |
| Apr 2026 | AWS account with dedicated email (or +aws alias) | Security hygiene. AWS root email should be separate from personal email. Enable MFA immediately. |
| Apr 2026 | Notes page in Pallava project is History Agent specific | Personal Agent will have its own separate notes/corpus. No migration needed. |
| Metric | Value |
|---|---|
| Sources ingested | 42 (38 PDFs, 2 images, 1 video, 1 URL) |
| Knowledge chunks | ~4,000+ |
| Anthropic monthly spend limit | Raised to $200 (April 2026) |
| Pending ingests | ~6 PDFs failed due to spend limit — retry after limit raised |
| Phase 2 gate | 70% corpus — not yet reached |
Agent J is a personal multi-agent AI system with a single natural language entry point. The Master Agent accepts any instruction and routes it to the appropriate sub-agent using Claude's native tool-use (function-calling) as the routing mechanism — no custom keyword matching. Sub-agents are thin wrappers around existing or new capabilities; they can be added incrementally without redesigning the core.
| Interface | How | Status |
|---|---|---|
| Claude Code (MCP tool) | MCP tool definition calls POST /agent/run. Natural language in Claude Code → routes to Agent J backend. | Phase 1 — primary interface |
Web UI (/agent-ui) | Simple chat-style text input → fetch → rendered markdown result. Matches existing design system. | Phase 1 — build alongside backend |
| Mobile web app | Responsive web UI hosted on AWS EC2. Works in Safari/Chrome on iPhone. Add to Home Screen for app-like experience. Same backend — no App Store required. | Phase 3 — post AWS migration |
| Native iPhone app | SwiftUI app calling POST /agent/run. Requires $99/year Apple Developer account + significant build effort. | Deferred indefinitely — mobile web app covers the use case at zero extra cost |
| Channel | Format | Status |
|---|---|---|
| Gmail — Send (SMTP) | Plain text, Word, PDF attachments outbound | Phase 1 — smtplib stdlib, no new packages |
| Gmail — Read (OAuth) | Inbox reading, prioritisation, action surfacing | Sprint 2 — Personal Agent. Requires Google Cloud OAuth credentials. |
| Word (.docx) | Research papers, reports | Already built — export_documents.py |
| All documents | Already built — docx2pdf (local). LibreOffice fallback on EC2. | |
| PowerPoint (.pptx) | Presentations | Already built — routers/pptx.py |
| Text messages via Twilio API | Phase 4+ — not confirmed. Under review. | |
| Zoom | Summaries, meeting prep via Zoom OAuth | Phase 4+ — not confirmed. Under review. |
| Capability | Detail | Feasibility |
|---|---|---|
| Dynasty coverage | Start with Pallava + Chalukya; expand to all major Indian empires and dynasties | Now — just ingest PDFs. No code changes. |
| Corpus scale | 1,000+ PDFs · 10,000+ images · 1,000+ web/research documents | ChromaDB handles this scale easily |
| South Indian languages | Telugu, Kannada, Tamil, Pallava Grantha, Grantha | Already deployed — multilingual-e5-large |
| Other Indian languages | Hindi, Sanskrit | Same multilingual model covers both |
| Cross-dynasty relationships | Accurate cross-dynasty details without hallucination | Phase 3 — needs corpus completeness + source weighting |
| Architectural style recognition | Identify dynasty/period/style from uploaded images | Partial now — POST /identify-temple exists. Full upgrade in Phase 2. |
The History Agent is the bridge between the knowledge corpus and natural language delivery. Without it, the KB answers one question at a time through a fixed RAG pipeline. With it, Claude reasons across multiple queries, resolves contradictions, and delivers finished output.
"What architectural features distinguish early Pallava temples from imperial Pallava temples, and which scholars agree vs disagree on the transition period?"
Without agent: One RAG query, top 5 chunks, one answer. Likely incomplete — a single query cannot capture both periods and the scholarly debate simultaneously.
With agent: Claude queries "early Pallava architecture", then "imperial Pallava architecture", then "Pallava temple transition period", compares Minakshi vs Sastri vs Longhurst on the dating, surfaces the disagreement, and synthesises a complete answer — all in one instruction.
Prerequisite: Tag filter + source weighting (Phase 2) must be in place for the agent to isolate and compare sources meaningfully.
"Translate this Pallava Grantha inscription fragment: śrī-nara…"
Without agent: One-shot prompt with whatever context is retrieved. If a character is ambiguous, the translation is a guess with no transparency.
With agent: Claude attempts the translation, identifies the ambiguous character, queries the script training registry for that specific glyph, retrieves the confidence score and Unicode mapping, retries the translation with that grounding, and explicitly flags any remaining uncertainty.
Prerequisite: Script registry API exposed as an agent tool (GET /training/image/file/{filename} already exists — needs a tool definition wrapper).
"How did Pallava temple iconography influence early Chola temples at Thanjavur?"
Without agent: Searches one collection, returns Pallava chunks OR Chola chunks — rarely both in the right proportion for a meaningful comparison.
With agent: Queries Pallava sources first, then Chola sources, identifies overlapping iconographic elements mentioned in both, and constructs a comparative answer with citations from each tradition.
Prerequisite: Consistent tagging convention across all ingested sources. If Chola sources are not tagged chola, the agent cannot isolate them. Tag discipline during ingestion is critical.
"Write a 2,000-word research note on Mahendravarman I's contribution to rock-cut architecture"
Without agent: Single RAG call, limited context, generic structure. Cannot accumulate evidence across multiple retrieval steps.
With agent: Queries the KB section by section (early reign → cave temples → inscriptions → scholarly dating debates), accumulates evidence across multiple retrieval steps, resolves conflicts between sources, then generates a structured paper with inline citations — and delivers to Word/PDF when done.
Prerequisite: A structured system prompt defining output format, section headings, citation style, and the ideological guardrail. This is the most critical single instruction for the agent.
"What do we know about Pallava naval history?"
Without agent: Returns whatever 5 chunks match, even if corpus coverage is thin. No signal that the answer is incomplete or unreliable.
With agent: Queries the topic, counts the evidence, and explicitly responds: "Only 2 chunks found across 3 sources — corpus coverage on Pallava naval history is limited. The available evidence suggests… but this should be treated as preliminary until more sources are ingested."
Prerequisite: Confidence reporting instruction in the system prompt — Claude must be told to always state source count and flag low-coverage topics explicitly.
The system prompt is the single most important setup item for the History Agent. Every use case above depends on specific instructions being present in it. The following seven components are required.
| # | Instruction | What it enables | Required for |
|---|---|---|---|
| 1 | Identity & corpus declaration "You are the History Agent for the Pallava Knowledge Base. You have access to a scholarly corpus of [N] sources covering Pallava, Chalukya, and South Indian history." |
Sets scope — Claude knows what it has access to and what is out of scope | All use cases |
| 2 | Tool definitions Explicit descriptions of each tool: history_rag_query,
history_semantic_search, history_research_paper,
script_registry_lookup. Claude picks tools automatically based on descriptions.
|
Enables multi-step reasoning — Claude decides which tool to call next based on what it found | UC1, UC2, UC3, UC4 |
| 3 | Grounding rule "Every factual claim must be traceable to a retrieved chunk. If you cannot ground a claim in the corpus, state explicitly that this is your assessment and not sourced." |
Prevents hallucination — all answers are citation-backed | All use cases, especially UC4 |
| 4 | Ideological guardrail "Responses must be strictly fact-based and sourced from the scholarly corpus only. Do not favour left-leaning ideologies, Christian missionary perspectives, or western-biased narratives on Indian Hindu history." |
Ensures scholarly neutrality and source fidelity across all outputs | All use cases, especially UC4 |
| 5 | Confidence reporting "Always state how many chunks were retrieved and from how many distinct sources. If fewer than 3 chunks found, explicitly flag the answer as low-confidence and recommend ingesting additional sources." |
Enables gap detection — sparse corpus topics are flagged rather than answered confidently | UC5, and as a quality signal on all use cases |
| 6 | Output format rules Research notes: section headings, inline citations (Author, Year), word count target. Translations: original script → IAST → meaning → confidence score. Comparisons: side-by-side structure with explicit agreement/disagreement flags. |
Consistent, stakeholder-ready output without additional formatting prompts | UC2, UC3, UC4 |
| 7 | Loop limit "Maximum 5 retrieval steps per instruction. If sufficient evidence is not found in 5 queries, summarise what was found and state the coverage gap." |
Prevents infinite loops and runaway API costs on broad or ambiguous questions | UC1, UC3, UC4 |
rag/answerer.py, (3) research paper generation prompt.
The History Agent is not limited to Pallava history. It queries the knowledge base using semantic search — dynasty names are not hardcoded anywhere in the agent logic. Any dynasty whose sources have been ingested and correctly tagged is automatically searchable, in any supported language. Two prerequisites must be met before multi-dynasty queries work reliably: corpus ingested per dynasty, and tags applied consistently at upload time using the convention below.
| Dynasty | Period | Corpus Status | Agent Readiness |
|---|---|---|---|
| Pallava | 275–897 CE | 42 sources ingested — actively building | Partial — Phase 2 gate at 70% |
| Chalukya | 543–753 CE | Not started | Blocked on corpus |
| Chola | 300–1279 CE | Not started | Blocked on corpus |
| Rashtrakuta | 753–982 CE | Not started | Blocked on corpus |
| Vijayanagara | 1336–1646 CE | Not started | Blocked on corpus |
| Maurya / Gupta | 322 BCE–550 CE | Not started | Blocked on corpus |
| Hoysala | 1026–1343 CE | Not started | Blocked on corpus |
intfloat/multilingual-e5-large) handles Tamil, Telugu, Kannada, Sanskrit,
Hindi, and Pallava Grantha out of the box — no configuration needed per dynasty.
Sources in regional languages are fully searchable today.
Tags are the agent's primary filtering mechanism. Without consistent tags, cross-dynasty comparisons
(UC3) and topic-scoped queries fail silently — the agent finds results but cannot isolate the right
subset. Every source must follow this convention at upload time.
Tags cannot be automatically inferred after ingestion —
retroactive patching via POST /admin/patch-tags is available but expensive at scale.
Every source should carry tags across 4 dimensions. Dimension 1 (Dynasty) and Dimension 2 (Source Type) are mandatory. Dimension 3 (Period) and Dimension 4 (Topic) are strongly recommended.
One tag per primary dynasty. Add all that apply for multi-dynasty sources.
| Tag | Dynasty | Period |
|---|---|---|
pallava | Pallava | 275–897 CE |
chola | Chola | 300–1279 CE |
chalukya | Chalukya | 543–753 CE |
rashtrakuta | Rashtrakuta | 753–982 CE |
vijayanagara | Vijayanagara | 1336–1646 CE |
maurya | Maurya | 322–185 BCE |
gupta | Gupta | 320–550 CE |
hoysala | Hoysala | 1026–1343 CE |
general-indian | Multi-dynasty / pan-Indian scope | — |
Drives source weighting in Phase 2. Every source must declare exactly one.
| Tag | Meaning |
|---|---|
primary-source | Inscriptions, coins, contemporary records created during the period |
excavation-report | ASI reports, field surveys, archaeological publications |
scholarly | Peer-reviewed academic books and papers by recognised historians |
reference | Dictionaries, encyclopaedias, atlases, chronological tables |
web | Wikipedia, blogs, online articles |
video | YouTube lectures, documentaries |
Sub-period within a dynasty. Omit if uncertain — a missing tag is better than a wrong one.
| Tag | Covers |
|---|---|
early-pallava | 275–575 CE — Simhavishnu and before |
imperial-pallava | 575–750 CE — Mahendravarman I to Narasimhavarman II |
late-pallava | 750–897 CE — Nandivarman II onwards |
early-chola | 300–850 CE |
imperial-chola | 850–1200 CE |
late-chola | 1200–1279 CE |
| Add equivalent period tags for other dynasties as their corpus is built | |
Subject matter. Multiple topic tags allowed and encouraged per source.
| Tag | Covers |
|---|---|
architecture | Temple design, structural features, building techniques |
inscription | Epigraphic records, stone and copper plate inscriptions |
sculpture | Iconography, relief panels, bronze casting |
painting | Cave paintings, murals, manuscript illustration |
numismatics | Coins, seals, medallions |
literature | Poetry, Puranas, Sangam literature |
history | Political history, genealogy, chronology, warfare |
religion | Shaivism, Vaishnavism, Jainism, Buddhism in context |
geography | Trade routes, ports, territorial extent |
language | Script, grammar, phonology, epigraphy methodology |
| Source | Tags (in order) |
|---|---|
| Minakshi — Administration and Social Life under the Pallavas | pallava scholarly imperial-pallava history |
| ASI Report — Mahabalipuram excavation | pallava excavation-report imperial-pallava architecture sculpture |
| Shore Temple copper plate inscription | pallava primary-source imperial-pallava inscription |
| Longhurst — Pallava Architecture | pallava scholarly architecture |
| YouTube — Kailasanathar Temple lecture | pallava video imperial-pallava architecture |
| Wikipedia — Pallava dynasty | pallava web history |
| Sastri — A History of South India | pallava chola chalukya scholarly history general-indian |
early-pallava not early pallava)POST /admin/patch-tags works but is tedious at scale| Tool Name | When Claude picks it | Wraps |
|---|---|---|
history_rag_query | "Who built Shore Temple?" · "What language did Pallavas use?" | rag/answerer.py:answer() |
history_research_paper | "Generate a paper on Pallava art" · "Summarise Pallava warfare" | rag/retriever.py:retrieve() + Claude synthesis |
history_semantic_search | "Show me passages about" · "Find sources on" · "Browse" | rag/retriever.py:retrieve() |
Seven architectural questions assessed against the current codebase (April 2026). Full analysis of feasibility, gaps, and recommended approach for each.
What works in your favour: The pluggable ingester pattern (BaseIngester → PdfIngester, DocxIngester, etc.) is clean and extensible. The RAG pipeline (rag/answerer.py + rag/retriever.py) consists of two separate, independently callable functions — a History Agent can wrap them without modification. Claude's native tool-use is the correct routing mechanism. The FastAPI router pattern (app/routers/) means adding an agent.py router is a natural extension.
The caveat — client coupling: app/dependencies.py returns a hardcoded anthropic.Anthropic() client. When the Master Agent needs to call sub-agents with different models (Haiku for fast routing, Opus for deep research), there is no way to configure this without touching multiple files. Fix: replace hardcoded model strings with os.environ.get("MODEL_SMART", "claude-opus-4-6") before wiring the agent loop.
The bigger risk is state: Background jobs (_jobs dict in admin.py) are in-memory. When a Master Agent starts a long-running research paper task, the result disappears on server restart. For the History Agent alone this is acceptable. Personal Agent (reminders, goals, Hindu calendar, Gmail across sessions) requires a persistent SQLite store from day one.
Summary: Flexible for History Agent now. Personal Agent needs SQLite state persistence first. Security Agent can run as a stateless audit loop.
Claude Code: Yes, immediately, no code changes needed. Claude Code's MCP tool interface can call any HTTP endpoint. You already have POST /agent/run planned. Write an MCP tool definition that calls this endpoint and Claude Code becomes a natural language front-end to your entire knowledge base today. This is the fastest path to a working agent interface.
Apple app (iPhone): Yes, but it is a separate project. Your FastAPI backend is REST-over-HTTP. Any iOS app can call it. The architecture is correct — app/main.py is already described as "the backend for a future iOS/Android app." What you need to build: a SwiftUI app with a chat-style UI, auth (a fixed API key in the app is sufficient for personal use), and the /agent/run endpoint on the backend. The iOS app does not change any backend design decisions. Build the backend first; the app is a client.
One practical issue with both: Long-running requests. Research paper generation takes 15–30 seconds. The fix is the same for both interfaces: convert long-running agent tasks to background jobs with polling (POST /agent/run → returns job_id → GET /agent/job/{id} polls for result). This pattern already exists in admin.py for PDF ingestion — apply it to the agent loop in the same way.
Issue 1 — Local file paths (most significant): source_library.py copies files to C:\Users\siddi\OneDrive\Personal\History\Pallavas\source_library\. On EC2 there is no OneDrive. Since this is already controlled by the PALLAVA_LIBRARY_DIR env var, this is a configuration change not a code change — but you need to plan where files live on EC2 (e.g. an EBS volume at /data/source_library) before migrating.
Issue 2 — ChromaDB is local (manageable): ChromaDB's PersistentClient writes to a local directory. On EC2 this directory must be on a persistent EBS volume, not the ephemeral root volume. Standard deployment practice — no code change required.
Issue 3 — docx2pdf is Windows-only (blocking): On Windows, docx2pdf uses Microsoft Word via COM automation. On Linux EC2 there is no Word. Replace with LibreOffice (libreoffice --headless --convert-to pdf) or Gotenberg. This is a 10-line code change but it will break on first deploy to Linux EC2 if not addressed.
Issue 4 — py -3 command: On EC2 (Linux), the Python command is python3, not py -3. Any shell scripts or subprocess calls using py -3 will fail. Minor — worth noting for deployment scripts.
Issue 5 — Unicode fonts for Word export (minor): The .docx research paper export uses Noto Serif to render IAST diacritical characters (ā, ī, ū, ṭ, ḍ etc.). On Windows this font must be installed manually. On EC2 (Ubuntu/Debian), run: sudo apt-get install fonts-noto-serif as part of the server setup script — one line, resolves all Unicode rendering issues in exported Word documents.
No structural issues. FastAPI + uvicorn is production-grade and runs identically on EC2. Tighten CORS (allow_origins=["*"]) before going public.
The numbers: 1,000 PDFs × ~100 chunks per PDF = ~100,000 chunks. 10,000 images × 1 description = 10,000 vectors. Total: ~110,000 vectors at 1024 dimensions. ChromaDB's HNSW index handles 10M+ vectors — 110K is trivial. At 100K vectors, HNSW queries return in <100ms locally. On EC2 with an EBS volume expect 200–400ms. Fully acceptable.
The real scaling risk is not ChromaDB — it is knowledge.json: knowledge_store.py loads the full knowledge.json into memory on every ingest operation. At 22.7 MB today (~5,000 chunks) this is fine. At 100,000 chunks this file will be ~450 MB and loading it on every ingest will be slow. The fix: stop using knowledge.json as a query store and query ChromaDB directly for everything. knowledge.json becomes an append-only audit log, not a lookup table. No re-ingestion needed — this is a code change to the ingest path only.
Multi-dynasty collection design — recommended approach: Rather than one collection per dynasty (chalukya_knowledge, chola_knowledge), use a single indian_knowledge collection with a dynasty metadata tag. All dynasties in one collection, filter by tag at query time. This makes cross-dynasty queries simple and natural — exactly what is needed for "cross-dynasty relationship details without hallucination." Migrate to this model when Chalukya ingestion begins — it is a reindex operation, not a code change.
| Scenario | Claude Vision | Amazon Textract | Google Document AI |
|---|---|---|---|
| Modern printed English/IAST PDFs | Excellent | Excellent | Excellent |
| Pre-2000 Tamil PDFs (TSCII/legacy encoding) | Fix #2 handles this in preprocessing | Poor — no legacy encoding awareness | Poor |
| Sanskrit/Tamil Unicode in printed books | Good | Limited | Good (dedicated Indic models) |
| Stone inscription photographs (epigraphic) | Excellent — contextual reasoning | Poor — pixel-level only | Fair |
| Grantha / Vatteluttu script | Unique advantage — reasoning about rare scripts | Cannot handle | Cannot handle |
| Cost per page | ~$0.003–0.015 (Haiku/Sonnet/Opus) | ~$0.0015 | ~$0.0015 |
| Structured extraction (tables, forms) | Good | Excellent | Excellent |
Why Claude wins for this project specifically: The critical advantage is contextual understanding. When Claude sees a damaged inscription photograph with partial characters, it can reason: "This is a Grantha script Pallava copper plate, therefore this partially visible character is likely..." — Textract and Document AI do pixel-level pattern matching, not contextual reasoning. For scholarly epigraphic work this is not a small difference — it is the entire problem domain.
Where competitors are cheaper: For structured documents (invoices, forms, tables), Textract's AnalyzeDocument API is faster and cheaper. For high-volume modern printed text, Google Document AI is cheaper at scale. Neither of these scenarios applies to the primary use case.
Recommendation: Keep Claude for OCR. The OCR evaluation suggested in the requirements is worth doing for modern Indic printed text where Google's dedicated Indic models might reduce cost at scale. For inscription photographs and rare scripts, Claude is in a category of its own.
ChromaDB API calls appear in four files. The APIs between providers are not compatible — Pinecone, Weaviate, Qdrant, pgvector all have different client libraries and query syntax.
| File | What uses ChromaDB | Lines |
|---|---|---|
knowledge_store.py | get_knowledge_collection(), upsert(), delete(), count() | ~30 |
rag/retriever.py | collection.query() — the most critical path | ~20 |
app/routers/gallery.py | get_gallery_collection(), upsert(), query() | ~25 |
vector_store.py | pallava_inscriptions collection management | ~30 |
Pragmatic approach (recommended): Before going live on EC2, create a vector_store_factory.py with a thin VectorStore wrapper class that encapsulates the 4 operations actually used (upsert, query, delete, count). All 4 files import from this factory. When migrating, change one file. This is 2–3 hours of refactoring that buys clean migration later.
Hosted alternatives for Phase 4: Qdrant Cloud — closest API to ChromaDB, easiest migration. pgvector (PostgreSQL extension) — if one database for everything (agent state + vectors) is desired, compelling for EC2. Pinecone — managed, no EC2 required, but vendor lock-in and higher cost at scale.
| Bottleneck | When it matters | Fix |
|---|---|---|
| Synchronous blocking requests | Now — agent tasks take 15–60 seconds. With one uvicorn worker, no other request is served during this time. | Convert long-running agent tasks to background jobs with polling. Pattern already exists in admin.py. |
knowledge.json full load on every ingest |
Phase 2 — at ~50K+ chunks, loading 200+ MB into memory on every ingest becomes slow. | Remove knowledge.json as a runtime lookup; query ChromaDB directly. Keep .json as append-only audit log. |
| No session state for multi-turn conversations | Personal Agent from day one. History Agent is stateless — acceptable. Personal Agent (Hindu calendar reminders, Gmail, goals) is not. | SQLite sessions table — store messages[] per session_id, load on each /agent/run call. |
| ChromaDB single-writer constraint | Phase 3 — multi-user beta. Concurrent writes risk index corruption. | ChromaDB Server mode or migrate to hosted vector DB (Qdrant Cloud). Phase 3 concern only. |
Priority: Fix the synchronous blocking issue before launching the agent loop — a 60-second HTTP request will time out on mobile and feels broken. Background jobs + polling solves this and is already a proven pattern in the codebase.
"claude-opus-4-6" strings with MODEL_SMART / MODEL_FAST env vars. Full abstraction layer deferred to Phase 3.vector_store_factory.py wrapping the 4 operations used. Single file to change at migration.gallery.py and crawl_wisdomlib_gallery.py. All future features (POST /identify, video analysis) must be built on the shared foundation.
corpus/pdf_pages/{fingerprint}/ during first ingest. Eliminates all future re-ingest costs — pay Claude Vision once per PDF, never again.
dynasty tag. Enables natural cross-dynasty queries. Migrate when Chalukya ingestion begins — reindex only, no API cost.
_jobs pattern from admin.py is reused for the agent router.
"claude-opus-4-6" string literals with os.environ.get("MODEL_SMART", "claude-opus-4-6"). No abstraction layer — just configurable strings. Enables model swaps without code changes.
vector_store_factory.py for the 4 ChromaDB operations used across the codebase. All files import from the factory. Enables clean migration to Qdrant, pgvector, or Pinecone by changing one file.
| Agent | Project | Visibility | Calls |
|---|---|---|---|
| Agent J (Master) | C:/Personal/AI Project/agent-j/ |
Personal only | History Agent + Personal Agent via HTTP |
| History Agent | C:/Personal/AI Project/pallava-translator/ |
Public-facing (Phase 4) | No access to Personal Agent or Agent J internals |
| Personal Agent | C:/Personal/AI Project/personal-agent/ |
Private only — never exposed | Can call History Agent via Master. Private corpus stays isolated. |
| Security Agent | TBD (Phase 4) | Internal only | Stateless audit loop across all agents |
| Interface | Phase | How | Cost |
|---|---|---|---|
| Claude Code (MCP) | Phase 1 — Now | MCP tool definition calls POST /agent/run on local machine. Natural language in Claude Code → Agent J routes to sub-agents. |
$0 |
Web UI (/agent-ui) |
Phase 1 — Now | Simple chat-style text input → fetch → rendered markdown. Built alongside Agent J backend. | $0 |
| Mobile web app | Phase 3 — Post AWS | Responsive web UI hosted on AWS EC2. Works in Safari/Chrome on iPhone. Can be added to Home Screen — looks and feels like a native app. Same backend, no App Store required. | Included in EC2 cost (~$35/month) |
| Native iPhone app (SwiftUI) | Phase 4+ — Optional | Deferred indefinitely. Mobile web app covers the use case at zero additional cost. Native app only if specific iOS features (push notifications, offline mode) are needed. | $99/year Apple Developer + build effort. Deferred. |
| Step | Detail | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Develop locally | Write and test on Antigravity at localhost:8000. All three agents run via docker compose up in agent-j/. |
||||||||||||||||||||
| 2. Push to GitHub | git push origin main — private repo. Corpus data (PDFs, ChromaDB, SQLite) never goes to GitHub — stays on local disk and EBS volume. |
||||||||||||||||||||
| 3. Install system dependencies | sudo apt-get install fonts-noto-serif libreoffice — Noto Serif for Unicode IAST characters in Word exports; LibreOffice as docx2pdf replacement for PDF conversion on Linux. |
||||||||||||||||||||
| 4. Copy corpus data to EBS |
Corpus files are excluded from git (too large). What to copy and what to skip:
⚠ Known risks with corpus copy:
|
||||||||||||||||||||
| 5. Deploy to EC2 | SSH into EC2 → git clone https://github.com/Sid00009/Project_OM.git && docker compose up -d --build. Later: automate with GitHub Actions (optional). |
| Resource | Spec | Cost | Notes |
|---|---|---|---|
| EC2 | t3.medium (2 vCPU, 4GB RAM) | ~$30/month | Upgrade to t3.large if embedding gets slow. 2 min, no downtime. |
| EBS Volume | 50GB gp3 | ~$5/month | Stores corpus, ChromaDB, SQLite, uploaded files. Persists across EC2 restarts. |
| Route 53 (optional) | Custom domain | ~$0.50/month | e.g. agent-j.yourdomain.com. Not required for personal use. |
| Total Phase 3 | — | ~$35-40/month | Covers all three agents + mobile web app + corpus storage. |
siddijagadeesh+aws@gmail.com. Enable MFA on root account immediately after creation.
The same os.environ.get() pattern is used in all three agents — credentials are never hardcoded.
The source of those environment variables changes per environment, but the application code does not.
| Environment | Where secrets live | How app reads them | Notes |
|---|---|---|---|
| Local (now) | .env file in each project root — loaded by python-dotenv at startup |
os.environ.get("KEY") — same in all app/config.py files |
.env is in .gitignore — never committed to GitHub |
| AWS EC2 (Phase 3) | AWS Systems Manager Parameter Store (free tier) or Secrets Manager (~$0.40/secret/month) | Startup script exports parameters as env vars before uvicorn starts. App code unchanged. | No .env file on server — AWS injects values at boot |
| Agent | Secret | Local | AWS |
|---|---|---|---|
| All agents | ANTHROPIC_API_KEY |
.env file |
Parameter Store → env var |
| Personal Agent | GMAIL_CLIENT_ID, GMAIL_CLIENT_SECRET |
personal-agent/.env |
Parameter Store → env var |
| Personal Agent | Gmail refresh token (auth/gmail_token.json) |
Local file — in .gitignore |
EBS volume mounted at /app/auth/ — persists across restarts |
| Agent J | SQLite memory DB (memory/agent_j.db) |
Local file | EBS volume mounted at /app/memory/ |
| Step | Detail |
|---|---|
| 1. Google Cloud project | Create project agent-j-personal at console.cloud.google.com |
| 2. Enable Gmail API | APIs & Services → Library → Gmail API → Enable |
| 3. OAuth consent screen | External · App name: Agent J · Support email: your Gmail · Add yourself as test user |
| 4. Create OAuth credentials | Credentials → Create OAuth 2.0 Client ID → Web application · Redirect URI: http://localhost:8002/auth/gmail/callback |
| 5. Save credentials | Copy Client ID + Client Secret into personal-agent/.env |
| 6. Authorise | Start Personal Agent → visit http://localhost:8002/auth/gmail/start → grant access → token saved to auth/gmail_token.json |
| 7. On AWS (Phase 3) | Complete OAuth once locally → copy gmail_token.json to EBS volume on EC2 → token auto-refreshes, no re-authorisation needed |
.env or gmail_token.json to GitHub — both are in .gitignore.
If either is accidentally exposed, rotate immediately: Anthropic dashboard for API keys, Google Cloud Console → revoke token for Gmail.
On AWS, the EC2 instance uses an IAM role to read Parameter Store — no credentials are stored in EC2 config files.
http://localhost:8002/auth/gmail/start → grant access again.
The same applies if you revoke access manually in Google Account → Security → Third-party access,
or if the token has been unused for 6 months.
ANTHROPIC_API_KEY is stored in plain text inside start_all.bat for convenience.
This file is not committed to GitHub, but the key will be visible if you share your screen, share the file,
or if someone accesses your laptop. Before any screen share or demo session, close the file in your editor
and do not open it. If the key is ever exposed, rotate it immediately at
console.anthropic.com → API Keys.
| Phase | Scope | Agent J Status | Infrastructure |
|---|---|---|---|
| Phase 1 Now · Local |
Agent J skeleton. History Agent wired as first tool. Claude Code MCP interface. Basic web UI. Docker-ready layout from day one. | Master Agent + History Agent only | Local Windows (Antigravity). All paths via env vars — AWS-portable. |
| Phase 2 70% corpus · Local |
Full Pallava + Chalukya corpus. Personal Agent live locally. Gmail integration. VectorStoreFactory abstraction. | History + Personal Agent live. Security Agent design begins. | Still local. Docker compose runs all three agents. |
| Phase 3 90% corpus · AWS |
Migrate to EC2. Mobile web app live. Select user beta for History Agent. Multi-dynasty queries. docx2pdf → LibreOffice. |
All agents live on EC2. Mobile web app as primary interface. | EC2 t3.medium + EBS 50GB. ~$35–40/month. |
| Phase 4 100% · Public |
History Agent public. Full Security + Doc Review Agent. Copyright-resolved images. Full delivery pipeline. | All agents live. Security Agent audit loop running. | EC2 t3.large+. Hosted vector DB. Auth layer. CDN. ~$80–120/month. |
| Phase 5 Post-launch |
Automated knowledge validation loop. Query Agent + Review Agent. 50-question test set. Self-improving corpus. | Validation loop running. Human sign-off required per round. | Same EC2. No additional infrastructure. |
Once the History Agent and multi-agent architecture are complete and the corpus is fully built, Phase 5 introduces a self-improving validation loop. Rather than manually testing whether the system answers questions correctly, a dedicated Query Agent and Review Agent work together to surface gaps, hallucinations, and weak citations — automatically. This phase is not about adding new data. It is about verifying that what is already in the corpus is being retrieved, synthesised, and cited correctly.
| Agent | Role |
|---|---|
| Query Agent | Fires a curated test set of natural language questions across all dynasties, inscription types, topics, and time periods |
| Multi-Agent System | Receives each question, retrieves relevant chunks, generates a cited answer via the History Agent |
| Review Agent | Independently evaluates each answer against the corpus — checks citation accuracy, factual consistency, and hallucination. Has no access to the generated answer during retrieval — evaluates output only |
| Human (you) | Reviews the final consolidated report, provides sign-off or targeted feedback to continue |
Minimum 50 questions curated by you — the quality of the test set determines the quality of the validation. Questions must span 5 categories:
| # | Category | Example question |
|---|---|---|
| 1 | Dynasty-specific | "Who were the Pallava kings during the Imperial period and what did each build?" |
| 2 | Inscription-specific | "What does the Velurpalaiyam copper plate grant say about land ownership?" |
| 3 | Cross-dynasty | "How did Pallava temple architecture influence the Early Chola style?" |
| 4 | Corpus gap probe | "What is known about Pallava naval activity?" (intentionally thin corpus area) |
| 5 | Contradiction probe | "What date is assigned to the Shore Temple construction?" (tests conflict surfacing) |
| Step | What happens |
|---|---|
| Round 1 | Query Agent fires all N questions → Multi-Agent System answers each → Review Agent evaluates all answers → produces per-question scorecard (pass / flag / fail + reason) |
| Between rounds | You review flagged/failed questions. Option A: fill corpus gaps (ingest missing sources). Option B: refine system prompt / RAG retrieval rules. Option C: accept known limitation (mark as out-of-scope) |
| Rounds 2–3 | Re-run only flagged/failed questions from prior round → Review Agent re-evaluates |
| Exit condition | ✅ Review Agent passes ≥90% of test questions, OR ✅ max 3 rounds reached → force exit with gap report, OR ✅ you provide final sign-off after reviewing consolidated report |
| Check | Pass condition |
|---|---|
| Citation present | Every factual claim has ≥1 corpus chunk cited |
| Citation accurate | The cited chunk actually supports the claim made |
| No hallucination | No king names, dates, or temple attributions absent from corpus |
| No bias | Answer is factually grounded — not editorially skewed, not missionary or western-biased narrative |
At the end of Phase 5, regardless of how many rounds ran, the system produces a consolidated report covering:
| Document | Contents |
|---|---|
| Validation report | Per-question pass/fail/flag with Review Agent reasoning |
| Gap inventory | Topics where corpus coverage was insufficient for a confident answer |
| Prompt improvement log | What was changed between rounds and why |
| Known limitations | Topics intentionally marked out-of-scope with justification |
| Sign-off record | Your final approval with date and notes — feeds Phase 6 / public launch preparation |
| Prerequisite | Status |
|---|---|
| Phase 3 complete — corpus ≥90%, star weights assigned | Phase 3 |
| Phase 4 complete — History Agent + multi-agent architecture stable | Phase 4 |
| 50-question test set curated by you (1–2 hours) | Manual |
| ChromaDB reindex completed with final corpus state | Pre-run |