Scholarly Documentation

Pallava Script Unicode Initiative

A systematic corpus of epigraphic evidence supporting the official Unicode encoding of the Pallava script. It builds on Anshuman Pandey's 2018 proposal (L2/18-083) and is intended as a contribution to the Unicode Technical Committee's active script work.

Compiled by Sidda Jagadeesh Donthi Siddappa · Independent Researcher

Five-Layer Encoding Architecture

Layer 1

Glyph Images

Primary epigraphic evidence. Stone inscription photographs and handwritten script images.

*.jpg / *.png

Layer 2

PUA Encoding

Stable internal codepoints. The project reserves U+E800–U+E8FF, a range inside the Unicode Private Use Area (U+E000–U+F8FF), for Pallava characters.

U+E820 = ka
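A minimal sketch of how the PUA range could be checked in code. The range limits and the ka assignment come from the text above; the function names are hypothetical and not part of the project's published tooling.

```python
# Range stated above: the project's internal block inside the Private Use Area.
PALLAVA_PUA_START = 0xE800
PALLAVA_PUA_END = 0xE8FF


def is_pallava_pua(char: str) -> bool:
    """Return True if a single character lies in the project's PUA block."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return PALLAVA_PUA_START <= ord(char) <= PALLAVA_PUA_END


def format_codepoint(char: str) -> str:
    """Render a character as a U+XXXX label."""
    return f"U+{ord(char):04X}"


ka = chr(0xE820)  # U+E820 = ka, the one assignment given in the text
print(format_codepoint(ka), is_pallava_pua(ka))
```

A check like this is useful mainly as a guard when importing data, since PUA characters carry no meaning outside the project and are easy to confuse with other private assignments.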

Layer 3

Grantha Unicode

Interoperability proxy using the closest encoded related script, Grantha (U+11300–U+1137F), which descends from Pallava rather than preceding it.

𑌕 U+11315
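The proxy layer can be pictured as a lookup from PUA codepoints to Grantha codepoints. The sketch below is an illustration only: the table name and helper are hypothetical, and the single entry is the ka pairing given in the text.

```python
import unicodedata

# Hypothetical mapping table; only the ka entry comes from the text.
PUA_TO_GRANTHA = {
    0xE820: 0x11315,  # ka -> GRANTHA LETTER KA
}


def to_grantha_proxy(text: str) -> str:
    """Replace mapped PUA characters with Grantha proxies; leave the rest untouched."""
    return text.translate(PUA_TO_GRANTHA)


proxy = to_grantha_proxy(chr(0xE820))
print(unicodedata.name(proxy))
```

Unmapped characters pass through unchanged, so a partially filled table never corrupts text that has no Grantha counterpart yet.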

Layer 4

IAST Transliteration

IAST romanisation, a close relative of ISO 15919 that differs from it in a few letters. Scholarly standard and LLM training backbone.

ka, śrī, namas
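Because IAST relies on diacritics, the same word can be stored as different codepoint sequences. The sketch below assumes the corpus would normalise transliterations to NFC before comparing them; the source does not state which normalisation form the project uses, and the helper name is hypothetical.

```python
import unicodedata


def normalize_iast(text: str) -> str:
    """Normalise a transliteration string to NFC so equal strings compare equal."""
    return unicodedata.normalize("NFC", text)


# "śrī" typed with combining marks: s + combining acute, r, i + combining macron.
decomposed = "s\u0301ri\u0304"
# The same word with ś and ī as precomposed letters.
composed = "\u015Br\u012B"

print(decomposed == composed)                  # raw comparison
print(normalize_iast(decomposed) == composed)  # comparison after NFC
```

The first comparison fails and the second succeeds, which is why a fixed normalisation form matters for search and deduplication.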

Layer 5

Translation

Sanskrit and Telugu semantic translations generated via Claude Vision API.

Sanskrit / Telugu
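One way to picture the five layers together is as a single record per corpus entry. The sketch below is an assumption about how such a record might look; the class name, field names, and the image filename are all hypothetical and do not describe the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LayeredEntry:
    """One corpus entry across the five layers (all field names are hypothetical)."""
    glyph_images: list[str]   # Layer 1: paths to *.jpg / *.png evidence
    pua: str                  # Layer 2: text in project PUA codepoints
    grantha: Optional[str]    # Layer 3: Grantha proxy, if one exists
    iast: str                 # Layer 4: transliteration
    translations: dict[str, str] = field(default_factory=dict)  # Layer 5


ka = LayeredEntry(
    glyph_images=["ka_example.jpg"],  # placeholder filename, not a real corpus file
    pua=chr(0xE820),
    grantha="\U00011315",
    iast="ka",
)
print(f"U+{ord(ka.pua):04X} -> U+{ord(ka.grantha):04X} -> {ka.iast}")
```

Keeping the layers side by side means an entry remains usable through its IAST and image layers even if a codepoint layer changes later.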

Character Inventory & Evidence

PUA Code	IAST	Glyph	Grantha	Telugu	Devanagari	Evidence	Status	Notes

Unicode Submission Checklist

🔤 Complete character inventory (vowels, consonants, diacritics, conjuncts) · Status: In Progress (~50 of ~100)

📜 Epigraphic evidence per character (minimum 3 attestations each) · Status: Pending, building corpus

📊 Character frequency analysis across corpus · Status: Pending, needs larger corpus

🔬 Comparative paleography (Pallava vs Grantha vs Brahmi) · Status: Pending

📚 Scholarly citations compiled (Pandey 2018, Lockwood 2015, others) · Status: Partial, 3 references

✍️ Formal Unicode proposal document drafted · Status: Pending

🌐 Submitted to Unicode Technical Committee (UTC) · Status: Pending
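The attestation and frequency items in the checklist could be tracked with a small script along these lines. This is a sketch under stated assumptions: the threshold of 3 comes from the checklist, while the log format, inscription identifiers, and sample rows are invented for illustration and are not corpus data.

```python
from collections import Counter

MIN_ATTESTATIONS = 3  # threshold stated in the checklist

# Hypothetical attestation log: one (IAST label, inscription id) pair per sighting.
attestations = [
    ("ka", "inscription-A"),
    ("ka", "inscription-B"),
    ("ka", "inscription-C"),
    ("ta", "inscription-A"),
]


def attestation_report(pairs):
    """Count distinct inscriptions per character and flag whether each meets the minimum."""
    sources = {}
    for label, inscription in pairs:
        sources.setdefault(label, set()).add(inscription)
    return {label: len(ids) >= MIN_ATTESTATIONS for label, ids in sources.items()}


# Raw sighting counts per character, usable as a first frequency table.
frequency = Counter(label for label, _ in attestations)

print(attestation_report(attestations))
print(frequency.most_common())
```

Counting distinct inscriptions rather than raw sightings keeps one heavily repeated inscription from satisfying the minimum on its own.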

Future-Proof Migration Plan

When the Pallava script receives official Unicode approval:

1. The Unicode Consortium will publish official codepoints for each Pallava character.
2. Update the unicode_official field in pallava_pua_chart.json for each character.
3. Run the migration script: py -3 migrate_to_official_unicode.py
4. All corpus entries, knowledge base chunks, and training data will be updated automatically.
5. PUA codepoints remain valid as aliases — no data is lost.

The five-layer architecture is what prevents data loss: IAST transliterations and glyph images are encoding-independent and remain valid regardless of which Unicode codepoints are eventually assigned.
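The migration steps above can be sketched as follows. This is not the contents of migrate_to_official_unicode.py, which the source does not show. Only the unicode_official field name comes from the text; the rest of the chart layout is an assumption, and the "official" value used here is a placeholder from a private-use plane, not a real Pallava codepoint.

```python
import json

# Assumed chart layout: a list of objects whose "pua" and "unicode_official"
# values are hex strings. Only the unicode_official field name is from the text.
chart_json = """
[
  {"iast": "ka", "pua": "E820", "unicode_official": null}
]
"""


def build_migration_table(chart):
    """Map PUA ordinals to official ordinals, skipping unassigned characters."""
    return {
        int(entry["pua"], 16): int(entry["unicode_official"], 16)
        for entry in chart
        if entry.get("unicode_official")
    }


def migrate(text, table):
    """Rewrite PUA characters that have an official codepoint; keep the rest."""
    return text.translate(table)


chart = json.loads(chart_json)
sample = chr(0xE820)

# Before approval nothing is assigned, so the text passes through unchanged.
assert migrate(sample, build_migration_table(chart)) == sample

# Placeholder value for demonstration only; NOT a real Pallava codepoint.
chart[0]["unicode_official"] = "F0000"
migrated = migrate(sample, build_migration_table(chart))
print(f"U+{ord(migrated):04X}")
```

Since the chart keeps the PUA value next to the official one, the same table can be inverted to read older PUA-encoded files, which is one way the alias behaviour in step 5 could be realised.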

Scholarly References

Proposal to Encode the Pallava Script in Unicode

Anshuman Pandey · 2018 · Document L2/18-083 · unicode.org

The Creation of the Pallava Grantha Tamil Script

Michael Lockwood · 2015 · Primary source — ingested into corpus

SEI Liaison Report — Script Encoding Initiative (Pallava active)

Unicode Technical Committee · January 2025 · Document L2/25-014 · unicode.org

Grantha Unicode Block (U+11300–U+1137F)

Unicode Standard 7.0+ · Used as proxy encoding for Pallava characters · Wikipedia