Proto-Elamite decipherment / computational epigraphy · 2026-04-15

A top-edge mark flags a distinct genre of Proto-Elamite accounting tablets

In the 1,585-tablet CDLI Proto-Elamite corpus (c. 3100–2900 BCE, Susa), 91 tablets carry the exact top-edge signature `1(N34)`. Of the 57 preserved ones, 68.4% contain the 'hairy triangle' institutional marker M157 on the obverse, versus 26.3% in the 769-tablet no-edge-numeric baseline — an odds ratio of ~6 and a two-proportion z-test p ≈ 1.4 × 10⁻¹¹. Fifteen other ideograms (led by M340 at p ≈ 3.7 × 10⁻¹⁸) are also significantly co-enriched. This is distributional evidence that the top-edge mark is a specific document-genre classifier, not a random accounting tag.

Description

Proto-Elamite is a c. 3100–2900 BCE undeciphered accounting script from the Iranian plateau (primary excavation site: Susa). The Cuneiform Digital Library Initiative (CDLI) hosts a machine-readable ATF dump of the corpus at https://github.com/cdli-gh/data (daily dump, last updated August 2022). The dump contains 1,585 tablets with transliterated Proto-Elamite content, out of the 1,729 Proto-Elamite tablets in the CDLI catalogue. Numeric signs are encoded in the CDLI-ATF form `count(Nsign)` — e.g. `1(N01)` = 1×N01, `2(N14)` = 2×N14. The script has a documented numeric-system inventory (decimal, sexagesimal, bisexagesimal, capacity/grain) whose sign-to-value mapping has been worked out by Damerow, Englund, and Dahl via proto-cuneiform analogy; the ideographic (non-numeric) signs remain undeciphered. This discovery is a corpus-wide structural observation on the edge-surface markings of the 1,585 transliterated tablets, not a decipherment of the ideographic system. Method: parse every tablet into a structured record (tablet-id, surface, line-number, ideographic signs, numeric entries), track damage markers (`$ beginning broken`, `$ rest broken`, etc.) to exclude tablets where the obverse or reverse is incomplete, then look at the `@top` edge surface content across the whole corpus. Finding: of 1,585 tablets, 116 have any numeric content on an edge surface. 91 of those (78%) carry the exact signature `1(N34)`, making it by far the dominant top-edge marking. The next most common signature is `2(N01)` at 19 tablets, then singletons. Of the 91 `@top:1(N34)` tablets, 57 are structurally preserved (no breakage markers on obverse or reverse). The key comparison: for each tablet we ask whether the obverse contains the ideogram M157 (the so-called 'hairy triangle' sign that Encyclopaedia Iranica and Dahl 2005 identify as a recurring Proto-Elamite institutional/authority marker). 39 of the 57 preserved `top_1N34` tablets (68.4%) contain M157 on the obverse, compared to 202 of 769 preserved tablets with no top-edge numerics (26.3%). Two-proportion z-test: z = 6.76, two-sided p ≈ 1.4 × 10⁻¹¹, odds ratio 6.08. The enrichment is not M157-specific. A corpus-wide ideogram panel with a minimum joint-count filter of 10 returns 15 ideograms that are enriched in `top_1N34` tablets at p < 0.01: M340 (z = 8.69, p ≈ 3.7 × 10⁻¹⁸, odds ratio ~20), M153 (z = 7.27, p ≈ 3.6 × 10⁻¹³), M157 (z = 6.76, p ≈ 1.4 × 10⁻¹¹), M342 (z = 5.41, p ≈ 6.4 × 10⁻⁸), M036 (z = 5.08), M054 (z = 4.92), M145 (z = 4.67), M387 (z = 4.00), M305 (z = 3.53), M260 (z = 3.51), M372 (z = 3.23), M388 (z = 2.90), M033 (z = 2.88), M354 (z = 2.70), M111 (z = 2.70). The specific triple-cooccurrence M157 + M387 + M388 on the obverse appears in 17.5% of `top_1N34` tablets and only 2.3% of no-top-numeric tablets (z = 6.12, p ≈ 9.4 × 10⁻¹⁰, odds ratio ~9). Four `@top:1(N34)` tablets (P008022, P008265, P009180, P009237) are near-duplicates: each has 2 obverse lines summing to 9 N01, the triple M387+M388+M157 in the ideogram set, and `@top: 1(N34)`. These are either copies of the same administrative formula or a tightly-bound document type. Structural rate comparison: of the 57 preserved `top_1N34` tablets, 23 (40.4%) have the obverse-list + reverse-short-total structure (obverse ≥2 numeric lines, reverse exactly 1 numeric line excluding @top); of the 769 no-top-numeric preserved tablets, 161 (20.9%) have the same structure. That enrichment alone (z = 3.42, p ≈ 3 × 10⁻⁴) confirms `top_1N34` tablets are disproportionately totaled accounts, but the specific ideogram-cluster enrichment is the stronger signal. The cluster is corpus-wide evidence that `@top: 1(N34)` functions as a scribal classifier for a specific administrative genre of Proto-Elamite tablets — most likely an institutional document class anchored on the M157 hairy-triangle authority marker — rather than as a freely-combining numeric total or a uniform accounting tag. The finding has two concrete downstream consequences for Proto-Elamite work. First, any corpus-wide arithmetic-consistency pipeline (e.g. Dahl 2015-style sum-vs-total diagnostics extended to the whole corpus) should treat `@top: 1(N34)` as metadata rather than as a contribution to the reverse total — spot-checks on P008135 (MDP 06, 353, Dahl's own `#tr.en:` glosses identify it as a female-sheep inventory), P008160 (MDP 06, 379), and P008223 (MDP 17, 025) confirm that the reverse column 1 alone balances the obverse exactly and the `@top` mark is extraneous to the arithmetic. Second, the M157/M340/M153 ideogram cluster and the `@top: 1(N34)` signature together identify a subcorpus of ~91 tablets that share a document-type signature stronger than any prior published Proto-Elamite tablet classification — a cleaner starting point for any future decipherment attempt targeting a specific genre than 'all 1,585 tablets equally'.

Purpose

Precise

USE CASE. A Proto-Elamite computational-epigraphy researcher running a corpus-wide arithmetic-consistency scan (e.g. extending Dahl 2015's single-tablet diagnostic to all 1,585 transliterated tablets) has to make a methodological choice about how to handle the `@top` edge surface: fold its numeric content into the reverse total, or treat it as metadata. The CDLI-ATF spec does not settle this, and the printed literature is ambiguous ('totals on the upper edge of the reverse' is a phrase that could refer to the top rows of the `@reverse` face OR to the physical top-edge `@top` surface). This discovery resolves the choice distributionally: on the three spot-checked `@top:1(N34)` tablets (P008135, P008160, P008223), the reverse column 1 alone arithmetically balances the obverse exactly, and including the `@top` mark breaks the balance by +100 (under the decimal N34=100 assignment) on each of the three. Corpus-wide, the `@top:1(N34)` signature is also strongly enriched for the list-plus-short-total accounting structure (23/57 = 40.4% vs 161/769 = 20.9% baseline, z = 3.42, p ≈ 3 × 10⁻⁴) AND strongly enriched for a specific ideogram cluster led by M157 (z = 6.76, p ≈ 1.4 × 10⁻¹¹, odds ratio 6.08), M340 (z = 8.69, p ≈ 3.7 × 10⁻¹⁸), M153, M342, M036. Both signals support the interpretation that `@top:1(N34)` is a classifier marker for a specific administrative-document genre, not a freely-combining numeric total. Concrete methodological recommendation for future Proto-Elamite arithmetic-consistency pipelines: exclude the `@top` edge from the reverse total by default, or, equivalently, treat `@top:1(N34)` and similar rare-signature edge marks as metadata. Concrete decipherment consequence: the 91-tablet `top_1N34` subcorpus and the 15 co-enriched ideograms (M340, M153, M157, M342, M036, M054, M145, M387, M305, M260, M372, M388, M033, M354, M111) identify a specific document-type signature that is structurally tighter than any existing published Proto-Elamite tablet classification. A decipherment attempt targeting this subcorpus specifically — as opposed to the full 1,585-tablet undifferentiated corpus — has a stronger structural anchor for parallel-form inference, because the subcorpus is known to share a classifier and an ideogram cluster. The four near-duplicate tablets P008022, P008265, P009180, P009237 (all 2 obverse lines summing to 9 N01, the same M387+M388+M157 obverse ideogram set, and the same `@top:1(N34)` edge mark) are a specific candidate 'minimal pair set' for any future Proto-Elamite translation attempt — essentially the closest thing the corpus has to a Rosetta-duplicate pattern before a bilingual is found.

For a general reader

Proto-Elamite is one of the oldest writing systems on earth — about 5,100 years old, from what is now southwestern Iran — and nobody has read it. We can read its NUMBERS (the counting signs were figured out by comparing them to the numbers used in a related script at Uruk, in Mesopotamia, a few decades earlier), but the WORD signs (about 1,900 of them, representing things like types of grain, labourers, livestock, and officials) are still a mystery. About 1,600 Proto-Elamite clay tablets survive, almost all of them administrative accounts (inventories, rations, receipts). They are essentially ancient spreadsheets. I took the full public database of these tablets (from the Cuneiform Digital Library Initiative, which publishes the transliterations on GitHub) and looked at a very specific detail that nobody has measured at corpus scale before: the marks on the narrow TOP EDGE of each tablet, as distinct from the obverse (front) and reverse (back) faces. It turns out that 91 of the 1,585 tablets with transliterated content carry exactly the same top-edge mark: the sign `1(N34)`, which is one 'N34' numeral (worth 100 in the decimal counting system). This is by far the most common top-edge mark — nothing else comes close. The question is whether this mark means anything systematic, or whether it is just a random doodle. The answer is that it means something systematic, and we can prove it statistically. If you split the 1,585 tablets into two groups — those with `@top: 1(N34)` on the top edge (91 tablets, 57 of which are well-preserved on both faces) and those with no top-edge numbers at all (1,469 tablets, 769 well-preserved) — the 91-tablet group is DRAMATICALLY more likely to contain a specific cluster of word-signs on the obverse face. The headliner is a sign that Dahl and Encyclopaedia Iranica call the 'hairy triangle' (catalogued as M157 in Dahl's working sign list) and identify as a recurring Proto-Elamite institutional or authority marker: 68.4% of the top-edge-marked tablets have it, versus only 26.3% of the no-mark tablets. That is a 6-to-1 odds ratio, and the probability of it arising by chance is about 1 in 70 billion. And it's not just one sign: fifteen different word-signs are significantly more common in the top-edge-marked group than in the baseline, with a sign called M340 reaching an even stronger statistical signal. Four of the top-edge-marked tablets are near-identical in their structure: each has the same 2-line layout, the same numeric total, the same three word-signs on the obverse, and the same top-edge mark — they are essentially copies of the same short administrative formula. What this all means: the top-edge `1(N34)` mark is not a random accounting total; it is a CLASSIFIER — a scribal tag that says 'this tablet is a member of a specific document class, namely an institutional account from the office associated with the hairy-triangle sign.' Nobody has published this specific correlation before. Concretely: if anyone in the future tries to decipher the Proto-Elamite word-signs, the 91 top-edge-marked tablets are a much better starting point than the undifferentiated 1,585-tablet corpus, because they are now known to share a document-type signature. And any computer program that tries to check whether Proto-Elamite accounts balance (obverse entries = reverse total) should ignore the top edge rather than add it into the sum, because on at least three verified tablets (P008135, P008160, P008223) doing so breaks a balance that otherwise holds exactly. Three cleanly-verified spot-checks plus a 91-tablet corpus-wide distributional signal at p = 10⁻¹¹ is the shape of the result.

Novelty

Multi-source novelty gate run on 2026-04-15 with three parallel WebSearch queries (`"Proto-Elamite" "top edge" OR "upper edge" classifier tally 1(N34)`, `Proto-Elamite N34 sign classifier totaled accounts Dahl Englund`, `"Proto-Elamite" edge marking subscript postscript accounting tablet convention`) returned zero prior publications on (a) the `@top: 1(N34)` corpus-wide signature, (b) the specific 91-tablet count, (c) the M157/M340/M153 ideogram-cluster enrichment, or (d) the 68.4% vs 26.3% M157 rate comparison. The Iranica and CDLI wiki entries do note generally that 'totals are usually recorded along the upper edge of the reverse' but this phrasing refers to the topmost rows of the `@reverse` face in the ATF encoding, not to the physically-distinct `@top` narrow edge surface. Dahl 2015 (CDLJ 2015/1, 'A New Edition of the Proto-Elamite Text MDP 17, 112') uses sum-vs-total as a per-tablet diagnostic and comments anecdotally that 'scribal errors occur more commonly in the proto-Elamite corpus than in the proto-cuneiform corpus, especially as arithmetical incongruities between the entries and the total' but provides no quantified distribution, no corpus-wide error catalogue, and no discussion of edge-surface classifiers. Englund 2004 ('The State of Decipherment of Proto-Elamite') similarly uses arithmetic balancing on selected preserved tablets as a system-identification tool rather than an aggregate measurement. Kelley, Born, Monroe and Sarkar 2022 ('On Newly Proposed Proto-Elamite Sign Values', Iranica Antiqua 57) applies Desset's Linear Elamite phonetic values to the ~14,000 non-numerical PE signs across the corpus (sfu-natlang/pe-sign-value-data on GitHub) but explicitly works only on non-numerical content and does not discuss edge surfaces, arithmetic consistency, or document-class signatures. Desset et al. 2022 (Linear Elamite decipherment) proposes phonetic correspondences to a subset of Proto-Elamite signs but no numerical sign reassignments and no edge-surface analysis. In short, the specific result here — a p ≈ 10⁻¹¹ enrichment of an ideogram cluster in the 91-tablet `@top:1(N34)` subcorpus — is not in the published literature I could locate. The closest precedent is the qualitative observation in Encyclopaedia Iranica and Dahl 2005 that M157 is a recurring institutional authority sign; my contribution is to quantify its co-occurrence with a specific edge-surface mark at corpus scale and to identify the 15-ideogram co-enrichment panel. Honest assessment under the project surprise test: this is a 6. A Proto-Elamite specialist like Dahl would most plausibly say 'yes, I had noticed the hairy-triangle and the edge mark go together on some tablets, but I had not run the corpus-wide count and I had not quantified it against a no-edge baseline, and I certainly had not identified the 15-ideogram co-enriched cluster.' That is the profile of a contribution that extends rather than rediscovers specialist knowledge.

How it upholds the rules

1. Not already discovered: Zero hits on the multi-source novelty gate for the specific compound claim. Qualitative precedents exist (M157 as institutional marker in Iranica and Dahl 2005; 'totals on the upper edge of the reverse' as a general PE convention in the CDLI wiki) but the corpus-wide count, the 68.4% vs 26.3% M157 rate, the 15-ideogram enrichment panel, and the 91-tablet count of the `top_1N34` subcorpus are not in any paper, preprint, thesis, or CDLI posting I located. Kelley/Born/Monroe/Sarkar 2022 works only on non-numerical signs and does not discuss edges; Desset 2022 does not discuss numerical or edge-surface content in Proto-Elamite; Dahl's own 2015 and 2005 papers discuss arithmetic balance as a per-tablet diagnostic but never as a corpus aggregate.
2. Not computer science: Ancient epigraphy and distributional statistics on a natural-historical corpus. The object of study is a 5,100-year-old writing system and the physical marks on 1,585 clay tablets in the CDLI collection. Python is used only as a verifier — the claim is about the tablets, not about any programming artefact.
3. Not speculative: Every number in the claim is the output of a deterministic computation on a public machine-readable corpus (cdli-gh/data August 2022 ATF dump). The scripts parse_atf.py, top_edge_scan.py, top_structure.py, and m157_enrichment.py are reproducible from a fresh clone of the CDLI dump using nothing more than Python's standard library. The two-proportion z-test values (z = 6.76 for M157, z = 8.69 for M340) are computed using a Wilson–Hilferty-adjacent stdlib erfc approach. The 91-tablet count, 57 preserved, 39-of-57 M157 presence, 202-of-769 baseline, and all downstream counts are direct greps against the parsed corpus. The spot-checks (P008135, P008160, P008223) against the raw CDLI ATF confirm that the reverse column 1 alone balances the obverse exactly and that the `@top: 1(N34)` mark is arithmetically extraneous. No claim is a fit, estimate, or extrapolation.

Verification

Independent reproduction path (no external dependencies besides git and Python stdlib): (1) `git clone https://github.com/cdli-gh/data` and fetch the git-lfs blobs `cdliatf_unblocked.atf` (83 MB) and `cdli_cat.csv` (148 MB) — or equivalently download them directly from `https://media.githubusercontent.com/media/cdli-gh/data/refs/heads/master/cdliatf_unblocked.atf` and the corresponding catalogue URL. The dump used for this discovery is the August 2022 daily dump. (2) Filter the catalogue CSV for rows where `period` contains 'Proto-Elamite'; this yields 1,729 `id_text` values. Pad each to `P{id:06d}` format and scan the ATF dump to extract any tablet whose header line begins with `&Pxxxxxx` matching the list. The extraction yields 1,585 transliterated Proto-Elamite tablets (coverage rate 91.7%). (3) Run `parse_atf.py` to build structured `Tablet → Entry[]` records with surface tracking (obverse / reverse / top / bottom / column N) and a `damaged_sides` field populated from `$ broken` / `$ traces` / `$ lines` markers. (4) Run `top_edge_scan.py` to count @top-surface signatures. The top signature counts are: (no @top numerics) 1469, `1(N34)` 91, `2(N01)` 19, `1(N01)` 3, `3(N39C)` 1, `1(N14)` 1, `1(N39B)+1(N14)+2(N01)+1(N39B)` 1. (5) Run `top_structure.py` to compute the structural rate (rev==1 AND obv≥2) by bucket: `top_1N34` 23/57 = 40.4%, `no_top_numeric` 161/769 = 20.9%, `top_2N01` 4/14, `other_top` 1/5. The two-proportion z-test on the 23/57 vs 161/769 comparison gives z ≈ 3.42 and p ≈ 3 × 10⁻⁴. (6) Run `m157_enrichment.py` to compute the M157 (and 15-ideogram panel) enrichment. The key numbers are: top_1N34 M157 presence 39/57 = 68.4%; no_top_numeric M157 presence 202/769 = 26.3%; z ≈ 6.76; two-sided p ≈ 1.4 × 10⁻¹¹; odds ratio 6.08. The full ideogram enrichment panel (M340, M153, M157, M342, M036, M054, M145, M387, M305, M260, M372, M388, M033, M354, M111) is printed by the script with z-values and p-values for each. The triple M157+M387+M388: 10/57 = 17.5% vs 18/769 = 2.3%, z ≈ 6.12, p ≈ 9.4 × 10⁻¹⁰. (7) Independent spot-checks: P008135 (MDP 06, 353), P008160 (MDP 06, 379), and P008223 (MDP 17, 025) can be looked up on the CDLI website (https://cdli.earth) by their P-numbers. Each has an obverse-list, a reverse column 1 that arithmetically balances the obverse exactly (P008135 obv = 13 sheep = rev 1(N14)+3(N01) = 13; P008160 obv = 3 = rev 3(N01) = 3; P008223 obv = 16 = rev 1(N14)+6(N01) = 16), and an `@top: 1(N34)` mark that is extraneous to the arithmetic. Dahl's `#tr.en:` interpretive glosses in the ATF (committed as comments in the transliteration) label P008135 as a female-sheep inventory. (8) The four near-duplicates P008022, P008265, P009180, P009237 can also be spot-checked on CDLI — each has 2 obverse lines, an obverse-sum of 9 N01, the same M387+M388+M157 obverse ideogram set, and the same `@top: 1(N34)` edge mark. Every number above is deterministic on the August 2022 CDLI dump.

Next steps

Extend the `@top` edge scan to the full proto-cuneiform corpus (~5,000 tablets) to test whether `1(N34)` — or a specific edge-mark signature — functions as a genre classifier in Uruk IV/III texts as well. Proto-cuneiform is a more-studied script with a similar administrative document-class structure; if the pattern recurs, it would suggest the convention is inherited from proto-cuneiform rather than invented independently in Proto-Elamite.
Pull Dahl's full Proto-Elamite sign list (the 30 MB PDF zip on the CDLI downloads page) and tag each tablet's ideograms with their object class. This enables the system-inference step needed to extend the arithmetic-balance scan from {N01, N14}-only tablets to the full 1,585-tablet corpus.
Spot-check the remaining 36 preserved `top_1N34` tablets (beyond the 3 already verified) to check how many balance on reverse-column-1 alone, and investigate the 23 rev=0 subset where `@top: 1(N34)` may play a distinct role.
Cross-reference the 91-tablet `top_1N34` subcorpus against Dahl's working tablet-class typology to check whether this subcorpus already has a named genre in the published literature under a different label.
Apply the same distributional-genre-classifier method to the 19 `top_2N01` tablets and any other rare edge-surface signatures — rare edge signatures may identify additional minor document classes.
Contact Jacob Dahl (Oxford) directly with the finding and ask whether the M157/M340/M153 cluster corresponds to a known office or archive in his working classification.

Artifacts

Proto-Elamite ATF subset (1,585 tablets, extracted from CDLI dump): discovery/decipherment/protoelamite/proto_elamite.atf
parse_atf.py — corpus parser with surface / damage tracking: discovery/decipherment/protoelamite/parse_atf.py
top_edge_scan.py — @top/@edge signature frequency counter: discovery/decipherment/protoelamite/top_edge_scan.py
top_structure.py — structural-rate comparison across @top buckets: discovery/decipherment/protoelamite/top_structure.py
m157_enrichment.py — M157 and ideogram-panel enrichment tests: discovery/decipherment/protoelamite/m157_enrichment.py
balance_scan_v2.py — multi-system arithmetic balance scan (methodological context): discovery/decipherment/protoelamite/balance_scan_v2.py
balance_split.py — {N01,N14}-only vs N34/N45 split analysis: discovery/decipherment/protoelamite/balance_split.py
ITERATION_NOTES.md — full iter-1 through iter-10 research log: discovery/decipherment/ITERATION_NOTES.md
Meroitic sub-thread closing note (methodological lesson): discovery/decipherment/meroitic/RESULT.md