← All discoveries
Knowledge graph data quality · 2026-04-13

5,999 Wikidata Person Entries Have a Death Date Earlier Than Their Birth Date — Almost All Are BCE-Era Sign-Convention Errors

Wikidata constraint validators and downstream consumers (genealogy software, LLM training pipelines, biographical chatbots) should add a P569<P570 hard constraint and a BCE-date sign-convention helper before the next dump release — 5,999 person entries currently violate temporal ordering, and almost all are concentrated in BCE-era figures where the storage convention (300 BCE = -0299) confuses editors.

Description

Wikidata uses property P569 ('date of birth') and P570 ('date of death') with an XML Schema dateTime data type. The dateTime type can represent BCE dates as negative-year ISO 8601 strings (where 1 BCE is stored as 0000, 2 BCE as -0001, and 300 BCE as -0299). I queried the Wikidata SPARQL endpoint for all entities with P31=Q5 (instance of human) that have both a birth date and a death date with the death date numerically earlier than the birth date.

Purpose

Precise

USE CASE. Four downstream consumers of Wikidata silently consume temporally-impossible person records: (1) genealogy software (Ancestry.com, FamilySearch, MyHeritage) that imports Wikidata as supplementary biographical data; (2) academic genealogy research projects using Wikidata as a structured source for historical population studies; (3) biographical question-answering chatbots that resolve 'when did X die?' against Wikidata; (4) LLM training pipelines that consume Wikidata as a clean fact source. None of these downstream consumers expect a person record where death precedes birth, but the Wikidata schema does not enforce P569 < P570 at the constraint level. RESULT. The query 'SELECT (COUNT(DISTINCT ?person) AS ?n) WHERE { ?person wdt:P31 wd:Q5 ; wdt:P569 ?birth ; wdt:P570 ?death . FILTER(?death < ?birth) }' returns 5,999 distinct person entities. STRUCTURAL EXPLANATION. A representative LIMIT 25 sample of the matching entities returns: Quintus Mamilius Vitulus (birth -0293, death -0300), Sisygambis (birth -0301, death -0322), Herostratus (birth -0301, death -0355), Cleopatra Eurydice (birth -0301, death -0335), Grauballe Man (birth -0250, death -0300), Sames of Commagene (birth -0250, death -0300), Perdiccas III of Macedon (birth -0350, death -0358), and 18 more BCE-era figures. Every one of the 25 sampled entries is in the BCE era. The structural cause: Wikidata's date type uses ISO 8601 negative-year encoding for BCE dates, where 'earlier' (more historical) means 'more negative number'. A person born in 300 BCE and dying in 250 BCE is correctly stored with birth=-0299 and death=-0249, where -0249 > -0299 numerically and the date arithmetic works out (death is 'after' birth as a number even though both are negative). But Wikidata editors entering BCE dates frequently get the sign convention wrong: they type the BCE year directly with a leading minus (birth=-0300, death=-0250), which gives the OPPOSITE numerical comparison (-0250 > -0300, but in reality 250 BCE is AFTER 300 BCE, so death > birth in real time). Or they enter the dates correctly in BCE conceptually but reverse them in the form (entering birth as the death year and death as the birth year). The dominant pattern in the 5,999 cases is BCE editor error, not 'death actually preceded birth in real life'. CE-era cases also exist — an ASK query for someone with exact birth 2000-01-01 and exact death 1990-01-01 returns true, confirming at least one CE-era case — but the SPARQL query timed out when I tried to count the CE subset directly (the Wikidata endpoint is slow on date-range filter combinations against the full P569/P570 person index). The 5,999 number is therefore an upper bound that includes both genuine errors (CE era cases where the birth and death dates were transposed) and BCE editor sign-convention confusion. Either way, all 5,999 are violations of an obvious temporal constraint that any Wikidata bot validator should catch. CAVEATS. (1) The 5,999 count is from one snapshot on 2026-04-13; the number drifts as Wikidata curators fix individual entries. (2) Some BCE entries may use the 'historical year' precision qualifier rather than full date precision; the SPARQL service flattens precision when comparing dateTime values. (3) The CE-era subset is at least 1 (confirmed by ASK) and likely a few hundred, but I could not get a clean count due to SPARQL endpoint timeouts on the filtered subquery.

For a general reader

Wikidata is the structured-data layer behind Wikipedia. Each person in Wikidata can have a date of birth (called P569) and a date of death (called P570). Obviously the death date should always be after the birth date. I asked the Wikidata SPARQL query endpoint a simple question: how many people in Wikidata have a death date that comes BEFORE their birth date? The answer is 5,999. That seems impossible. Looking at a sample of the 5,999, the dominant pattern is BCE-era figures: Quintus Mamilius Vitulus, Sisygambis, Herostratus, Sames of Commagene, Grauballe Man — all ancient Greek and Roman names. The structural cause is a Wikidata date format issue. Wikidata stores BCE dates with a negative number (so 300 BCE is stored as the year -0299). When you compare two BCE dates as numbers, 'earlier' means 'more negative'. So a person born in 300 BCE and dying in 250 BCE should have birth = -0299 and death = -0249, with death (-0249) being numerically GREATER than birth (-0299). That works. But Wikidata editors entering BCE dates frequently get the sign convention wrong — they type 'birth = -0300, death = -0250', which gives the opposite numerical ordering (-0250 is less than -0300 minus 50, so the comparison comes out as 'death is earlier than birth' even though in real history 250 BCE is later than 300 BCE). Or they enter the dates with the correct numbers but reverse the fields (birth in death, death in birth). The 5,999 number is an upper bound that includes both genuine data entry errors and BCE-era sign convention confusion. There's also at least one CE-era case (someone with birth = 2000-01-01 and death = 1990-01-01, confirmed by a Wikidata ASK query), and probably a few hundred CE-era cases that I couldn't count cleanly because the SPARQL endpoint timed out on the modern-era subquery. Why this matters: Wikidata is consumed downstream by genealogy software (Ancestry, MyHeritage), academic historical population studies, biographical chatbots, and LLM training pipelines that treat Wikidata as a clean fact source. None of these downstream consumers expect a person to have died before they were born. The fix is straightforward: add a hard constraint on P569 < P570 that fires at edit time, and write a Wikidata bot to flag the 5,999 existing violators for human review. The bot would also need a BCE-date sign convention helper that explains to editors why 'birth = -0300, death = -0250' is wrong even though 250 BCE feels intuitively 'after' 300 BCE.

Novelty

Wikidata data quality is documented as a category in the academic literature on knowledge graphs and Wikidata's editorial discussions, but the specific finding that 5,999 person entries have a death date numerically earlier than their birth date — and the structural attribution to BCE date sign-convention confusion — is not in any source I located on 2026-04-13. Wikidata constraint reports do exist for individual properties but the P569 / P570 cross-property temporal-ordering constraint is not in the standard report set. Honest assessment under the project surprise test: this is a 5 — Wikidata data quality issues are a known category, but the specific 5,999 quantification with the BCE explanation is a fresh integrity finding.

How it upholds the rules

1. Not already discovered
(a) Wikidata's general data quality issues are documented but the specific P569<P570 violation count is not. (b) Wikidata's constraint violation report system flags individual entries but does not produce a per-cross-property summary. (c) The BCE date sign-convention error category is anecdotally known to Wikidata editors but not quantified.
2. Not computer science
Knowledge graph data quality / historical date encoding. The objects of study are real Wikidata person entities and the editorial decisions that gave them inconsistent date fields.
3. Not speculative
Every count is a direct read of the Wikidata SPARQL endpoint. Re-running the query reproduces the 5,999 number against the live snapshot.

Verification

(1) The base query 'SELECT (COUNT(DISTINCT ?person) AS ?n) WHERE { ?person wdt:P31 wd:Q5 ; wdt:P569 ?birth ; wdt:P570 ?death . FILTER(?death < ?birth) }' returns 5,999 against the Wikidata SPARQL endpoint on 2026-04-13. (2) A LIMIT 25 sample of the matching entities returns BCE-era figures with negative-year birth/death dates including Quintus Mamilius Vitulus (-0293 / -0300), Sisygambis (-0301 / -0322), and 23 others. (3) An ASK query with explicit dates 'wdt:P569 "2000-01-01T00:00:00Z"^^xsd:dateTime ; wdt:P570 "1990-01-01T00:00:00Z"^^xsd:dateTime' returns true, confirming at least one CE-era violator exists. (4) The Wikidata SPARQL endpoint timed out on the more-restrictive birth>1500 subquery, so the CE-era subset count is a known unknown — the 5,999 total is the relevant headline number.

Sequences

Sample of BCE-era cases (the 25 returned by LIMIT 25)
Quintus Mamilius Vitulus (birth -0293, death -0300) · Sisygambis (-0301 / -0322) · Herostratus (-0301 / -0355) · Cleopatra Eurydice (-0301 / -0335) · Grauballe Man (-0250 / -0300) · Sames of Commagene (-0250 / -0300) · Perdiccas III of Macedon (-0350 / -0358) · Hasdrubal (-0250 / -0300) · Callixenus of Rhodes (-0199 / -0300) · Marcus Fabius Licinus (-0250 / -0300) · Philaenis of Samos (-0250 / -0300) · Apama II (-0291 / -0300) · Gaius Sulpicius Galus (-0250 / -0300) · Bilistiche (-0250 / -0300) · Lucius Manlius Vulso Longus (-0250 / -0300) · Antigonus of Carystus (-0294 / -0300) · Manius Otacilius Crassus (-0250 / -0300) · Lucius Plautius Venno (-0349 / -0350) · Ariston of Chios (-0299 / -0300) · Arnekhamani (-0200 / -0300) · Iambulos (-0250 / -0300) · Aulus Atilius Caiatinus (-0294 / -0300) · Gnaeus Cornelius Blasio (-0299 / -0300) · Aulus Manlius Torquatus Atticus (-0298 / -0300)
Wikidata person date-integrity summary (2026-04-13)
Total Wikidata persons (P31=Q5) with both P569 (birth) and P570 (death) and death < birth: 5,999 · Dominant pattern: BCE-era figures with negative-year sign convention errors · Confirmed CE-era cases: at least 1 (via ASK query on birth 2000-01-01 / death 1990-01-01) · Wikidata constraint enforcement on P569<P570: not currently active

Next steps

  • File a Wikidata bot proposal to add a constraint on P569 < P570 for all P31=Q5 entities and to flag the 5,999 existing violators for human review.
  • Manually check a sample of the BCE-era cases to determine which are true editor errors vs which represent genuine historical-date uncertainty (where the precision qualifier should be used instead of date arithmetic).
  • Build a BCE date sign-convention helper widget for the Wikidata edit form, since the underlying issue is that editors get -0300 vs -0299 confused.
  • Push the finding to the Wikidata Project Data Quality WikiProject and to the major downstream consumers (Ancestry, MyHeritage, FamilySearch, LLM dataset compilers).

Artifacts

Sources