Knowledge graph data quality · 2026-04-13

15% of Wikidata's 1,044 'Astronauts' Are Fictional, Including R2-D2, Buzz Lightyear, Wonder Woman, and a Literal Dog

Researchers, museum curators, science journalists, and chatbot / LLM training pipelines that query Wikidata for a canonical list of astronauts should add an explicit fictional-character filter. Without it, 162 of the 1,044 'astronauts' (15.5%) are R2-D2, Buzz Lightyear, Wonder Woman, Snowy the dog, Doomguy, Captain Olimar, and other comic, film, and video game characters that most consumers of the list would not expect to see.

Description

Wikidata uses property P106 ('occupation') to tag entities with their occupation, with Q11631 ('astronaut') as the value for entities that have the astronaut occupation. The Wikidata SPARQL endpoint at https://query.wikidata.org/ supports counting and filtering. I queried for the total number of distinct entities with P106=Q11631, and for the subset that are also typed (via P31) as some kind of fictional or character class, meaning any P31 value whose English label contains 'fictional', 'character', 'droid', or 'toy'. The result: 1,044 entities with the astronaut occupation, of which 162 (15.5%) are fictional.
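The two counts described above can be written as a pair of query strings. What follows is a minimal Python sketch, not the tooling actually used for this note: the function names `total_query` and `fictional_query` are hypothetical, and the SPARQL text is assembled from the fragments quoted in this write-up. It only builds the strings and does not contact the endpoint.

```python
# Substrings matched against English P31 labels, as described in the text.
FICTIONAL_MARKERS = ("fictional", "character", "droid", "toy")


def total_query(occupation_qid: str = "Q11631") -> str:
    """Count distinct entities whose occupation (P106) is the given item."""
    return (
        "SELECT (COUNT(DISTINCT ?person) AS ?n) WHERE { "
        f"?person wdt:P106 wd:{occupation_qid} }}"
    )


def fictional_query(occupation_qid: str = "Q11631") -> str:
    """Count the subset whose P31 type label contains a fictional marker."""
    contains = " || ".join(
        f'CONTAINS(LCASE(?typeLbl), "{marker}")' for marker in FICTIONAL_MARKERS
    )
    return (
        "SELECT (COUNT(DISTINCT ?person) AS ?n) WHERE { "
        f"?person wdt:P106 wd:{occupation_qid} . "
        "?person wdt:P31 ?type . "
        "?type rdfs:label ?typeLbl . "
        'FILTER(LANG(?typeLbl) = "en") '
        f"FILTER({contains}) }}"
    )


print(total_query())
print(fictional_query())
```

Taking the occupation item as a parameter is a design choice of this sketch, so the same pair of queries could be reused for another occupation once its item ID is known.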

Purpose

Precise

USE CASE. Five distinct groups query Wikidata for canonical lists of astronauts: (1) science museum curators populating exhibit databases (Smithsonian National Air and Space Museum, Kennedy Space Center Visitor Complex); (2) journalists writing space-history features; (3) academic researchers studying spaceflight demographics and historical patterns; (4) LLM training pipelines and knowledge-graph projects that consume Wikidata as a clean fact source; (5) chatbots and question-answering systems that resolve 'list of astronauts' queries against Wikidata. None of these consumers expects Wonder Woman, R2-D2, or a dog to appear in the result.

RESULT. The Wikidata SPARQL query 'SELECT (COUNT(DISTINCT ?person) AS ?n) WHERE { ?person wdt:P106 wd:Q11631 }' returns 1,044 distinct entities. Adding the fictional-character filter '?person wdt:P31 ?type . ?type rdfs:label ?typeLbl . FILTER(LANG(?typeLbl)="en") FILTER(CONTAINS(LCASE(?typeLbl), "fictional") || CONTAINS(LCASE(?typeLbl), "character") || CONTAINS(LCASE(?typeLbl), "droid") || CONTAINS(LCASE(?typeLbl), "toy"))' returns 162 distinct entities, or 15.5% of all astronauts in Wikidata.

SAMPLE OF FICTIONAL ASTRONAUTS IDENTIFIED. Wonder Woman (Q338430, country of citizenship Themyscira), R2-D2 (Star Wars droid, multiple character class types), Buzz Lightyear (anthropomorphic toy from Toy Story), Snowy (Tintin's fictional dog), Cosmo the Spacedog (Marvel Comics dog), Tintin, Captain Haddock, Mark Watney (The Martian, Andy Weir novel and film), Joseph Cooper / Amelia Brand / Hugh Mann / Romilly / Doyle (Interstellar, Christopher Nolan film), Lev Andropov (Armageddon film), Doomguy (the DOOM video game protagonist), Captain Olimar (Pikmin video game), Major Tom (David Bowie 'Space Oddity'), Jet Morgan / Lemmy Barnet / Doc Matthews / Stephen Mitchell (Journey into Space BBC radio drama from the 1950s), Flash Gordon, Dan Dare, Jupiter Jim (Teenage Mutant Ninja Turtles), Will Robinson / Don West / John Robinson / Judy Robinson / Maureen Robinson / Penny Robinson (Lost in Space), Captain Scarlet, the Tracy brothers (Thunderbirds), Vance Astro, Charles T. Baker, Mark Blaze, Banana Man, Vincent Freeman (Gattaca), Frank Poole / David Bowman / Heywood R. Floyd (2001: A Space Odyssey, Arthur C. Clarke novel and Stanley Kubrick film), Ryland Grace (Project Hail Mary, Andy Weir novel), Cuthbert Calculus (Tintin), Ijon Tichy (Stanisław Lem character), Pilot Pirx (Lem), Helena Russell / John Koenig / Maya / Tony Verdeschi / Victor Bergman (Space: 1999), Adelaide Brooke (Doctor Who, 'The Waters of Mars'), Lara Lor-Van (Superman's Kryptonian mother), Hank Henshaw (Superman, the Cyborg Superman), Diana Prince (Wonder Woman alter-ego), and many more.

STRUCTURAL READING. Wikidata's occupation property P106 is intended for the real-world occupation of real people, but the Wikidata data model does not enforce that constraint at the schema level: both real people and fictional characters can have any occupation value. The Wikidata community has historically treated 'occupation' on a fictional character as 'the in-fiction occupation of the character', which is a defensible editorial choice but creates a 15.5% contamination rate when the field is queried without filtering. The contamination is unlikely to be random. It is presumably concentrated in occupations that have strong fictional representation, and 'astronaut' is plausibly one of the strongest, although other occupations have not yet been checked. The number 162 is large enough that any consumer of Wikidata's 'astronauts' field who does not filter is consuming a meaningfully wrong list. The Snowy entry (Wikidata Q183124, Tintin's fictional dog, classified with occupation astronaut because Snowy went to the Moon in 'Destination Moon' / 'Explorers on the Moon') is the funniest single example, but R2-D2 (a fictional droid) being classified as having an occupation at all is the most structural: the Wikidata schema treats droids as having human-level professional roles.

CAVEATS.
(1) The 162 count uses a label-substring match on P31 type values for 'fictional', 'character', 'droid', 'toy'. Some fictional characters not classified with one of these strings are missed. The true contamination rate is at least 15.5% and likely a few percent higher.
(2) Some entries (e.g., Diana Prince) are duplicates of other entries (e.g., Wonder Woman) for the same character, and my distinct-person count treats them as distinct.
(3) The 1,044 total astronaut count includes some entities that are also tagged as fictional but where the astronaut occupation is the in-fiction role; whether to treat these as 'astronauts' is an editorial question Wikidata has answered with 'yes', but downstream consumers may answer differently.
(4) Wikidata is editable by anyone, and the contamination level may shift as curators add or correct entries.
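The label-substring heuristic behind the 162 count, and the way it can miss entries as caveat (1) notes, can be illustrated locally. This is a small Python sketch under stated assumptions: the helper name `looks_fictional` is hypothetical, the first two label sets come from the spot-checks in this write-up, and the 'mythical being' label is invented purely to show a miss.

```python
# The four marker substrings used by the fictional-character filter.
FICTIONAL_MARKERS = ("fictional", "character", "droid", "toy")


def looks_fictional(p31_labels):
    """Return True if any English P31 label contains a marker substring."""
    return any(
        marker in label.lower()
        for label in p31_labels
        for marker in FICTIONAL_MARKERS
    )


# Type labels reported for Wonder Woman in the Verification section.
print(looks_fictional(["comics character", "film character"]))  # True

# A real person typed only as 'human' is not flagged.
print(looks_fictional(["human"]))  # False

# Hypothetical label, invented for illustration: a fictional type whose
# label contains none of the four markers slips past the heuristic.
print(looks_fictional(["mythical being"]))  # False
```

The third call is the failure mode caveat (1) describes: the heuristic only sees labels, so any fictional class named without one of the four substrings goes uncounted.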

For a general reader

Wikidata is the structured-data layer behind Wikipedia. Every entity in Wikidata can have an 'occupation' tag, and one of the occupation values is 'astronaut'. If you query Wikidata with the SPARQL question 'how many entities have occupation = astronaut?', you get 1,044. That sounds about right: there have been roughly 600 humans who have flown to space plus a few hundred who completed astronaut training but didn't fly, so 1,044 is in the ballpark of the real list. But when I added a filter for 'is this entity also classified as a fictional character or a droid or a toy?', I got 162. In other words, 15.5% of the supposed astronauts are fictional.

The fictional astronauts in Wikidata include Wonder Woman, R2-D2 (the Star Wars droid), Buzz Lightyear (the Toy Story astronaut toy), Snowy (Tintin's literal cartoon dog who 'goes to the Moon' in Hergé's 'Explorers on the Moon'), Cosmo the Spacedog (a Marvel Comics dog), Mark Watney (the protagonist of The Martian), the entire Interstellar crew (Joseph Cooper, Amelia Brand, Hugh Mann, Romilly, Doyle), Doomguy from the DOOM video game, Captain Olimar from Pikmin, Major Tom (yes, the David Bowie 'Space Oddity' reference is a Wikidata entity with occupation astronaut), Jet Morgan from a 1950s BBC radio drama, the Robinson family from Lost in Space, the Tracy brothers from Thunderbirds, Flash Gordon, Dan Dare, Captain Scarlet, Jupiter Jim from the Teenage Mutant Ninja Turtles franchise, Vincent Freeman from Gattaca, Frank Poole and David Bowman from 2001: A Space Odyssey, and dozens more.

Why this matters: anyone who queries Wikidata for 'list of astronauts' without explicitly filtering out fictional characters gets a result that is 15.5% wrong. Likely consumers of Wikidata include LLM training pipelines, science museum exhibit databases, and chatbots that answer 'who are the astronauts' questions. Any of them that skips the filter is silently consuming the contaminated list. The Wikidata community has decided that fictional characters can have an in-fiction occupation, which is a defensible editorial choice, but it means the property 'occupation' is not what most downstream consumers think it is. Anyone using Wikidata as a 'clean facts' source needs to add an explicit fictional-character filter. The Snowy the dog entry is the funniest specific example: the Wikidata entity for Tintin's pet wire fox terrier has 'occupation: astronaut', because in the 1953-1954 comic books 'Destination Moon' / 'Explorers on the Moon' Snowy travels to the Moon with Tintin.

Novelty

Wikidata data quality issues are documented as a category in the academic literature on knowledge graphs and Wikidata's editorial discussions, but the specific finding that 15.5% of Wikidata entries with P106=astronaut are fictional characters, and the specific list of those characters (R2-D2, Snowy, Wonder Woman, Buzz Lightyear, etc.), is not in any source I located on 2026-04-13. Honest assessment under the project surprise test: this is a 6 — a Wikidata curator or knowledge-graph engineer reading this would say 'we should add a constraint or downstream filter' rather than 'yeah I know'.

How it upholds the rules

1. Not already discovered
(a) Wikidata's data quality issues are documented in general but the specific astronaut-fictional contamination finding is not. (b) Wikidata curators have discussed P106 semantics for fictional characters but have not produced a per-occupation contamination ranking. (c) The specific 162 / 1,044 / 15.5% figures are computed from the live Wikidata SPARQL endpoint as of 2026-04-13.
2. Not computer science
Knowledge graph data quality / cultural classification. The objects of study are real Wikidata entities and the human editorial decisions that classified them. The discovery is not about the SPARQL technology — it is about the substantive finding that fictional characters are systematically classified in Wikidata's 'astronaut' occupation field.
3. Not speculative
Every count is a direct read of the Wikidata SPARQL endpoint. Re-running the queries should reproduce the 1,044 / 162 / 15.5% numbers, subject to any edits made to Wikidata after 2026-04-13 (see caveat 4).

Verification

(1) The total query 'SELECT (COUNT(DISTINCT ?person) AS ?n) WHERE { ?person wdt:P106 wd:Q11631 }' returns 1,044 against the Wikidata SPARQL endpoint on 2026-04-13. (2) The fictional-filter query (with the four substring filters) returns 162. (3) Spot-check on Wonder Woman: Wikidata Q338430 'Wonder Woman' has wdt:P106 = wd:Q11631 (astronaut), wdt:P27 = wd:Q2809472 (Themyscira), and wdt:P31 includes comics character / film character / animated character / television character. (4) Spot-check on R2-D2: Wikidata Q51537 'R2-D2' has wdt:P106 including wd:Q11631 (astronaut) and wdt:P31 = character from Star Wars. (5) Spot-check on Snowy (Q183124, Tintin's dog): occupation includes astronaut, and the dog's appearance in 'Explorers on the Moon' is documented at https://en.wikipedia.org/wiki/Snowy_(character).
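The percentage in checks (1) and (2) can be recomputed from the two counts, and the counts themselves re-fetched. The Python sketch below is one possible way to do that, not the procedure actually used here. The names `contamination_rate` and `fetch_count` are hypothetical, and the `/sparql` path, the `format=json` parameter, and the User-Agent string are assumptions about how the query service is called rather than details given in this note. `fetch_count` needs network access and is defined but not called.

```python
import json
import urllib.parse
import urllib.request

# Assumed machine-readable path of the query service named in the text.
ENDPOINT = "https://query.wikidata.org/sparql"


def contamination_rate(total: int, fictional: int) -> float:
    """Percentage of entities flagged as fictional, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * fictional / total, 1)


def fetch_count(query: str) -> int:
    """Run a COUNT query that binds ?n and return the value (needs network)."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        # Placeholder identifier; replace with your own contact details.
        headers={"User-Agent": "astronaut-count-check/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return int(data["results"]["bindings"][0]["n"]["value"])


# Counts reported in this write-up for 2026-04-13.
print(contamination_rate(1044, 162))  # 15.5
```

Because Wikidata changes continuously, a live `fetch_count` call made later may return values other than 1,044 and 162.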

Sequences

Notable fictional astronauts found in Wikidata's P106=Q11631 ('astronaut' occupation) field
Wonder Woman (DC Comics) · R2-D2 (Star Wars droid) · Buzz Lightyear (Toy Story toy) · Snowy (Tintin's dog) · Cosmo the Spacedog (Marvel) · Mark Watney (The Martian) · Joseph Cooper / Amelia Brand / Hugh Mann / Romilly / Doyle (Interstellar) · Doomguy (DOOM) · Captain Olimar (Pikmin) · Major Tom (David Bowie) · Jet Morgan (Journey into Space, BBC 1953) · Will Robinson and family (Lost in Space) · Tracy brothers (Thunderbirds) · Captain Scarlet · Flash Gordon · Dan Dare · Jupiter Jim (TMNT) · Vincent Freeman (Gattaca) · Frank Poole / David Bowman (2001: A Space Odyssey) · Cuthbert Calculus / Captain Haddock (Tintin) · Ijon Tichy and Pilot Pirx (Stanisław Lem) · Helena Russell / John Koenig / Maya (Space: 1999) · Adelaide Brooke (Doctor Who) · Lara Lor-Van and Hank Henshaw (Superman)
Wikidata astronaut contamination summary (2026-04-13)
Total Wikidata entities with P106=Q11631 (astronaut occupation): 1,044 · Fictional / character / droid / toy subset: 162 · Contamination rate: 15.5% · Confirmed examples include 2 dogs (Snowy, Cosmo the Spacedog), 1 droid (R2-D2), 1 toy (Buzz Lightyear), 1 song character (Major Tom), with the remainder being other comic / film / video game / TV / radio characters
Wikidata vs intended use
Intended downstream consumers: LLM training pipelines, Smithsonian National Air and Space Museum exhibit databases, science journalists, chatbots, knowledge graph projects · None of them expect Wonder Woman or Snowy in the result · The Wikidata schema does not enforce that P106 occupations must be held by real-world persons · Filter to fix: add P31 = human (Q5) and FILTER NOT EXISTS for character / fictional types
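The suggested fix can be sketched as a single query that requires P31 = Q5 and excludes any entity with a fictional-looking type. This is an illustrative Python sketch only: the function name `real_astronauts_query` and the `limit` parameter are hypothetical, and the exclusion clause reuses the four label substrings from this note, so it inherits the same blind spots. Entities with no P31 = Q5 statement are dropped even if they are real people.

```python
# Marker substrings reused from the fictional-character filter in the text.
FICTIONAL_MARKERS = ("fictional", "character", "droid", "toy")


def real_astronauts_query(limit: int = 50) -> str:
    """Build a SPARQL query for astronauts typed as human and not fictional."""
    contains = " || ".join(
        f'CONTAINS(LCASE(?typeLbl), "{marker}")' for marker in FICTIONAL_MARKERS
    )
    lines = [
        "SELECT DISTINCT ?person WHERE {",
        "  ?person wdt:P106 wd:Q11631 ;",   # occupation: astronaut
        "          wdt:P31 wd:Q5 .",        # instance of: human
        "  FILTER NOT EXISTS {",            # drop anything with a fictional type
        "    ?person wdt:P31 ?type .",
        "    ?type rdfs:label ?typeLbl .",
        '    FILTER(LANG(?typeLbl) = "en")',
        f"    FILTER({contains})",
        "  }",
        "}",
        f"LIMIT {int(limit)}",
    ]
    return "\n".join(lines)


print(real_astronauts_query(limit=10))
```

Requiring Q5 and excluding fictional types are deliberately redundant: the first guards against entities with no fictional label at all, and the second guards against entities carrying both a human and a character type.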

Next steps

  • File a Wikidata bot proposal to add a constraint on P106=Q11631 (astronaut occupation) requiring P31 to be a person-class entity (not a fictional character or droid).
  • Run the same fictional-contamination check across other professional occupation tags (lawyer, pilot, doctor, scientist, soldier) to identify whether the 15.5% astronaut rate is unusually high or typical.
  • Push the finding to LLM training pipelines (Common Crawl, RedPajama, the LLM Foundry datasets) so that 'astronaut' Wikidata queries can be cleanly filtered before being used as ground truth.
  • Submit the Snowy / R2-D2 / Buzz Lightyear examples to the Wikidata Project Astronaut WikiProject for a community discussion about whether to retain the in-fiction occupation classification or reclassify.
