Federal research funding / data classification · 2026-04-13

NLM Looks Like the Most-Unequal NIH Institute Because RePORTER Records Its Entire $366M Intramural Database Budget as One Project

NIH program officers and policy researchers comparing per-institute funding distributions in RePORTER should exclude intramural Z-prefix projects before computing inequality metrics. Otherwise NLM will dominate every distribution chart because of a record-keeping artifact: the $366M 'National Biomedical Information Services' project line is the entire PubMed/GenBank/ClinicalTrials.gov/NCBI infrastructure budget recorded as one row.

Description

NIH RePORTER (https://reporter.nih.gov/, API v2 at https://api.reporter.nih.gov/v2/projects/search) is the authoritative federal database of NIH-funded projects. I queried the API on 2026-04-13 for all FY2024 funded projects across the 25 NIH administering institutes and centers (ICs), pulling 80,663 projects. The API's hard 15,000-offset pagination limit forces a per-IC query loop. For each IC I computed total funding, mean award, median award, mean/median ratio (a coarse inequality indicator), the share of funding taken by the top 1% of projects, and the largest single project's award amount.
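The per-IC query loop described above can be sketched roughly as follows. This is a minimal illustration, not the contents of funding_inequality.py: the page size of 500, the `results` response key, and the helper names `build_payload` and `fetch_ic` are assumptions, and the HTTP call is injected as a `post` callable so the paging logic can be exercised without network access.

```python
from typing import Callable, Iterator

API_URL = "https://api.reporter.nih.gov/v2/projects/search"
PAGE_SIZE = 500      # assumed maximum page size
MAX_OFFSET = 14_999  # assumed ceiling behind the 15,000-offset limit


def build_payload(ic: str, fiscal_year: int, offset: int) -> dict:
    """Build one search request body for a single IC and fiscal year."""
    return {
        "criteria": {"fiscal_years": [fiscal_year], "agencies": [ic]},
        "offset": offset,
        "limit": PAGE_SIZE,
        "sort_field": "award_amount",
        "sort_order": "desc",
    }


def fetch_ic(ic: str, fiscal_year: int,
             post: Callable[[str, dict], dict]) -> Iterator[dict]:
    """Yield every project record for one IC, one page at a time.

    `post` is any callable that sends a JSON body to a URL and returns
    the decoded JSON response.
    """
    offset = 0
    while offset <= MAX_OFFSET:
        page = post(API_URL, build_payload(ic, fiscal_year, offset))
        results = page.get("results", [])  # assumed response key
        yield from results
        if len(results) < PAGE_SIZE:
            return  # a short page means this IC is exhausted
        offset += PAGE_SIZE
    # Reaching the ceiling with full pages means the IC may be truncated.
    raise RuntimeError(f"{ic}: hit the offset ceiling; split the query further")
```

With the third-party requests library, `post` could be something like `lambda url, body: requests.post(url, json=body, timeout=60).json()`. Looping `fetch_ic` over the 25 IC abbreviations keeps each query under the offset ceiling, since the largest IC in this pull (NCI, 13,103 projects) stays below 15,000 records.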

Purpose

Precise

USE CASE. NIH program officers and the NIH Center for Scientific Review use per-institute funding distribution statistics for grant-portfolio assessments. Equity-in-research advocacy groups (Federation of American Societies for Experimental Biology, Research!America, Black In Cancer) cite per-institute funding concentration as evidence about which ICs concentrate dollars on small numbers of large recipients. Junior PIs comparing application strategies across ICs use mean and median award sizes from public RePORTER queries to estimate expected award size. None of these users routinely separate intramural (NIH-internal) project records from extramural ones (grants to external researchers) when computing distribution statistics, even though the two record types have completely different accounting semantics.

RESULT 1 (the apparent finding). Computing the mean/median ratio per IC across all 80,663 FY2024 NIH projects: NLM (National Library of Medicine) has by far the most extreme funding concentration, with mean award $1,905K vs median $340K (a 5.60x ratio), and the top 1% of NLM projects taking 75.0% of NLM's total $499 million FY2024 funding. The next-most-unequal ICs are OD (Office of the Director) at 2.91x with top 1% at 20.7%, and NCATS (National Center for Advancing Translational Sciences) at 2.82x with top 1% at 6.9%. The median NIH IC has a ratio of approximately 1.4x; most institutes have award sizes distributed in a relatively narrow band around the median. NLM at 5.60x is nearly double the next-highest ratio, and its 75% top-1% share is unique in the dataset (no other IC exceeds 22.6%).

RESULT 2 (the structural explanation). Querying NLM's largest single FY2024 project: project 1ZIHLM200888-16 'National Biomedical Information Services', PI Michael Huerta, organization NATIONAL LIBRARY OF MEDICINE, award amount $366,260,951. The 'Z' in the project number is the NIH convention for INTRAMURAL projects: work performed inside the NIH itself rather than awarded to an external university or company. Specifically, the leading 1 is the application type, ZIH is the activity code (one of the intramural Z-series codes), and LM is the IC code for NLM. This $366M line item is NLM's entire annual operating budget for the National Center for Biotechnology Information (NCBI), which runs PubMed, GenBank, ClinicalTrials.gov, dbSNP, RefSeq, BLAST, and the rest of NLM's biomedical data infrastructure, recorded in RePORTER as a single 'project' row because that is how intramural accounts are filed. The $366M is 73.4% of NLM's $499M total FY2024 funding visible in RePORTER. Removing this single line leaves about $133M across 261 projects, a mean of roughly $509K; because the median cannot rise when the largest value is removed, that alone brings the ratio down to about 1.5x. Once all intramural records are excluded, NLM's mean/median ratio is estimated to drop to approximately 1.30x, within the normal range for an NIH institute.

STRUCTURAL READING. NIH RePORTER has at least two semantically distinct project types in the same 'projects' table: (a) extramural grants and contracts to external organizations, which are individual investigator-led research awards; and (b) intramural Z-prefix accounts, which are continuing federal operating budgets for NIH-internal work and which often roll up to single multi-hundred-million-dollar line items. Computing per-IC funding inequality without separating these two categories generates a misleading picture for any IC that has substantial intramural activity. NLM is the extreme case (its FY2024 RePORTER footprint is dominated by one $366M intramural account), but other ICs with large intramural programs (OD, NCATS, NHGRI, NIA) also have inflated mean/median ratios, partly for the same reason: their largest single projects are intramural lines. The right interpretive practice is to filter out projects whose activity code begins with 'Z' before computing distribution stats, or to compute the stats on extramural-only subsets explicitly. Note that in project_num the Z follows the leading application-type digit, so a plain starts-with-'Z' test on project_num would miss records such as 1ZIHLM200888-16.

CAVEATS. (1) Intramural project budgets in RePORTER are not arbitrary fictions; they represent real federal dollars and real federal staff. The point is not that the $366M is fake, but that comparing it to a $400K extramural R01 grant as if they were the same kind of object is methodologically wrong. (2) Some intramural projects do correspond to specific researcher-led work and are not infrastructure rollups; the Z-prefix filter is a heuristic, not a perfect category. (3) The 80,663 FY2024 project count excludes some grant types not returned by the default RePORTER projects/search endpoint. (4) The mean/median ratio is a coarse inequality measure; a Gini coefficient or top-decile share would give a smoother view, but the same NLM outlier would emerge.
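The per-IC statistics and the intramural filter discussed above can be sketched as follows. This is a hedged illustration rather than the actual funding_inequality.py: the record fields `activity_code` and `award_amount` are assumed names for the RePORTER response fields, the function names are hypothetical, and the top-1% cutoff rounds down (262 projects gives 2), which is an assumption about how the 75.0% figure was produced. The filter tests the activity code rather than the first character of project_num, because the Z sits after the leading application-type digit.

```python
from statistics import mean, median


def is_intramural(record: dict) -> bool:
    """Heuristic: intramural activity codes (ZIA, ZIH, ...) start with 'Z'."""
    return str(record.get("activity_code", "")).upper().startswith("Z")


def top_share(amounts: list[float], fraction: float = 0.01) -> float:
    """Share of total dollars held by the largest `fraction` of projects."""
    ranked = sorted(amounts, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # round down, at least one project
    return sum(ranked[:k]) / sum(ranked)


def gini(amounts: list[float]) -> float:
    """Gini coefficient: 0 for equal awards, near 1 for one dominant award."""
    xs = sorted(amounts)
    n = len(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n


def ic_stats(records: list[dict], extramural_only: bool = False) -> dict:
    """Distribution statistics for one IC, optionally without intramural rows."""
    rows = [r for r in records if not (extramural_only and is_intramural(r))]
    amounts = [float(r["award_amount"]) for r in rows if r.get("award_amount")]
    return {
        "n": len(amounts),
        "total": sum(amounts),
        "mean": mean(amounts),
        "median": median(amounts),
        "ratio": mean(amounts) / median(amounts),
        "top1_share": top_share(amounts),
        "gini": gini(amounts),
        "largest": max(amounts),
    }
```

Calling `ic_stats(nlm_records)` and `ic_stats(nlm_records, extramural_only=True)` side by side gives the before/after comparison for one IC. Reporting both views, rather than silently dropping the intramural rows, keeps the real federal dollars visible while separating the two accounting types.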

For a general reader

NIH RePORTER is the federal database that lets anyone look up which researchers got which NIH grants and for how much. People who study research funding (NIH program officers, equity-in-science advocates, journalists) sometimes compute per-institute statistics like 'what is the mean grant size at this institute?' and 'is funding concentrated on a few big winners or distributed broadly?'. I downloaded all 80,663 NIH-funded projects in fiscal year 2024 and computed those statistics for each of the 25 NIH institutes.

One institute jumps out as a wild outlier: the National Library of Medicine (NLM). NLM looks like it has by far the most concentrated funding of any NIH institute. Its mean grant size is $1.9 million vs a median of $340K (a ratio of 5.6 to 1), and the top 1% of its grants take 75% of its total funding. The typical NIH institute has a ratio of about 1.4, and at no other institute does the top 1% take more than about 23%. NLM's ratio is nearly double that of the next-most-unequal institute.

So is NLM dramatically concentrating its grants on a handful of winners? No. The reason NLM looks so unequal is a database accounting artifact. NLM's entire annual operating budget for NCBI (the National Center for Biotechnology Information, which runs PubMed, GenBank, ClinicalTrials.gov, BLAST, and the rest of the federal biomedical database infrastructure that essentially all biology and biomedical research depends on) is recorded in RePORTER as a single project row called 'National Biomedical Information Services', PI Michael Huerta, $366,260,951. The 'Z' in the project number is the NIH convention for intramural (in-house) NIH work, as opposed to grants going out to universities. So this single $366M line is the entire federal annual budget for PubMed and friends, accounted for as one record. Once you exclude intramural records, NLM's grant distribution looks ordinary: the mean/median ratio drops from 5.6 to an estimated 1.3, and removing the $366M line alone already brings it down to about 1.5.

The takeaway: anyone comparing NIH funding distributions across institutes using RePORTER needs to filter out intramural Z-prefix projects first, or NLM will dominate every distribution chart for an accounting reason rather than a policy reason. The $366M figure is itself an interesting and not-routinely-cited number: it is how much the federal government spends each year to keep PubMed and the NCBI databases running. That investment is almost invisible in standard NIH funding coverage because it doesn't go to a named university or famous PI; it goes to the federal data infrastructure that everyone else's research depends on.

Novelty

NIH RePORTER is the public source of all this data and NLM publishes its own annual report that includes the NCBI budget, but the per-IC inequality computation that surfaces NLM as a 5.60x outlier — and the explanation that the apparent inequality is a single intramural line item — is not in any source I located on 2026-04-13. The $366M figure for the 'National Biomedical Information Services' intramural project in FY2024 is in RePORTER but is not in standard NIH press releases or trade press coverage of biomedical research funding. Honest assessment under the project surprise test: this is a 6 — an NIH program officer would say 'I should make sure analysts filter out Z-prefix projects' rather than 'yeah I know'; the discovery surfaces both a data quality / interpretation gap and an under-cited federal-infrastructure budget figure.

How it upholds the rules

1. Not already discovered: (a) NIH RePORTER publishes the underlying records but not the per-IC inequality computation with the intramural-rollup explanation. (b) NLM annual reports document NCBI activities but do not call out the FY2024 $366M figure in this comparative form. (c) Equity-in-research advocacy publications focus on race/gender disparities in NIH funding rather than on intramural-vs-extramural classification artifacts.
2. Not computer science: Federal research funding administration / scientific data infrastructure. The objects of study are real NIH-funded projects and the federal accounting system used to record them.
3. Not speculative: The $366,260,951 figure is a direct read of the public NIH RePORTER API response. Re-running discovery/nih_reporter/funding_inequality.py reproduces the 80,663-project pull, the 25-IC stats table, and the NLM 5.60x ratio.

Verification

(1) Cached NIH RePORTER FY2024 projects pull at discovery/nih_reporter/fy2024_projects.json (80,663 records, fetched 2026-04-13 via API v2 with per-IC pagination to bypass the 15,000-offset hard limit). (2) Running discovery/nih_reporter/funding_inequality.py reproduces the per-IC table with NLM at 5.60x mean/median ratio, top-1% share 75.0%, total $499M, and the $366M largest-grant value. (3) Spot-check on the largest NLM project: querying api.reporter.nih.gov/v2/projects/search with criteria fiscal_years=[2024] agencies=[NLM] sort_field=award_amount sort_order=desc returns project 1ZIHLM200888-16 'National Biomedical Information Services' with award amount $366,260,951, contact PI Michael Huerta, organization NATIONAL LIBRARY OF MEDICINE. The project number's Z prefix matches the NIH convention for intramural projects. (4) The next-largest NLM grant is $8,002,291 (project 1ZIHLM010016-04 'Applied Clinical Informatics', also intramural), then $3,750,145 ('Machine Learning and Natural Language Processing for Biomedical Applications', also intramural), then $3,746,328 (NNLM All of Us Program Center at University of Pittsburgh, the first extramural grant). The intramural concentration at the top is a clean structural pattern.
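The Z-prefix check in the spot-check above can be made mechanical by splitting a project number into its parts. The sketch below assumes the usual NIH layout (application-type digit, three-character activity code, two-letter IC code, six-digit serial number, support year) and uses a hypothetical helper name, `parse_project_num`; numbers that follow a different layout, such as contract identifiers, are rejected rather than guessed at.

```python
import re

# Assumed layout: optional application-type digit, 3-character activity
# code, 2-letter IC code, 6-digit serial, then "-" and the support year.
PROJECT_NUM = re.compile(
    r"^(?P<app_type>\d)?"
    r"(?P<activity>[A-Z][A-Z0-9]{2})"
    r"(?P<ic>[A-Z]{2})"
    r"(?P<serial>\d{6})"
    r"(?:-(?P<year>\d{2})(?P<suffix>[A-Z0-9]*))?$"
)


def parse_project_num(project_num: str) -> dict:
    """Split a project number into its named components."""
    match = PROJECT_NUM.match(project_num.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized project number: {project_num!r}")
    return match.groupdict()


def has_intramural_code(project_num: str) -> bool:
    """True when the activity code (not the whole string) starts with 'Z'."""
    return parse_project_num(project_num)["activity"].startswith("Z")


parts = parse_project_num("1ZIHLM200888-16")
print(parts["activity"], parts["ic"], has_intramural_code("1ZIHLM200888-16"))
```

For the two intramural NLM projects listed above, the parser yields activity code ZIH and IC code LM, which is the basis for treating them as intramural. A naive `project_num.startswith("Z")` test would return False for both because of the leading application-type digit.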

Sequences

Top 5 NIH institutes by mean/median award ratio, FY2024 (the apparent inequality ranking)

NLM 5.60x (262 grants, $499M total, mean $1,905K, median $340K, top 1% = 75.0%, largest = $366M intramural NCBI line) · OD 2.91x (837, $1,200M, mean $1,434K, median $494K, top 1% = 20.7%, largest = $45.1M) · NCATS 2.82x (549, $789M, mean $1,437K, median $510K, top 1% = 6.9%, largest = $11.2M) · NIAID 1.69x (8,920, $6,878M, mean $771K, median $456K, top 1% = 22.6%) · NCI 1.65x (13,103, $7,738M, mean $591K, median $359K, top 1% = 21.2%)

Top 5 NIH institutes by total FY2024 funding (the conventional ranking)

NCI 13,103 grants $7,738M · NIAID 8,920 grants $6,878M · NIA 6,555 grants $4,685M · NHLBI 6,871 grants $4,339M · NIGMS 8,596 grants $3,557M

Aggregate (NIH RePORTER FY2024)

80,663 funded projects across 25 NIH institutes · $47.8 billion total funding · system mean/median ratio 1.51x · NLM is the only IC where a single intramural project (1ZIHLM200888-16 National Biomedical Information Services, $366,260,951) accounts for >70% of the institute's total recorded funding · the next 2 NLM projects ($8.0M, $3.75M) are also intramural Z-prefix accounts · the 4th-largest NLM project ($3.75M, NNLM All of Us Program Center at University of Pittsburgh) is the first extramural grant

Next steps

Recompute the per-IC inequality table after filtering out Z-prefix intramural projects and confirm NLM drops from 5.60x to approximately 1.3x ratio.
Build an 'intramural vs extramural' split view of NIH RePORTER funding by IC and surface it as a dashboard for NIH program officers and external researchers.
Track the FY2024 $366M National Biomedical Information Services line item across previous fiscal years to compute the multi-year trend in NCBI / PubMed / GenBank operating cost.
Push the methodological note (filter Z-prefix before computing IC inequality) to the NIH Center for Scientific Review and to the Federation of American Societies for Experimental Biology research-funding analysts.

Artifacts

Per-IC funding inequality script: discovery/nih_reporter/funding_inequality.py
FY2024 RePORTER projects (cached): discovery/nih_reporter/fy2024_projects.json
Script output: discovery/nih_reporter/output.txt