How to read this archive
Before drawing conclusions, it helps to know what this dataset is — and what it can and cannot support. The single most important point: the reliable signal here is in the record and the disclosure process, not the phenomenon. The footage is too low-resolution and the sample too biased to settle what the objects are; but the documents tell us a great deal about how the U.S. government collected, withheld, and released this material.
Six things to keep in mind
The dataset's character
What happened to the cases
Resolution funnel
The modern (2000+) set is almost entirely unresolved: the better-instrumented era leaves more cases open, not fewer. Note the selection effect: the modern records are a curated military UAP collection, biased toward exactly the cases that were not explained away.
212 records is not 212 events
Same-event composition
The corpus includes compilations and cross-referencing documents, so some records describe the same encounter or closely related ones. By text similarity, 50 records fall into 22 clusters of two or more. Read the records within a cluster as related, not independent, so that a pattern isn’t double-counted.
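To make the idea of text-similarity clusters concrete, here is a minimal sketch. It is not the archive's actual method, which is not specified here. It assumes Jaccard similarity over lowercase word sets, a hypothetical threshold of 0.5, and hypothetical record ids and texts; clusters are the connected components of the "similar enough" graph.

```python
from itertools import combinations


def tokens(text):
    """Lowercase word set for one record's text."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_records(records, threshold=0.5):
    """Group record ids whose pairwise similarity meets the threshold.

    `records` maps id -> text. The threshold is an illustrative
    assumption, not a value taken from the archive.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    token_sets = {rid: tokens(text) for rid, text in records.items()}
    for a, b in combinations(records, 2):
        if jaccard(token_sets[a], token_sets[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), []).append(rid)
    # Keep only clusters of two or more records.
    return [sorted(g) for g in groups.values() if len(g) >= 2]


def independent_count(records, threshold=0.5):
    """Count groups when each cluster is counted once."""
    clusters = cluster_records(records, threshold)
    clustered = sum(len(c) for c in clusters)
    return len(records) - clustered + len(clusters)


# Hypothetical example records, not taken from the archive.
sample = {
    "r1": "pilot reports bright object near coast at night",
    "r2": "pilot reports bright object near coast at dusk",
    "r3": "memo about budget allocation for radar station",
}
clusters = cluster_records(sample)
groups = independent_count(sample)
```

Under the strong assumption that every cluster is counted once, the figures quoted above would give 212 - 50 + 22 = 184 groups rather than 212 records. That is only a rough guide, since clustered records may describe related rather than identical encounters.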
A known gap — the big files are under-read
Document coverage is a floor, not a census
The largest mid-century compilations (the FBI HQ 62-83894 case-file sections, roughly 2,500 pages) mix typed memos, newspaper clippings, cursive correspondence, and faint scans. Automated OCR garbles clippings, handwriting, and poor scans, so these files are under-extracted: the 212 structured records undercount the reports and material the documents actually contain, especially on the handwritten mid-century pages. (Use full-text search to see the raw text directly.) A vision-based transcription pass is planned to recover the missing material. Until then, treat document coverage as a floor, not a complete census.
In short: trust the archive about itself — its timing, its geography of collection, what it withheld, what it left unresolved. Treat everything about the objects as provisional and resolution-limited. The deep views (Objects, Patterns, Behavior, Map) carry their own confounds inline.