What this archive is — and what it deliberately is not.
This site is a structured analysis layer over the declassified U.S. government UAP record. It surfaces patterns and cites sources; it does not adjudicate what the objects are. The steps below describe how a source document becomes a record.
1 · Acquisition
The corpus comprises 122 PDFs (4,277 pages), 77 videos (226.3 min), 8 audio recordings and a set of photographs from the declassified U.S. government UAP releases (tranches 1–2), spanning 1944–2025 across 7 agencies. New release tranches are appended and the analyses re-run.
2 · Text extraction & OCR
The embedded text layer was extracted from every PDF that has one. Image-only documents, mostly mid-century FBI case files, were rasterised and run through OCR. OCR output is inherently noisy, which is why those records carry lower confidence.
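A minimal sketch of the triage between the two paths, assuming the per-page text has already been pulled from each PDF's text layer (the extraction library is not named on this page). The threshold `MIN_TEXT_CHARS` and the helper names are hypothetical; pages falling below the threshold would be the ones rasterised and sent to OCR.

```python
from dataclasses import dataclass

# Hypothetical threshold: a page whose embedded text layer yields fewer
# alphanumeric characters than this is treated as image-only.
MIN_TEXT_CHARS = 25


@dataclass
class PagePlan:
    page_number: int
    needs_ocr: bool
    text: str  # embedded text kept as-is; empty when OCR is required


def count_alnum(text: str) -> int:
    """Count letters and digits, ignoring whitespace and punctuation debris."""
    return sum(ch.isalnum() for ch in text)


def plan_pages(page_texts: list[str], min_chars: int = MIN_TEXT_CHARS) -> list[PagePlan]:
    """Keep the embedded text where it exists; route near-empty pages to OCR."""
    plans = []
    for i, text in enumerate(page_texts, start=1):
        needs_ocr = count_alnum(text) < min_chars
        plans.append(PagePlan(i, needs_ocr, "" if needs_ocr else text))
    return plans


def is_image_only(plans: list[PagePlan]) -> bool:
    """A document is image-only when every page must go through OCR."""
    return all(p.needs_ocr for p in plans)


pages = ["Sample page with an embedded text layer of reasonable length.", "  \n ", "-- 2 --"]
plans = plan_pages(pages)
print([p.needs_ocr for p in plans])  # prints [False, True, True]
```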
3 · Redaction measurement
Redactions are quantified in two ways: explicit FOIA exemption markers (e.g. (b)(1), (b)(7)(C)) are counted in the text, and documents whose OCR output is near-empty are flagged as visually redacted (blacked out). Both measures feed the redaction views.
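The two redaction measures might be sketched as follows. The marker pattern covers the nine FOIA exemptions in 5 U.S.C. § 552(b) and the (A) to (F) subparagraphs of (b)(7); the near-empty threshold and the function names are hypothetical choices for illustration, not the site's actual values.

```python
import re
from collections import Counter

# FOIA exemption markers such as (b)(1), (b)(3) or (b)(7)(C), allowing an
# optional space between the parenthesised parts, as OCR often inserts one.
EXEMPTION_RE = re.compile(r"\(b\)\s*\(([1-9])\)(?:\s*\(([A-F])\))?")

# Hypothetical threshold: mean alphanumeric characters per page below which
# an OCR'd document is treated as visually redacted (blacked out).
NEAR_EMPTY_CHARS_PER_PAGE = 40


def count_exemptions(text: str) -> Counter:
    """Tally exemption markers by label, e.g. '(b)(1)' or '(b)(7)(C)'."""
    counts: Counter = Counter()
    for number, sub in EXEMPTION_RE.findall(text):
        label = f"(b)({number})" + (f"({sub})" if sub else "")
        counts[label] += 1
    return counts


def is_visually_redacted(ocr_text: str, page_count: int) -> bool:
    """Flag documents whose OCR output is near-empty relative to their length."""
    alnum = sum(ch.isalnum() for ch in ocr_text)
    return alnum / max(page_count, 1) < NEAR_EMPTY_CHARS_PER_PAGE


text = "Identity withheld under (b)(7)(C); see also (b)(1) and (b) (1)."
print(count_exemptions(text))
print(is_visually_redacted("", page_count=3))
```

Counting markers and flagging blank OCR are deliberately kept separate, since a fully blacked-out page usually carries no legible marker at all.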
4 · Structured extraction
Each document was read by a language model against a fixed schema, which records date, location, object shape/colour, movement, sensor, resolution and a factual summary. Critically, the schema keeps the document's verbatim claim ("as reported") separate from the normalized analysis.
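One way to express such a schema is a pair of dataclasses in which verbatim quotes live in their own field and are checked against the source text. The field names, the 0 to 1 confidence scale and `verify_quotes` are assumptions made for this sketch; the actual schema is not published on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AsReported:
    """A claim in the document's own words, copied verbatim and never normalised."""
    quote: str
    page: Optional[int] = None


@dataclass
class IncidentRecord:
    # Normalised analysis fields. None means the source withheld or omitted
    # the value; it does not mean "nothing happened".
    date: Optional[str] = None        # ISO 8601 (YYYY-MM-DD) where recoverable
    location: Optional[str] = None
    shape: Optional[str] = None
    colour: Optional[str] = None
    movement: Optional[str] = None
    sensor: Optional[str] = None
    resolution: Optional[str] = None
    summary: str = ""
    confidence: float = 0.0           # hypothetical 0..1 extraction confidence
    # Verbatim claims are stored apart from the normalised fields above.
    as_reported: list[AsReported] = field(default_factory=list)


def verify_quotes(record: IncidentRecord, source_text: str) -> list[str]:
    """Return any 'as reported' quotes that do not occur verbatim in the source."""
    return [q.quote for q in record.as_reported if q.quote not in source_text]


source = "The pilot reported a white oval object moving erratically."
record = IncidentRecord(
    shape="oval",
    colour="white",
    movement="erratic",
    summary="Pilot sighting of a white oval object.",
    as_reported=[AsReported("a white oval object moving erratically", page=1)],
)
print(verify_quotes(record, source))  # prints [] because the quote is verbatim
```

A quote that fails the substring check is a sign the model paraphrased rather than copied, which is exactly the mixing of claim and analysis the schema is meant to prevent.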
5 · Geocoding
Named locations are geocoded. Redacted or theatre-level locations (e.g. "Arabian Gulf") are placed at a regional centroid and flagged as approximate. We never fabricate coordinate precision.
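A sketch of the fallback logic, assuming a local lookup table in place of a real gazetteer service. The `PLACES` and `REGION_CENTROIDS` entries are illustrative, rough coordinates, not values from the archive; the point is that a redacted or theatre-level location gets an explicitly approximate point, and a location with no usable hint gets none.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables for illustration only.
# Coordinates are (lat, lon) in decimal degrees.
PLACES = {
    "washington, dc": (38.9072, -77.0369),
}
REGION_CENTROIDS = {
    "arabian gulf": (26.5, 52.0),  # rough centroid, illustrative only
}


@dataclass
class GeoPoint:
    lat: float
    lon: float
    approximate: bool
    basis: str  # what the coordinates were derived from


def geocode(location: Optional[str], region_hint: Optional[str] = None) -> Optional[GeoPoint]:
    key = (location or "").strip().lower()
    if key in PLACES:
        lat, lon = PLACES[key]
        return GeoPoint(lat, lon, approximate=False, basis=f"place: {location}")
    # A theatre-level name given directly, or a redacted location with a regional hint.
    for candidate in (key, (region_hint or "").strip().lower()):
        if candidate in REGION_CENTROIDS:
            lat, lon = REGION_CENTROIDS[candidate]
            # Round to one decimal so the stored value implies no false precision.
            return GeoPoint(round(lat, 1), round(lon, 1), approximate=True,
                            basis=f"region centroid: {candidate}")
    return None  # no coordinates rather than invented ones


print(geocode("[REDACTED]", region_hint="Arabian Gulf"))
```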
6 · Similarity & clustering
Incident text is vectorised (TF-IDF) and compared by cosine similarity to surface related records and behavioural clusters. This is an exploratory aid, not a claim of causation.
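The comparison can be illustrated with a dependency-free TF-IDF and cosine similarity. This sketch uses the smoothed IDF weighting `log((1 + N) / (1 + df)) + 1`, the default in scikit-learn's `TfidfVectorizer`; the tokenizer, the similarity threshold and the connected-components clustering are assumptions, since the page does not say how clusters are formed.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Raw term counts weighted by smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in token_lists]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(docs: list[str], index: int, top_k: int = 3) -> list[tuple[int, float]]:
    """The top_k records most similar to docs[index], best first."""
    vecs = tfidf_vectors(docs)
    scores = [(j, cosine(vecs[index], v)) for j, v in enumerate(vecs) if j != index]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]


def threshold_clusters(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Hypothetical clustering: link pairs with similarity >= threshold and
    return the connected components (union-find)."""
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = [
    "white sphere hovering then rapid acceleration",
    "sphere hovering with rapid acceleration observed",
    "radio interference reported on the ground",
]
print(related(docs, 0, top_k=1))
print(threshold_clusters(docs))
```

Shared vocabulary drives the score, so two reports that describe the same behaviour in the same words cluster together whether or not they describe the same event, which is why the output stays an exploratory aid.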
7 · Footage analysis (infrared video)
Modern IR clips are processed separately. Frames are extracted, and static overlay furniture (HUD elements, tracking gates, scale bars and redaction boxes) is estimated with a temporal-median plate and removed. The object is then located off-crosshair, since it drifts away from the reticle as the operator chases it. For the clearest clips, an analyst hand-tags the object across the clip; it is then isolated, combined by super-resolution or lucky imaging, and stabilised. This recovers shape and motion where the resolution allows, never beyond it. See the Objects and Technical pages.
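Two of these image operations can be sketched on plain grayscale arrays (lists of rows). This is an illustration under stated assumptions, not the site's pipeline: the tolerance and fraction parameters are hypothetical, the fill step is a crude stand-in for proper inpainting, and `lucky_stack` assumes the crops have already been aligned on the analyst's hand-tagged object position. Sharpness is scored as the variance of a 4-neighbour Laplacian, a common frame-selection criterion in lucky imaging.

```python
from statistics import mean, median, pvariance

Frame = list[list[float]]  # grayscale image as a list of rows


def median_plate(frames: list[Frame]) -> Frame:
    """Per-pixel median over time. Overlay that never moves (HUD, gates, scale
    bars, redaction boxes) dominates the median; transient content does not."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]


def static_mask(frames: list[Frame], plate: Frame, tol: float = 2.0,
                min_fraction: float = 0.9) -> list[list[bool]]:
    """Mark pixels that stay within `tol` of the plate in most frames."""
    h, w = len(plate), len(plate[0])
    need = min_fraction * len(frames)
    return [[sum(abs(f[y][x] - plate[y][x]) <= tol for f in frames) >= need
             for x in range(w)] for y in range(h)]


def remove_static(frame: Frame, mask: list[list[bool]]) -> Frame:
    """Crude fill: replace static pixels with the median of the remaining pixels."""
    live = [v for row, mrow in zip(frame, mask) for v, m in zip(row, mrow) if not m]
    fill = median(live) if live else 0.0
    return [[fill if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(frame, mask)]


def sharpness(crop: Frame) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels."""
    h, w = len(crop), len(crop[0])
    lap = [crop[y - 1][x] + crop[y + 1][x] + crop[y][x - 1] + crop[y][x + 1]
           - 4 * crop[y][x]
           for y in range(1, h - 1) for x in range(1, w - 1)]
    return pvariance(lap) if lap else 0.0


def lucky_stack(crops: list[Frame], keep_fraction: float = 0.25) -> Frame:
    """Average only the sharpest fraction of pre-aligned object crops."""
    ranked = sorted(crops, key=sharpness, reverse=True)
    kept = ranked[:max(1, round(len(ranked) * keep_fraction))]
    h, w = len(kept[0]), len(kept[0][0])
    return [[mean(c[y][x] for c in kept) for x in range(w)] for y in range(h)]
```

The median, rather than the mean, is what makes the plate robust: an object crossing a pixel in a few frames barely shifts that pixel's median, so it is not mistaken for overlay.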
Limitations we want you to keep in mind
- Extraction is automated. Average confidence across the corpus is 61%; each record displays its own score.
- Heavily redacted documents yield thin records — absence of a field means the source withheld or obscured it, not that nothing happened.
- “Anomalous kinematics” reflects what a report claims, taken at face value. It is a flag for scrutiny, never a verdict.
- Mid-century OCR contains transcription errors; treat exact figures from those records cautiously.
The role of AI, and where the human is
This is human-directed work done with AI assistance, and it is more useful to say so plainly than to pretend otherwise. The volume (thousands of pages and hours of footage) is processed with machine help at the mechanical steps: OCR of scanned documents, transcription of audio, vision-model descriptions of frames, and a first-pass extraction of fields from text. AI is a labor multiplier on reading and structuring, not an oracle on what the objects are.
The judgment stays with a person. The schema, the inclusion rule, every claim made on the analysis pages, and the decision about what a result does and does not support are set and checked by hand. Automated outputs are labeled as automated, kept separate from the verbatim source text, and treated as hypotheses to verify, not as findings. Where an automated read was wrong, it is corrected at the source and noted, not quietly dropped.