On October 14, 2020, Reclaim The Records made a Freedom of Information Act (FOIA) request to the United States National Archives and Records Administration (NARA), asking for billions (yes, billions) of digital images and their associated text metadata, to return online access to American historical documents to the public.
This is the full text of that FOIA request, which we submitted online through the MuckRock platform:
FOIA Request for all digital images and text metadata created through NARA’s public-private digitization partnership program
To Whom It May Concern:
This is a request under the Freedom of Information Act.
I represent a 501(c)(3) non-profit organization called Reclaim The Records. We are an activist group of genealogists, historians, journalists, teachers, and open government advocates. We acquire genealogical and historical databases and images from government sources, including government archives, often through the use of Freedom of Information laws. We then upload those records to the Internet, without any copyright or usage restrictions or paywalls, making them freely available to the public and returning these taxpayer-funded materials to the public domain.
PART I: BACKGROUND FOR THIS REQUEST
The United States National Archives and Records Administration (NARA) has for several years managed an innovative public-private partnership program to digitize many of the important historical documents they hold, particularly records that would be useful for family history research. These include multiple enumerations of the United States Federal Census (through 1940), immigration and naturalization records, military and veteran records, tax assessment lists, and more.
More than four hundred of these important historical record sets have been digitized so far under this long-running partnership program, with each of those record sets containing hundreds of thousands, or more often millions, of individual documents. A likely-incomplete listing of these record sets is available on the NARA web page “Microfilm Publications and Original Records Digitized by Our Digitization Partners” located at https://www.archives.gov/digitization/digitized-by-partners . The total number of unique historical documents digitized and transcribed through this program is probably in the billions.
In exchange for having private corporations and non-profit organizations agree to become “partners” and digitize these historical records from their original paper or microfilm formats — a massive task that would be largely cost-prohibitive for NARA to conduct on its own — NARA agreed to let these partners have the exclusive use of those newly-digitized materials on their own websites for a certain amount of time, an “embargo period”.
This grant of a supposedly exclusive entitlement to public records was meant to induce these partners to carry out the digitization and transcription at their own expense instead of at taxpayers’ expense. But while well-intentioned, it also meant that these original historical records were often completely removed from public access while the companies worked on them, making the records functionally unavailable to researchers, sometimes for years.
Even after the digitization and transcription work was finally completed, the partners’ exclusivity over each newly created digital record set was supposed to be time-limited. Once the embargo period for a given record set ended, usually after five years but sometimes after three, NARA would be able to freely disseminate the digitized versions of these public documents, both the images and their accompanying text metadata. NARA’s own policies state that the agency could and would publish the digital copies on its own website, in its official online Catalog, through its official API, or by other means. See item number two of “NARA Principles for Partnerships to Digitize Archival Materials” at https://www.archives.gov/digitization/principles.html :
“2. After an agreed-upon period of time, otherwise known as an embargo period, NARA gains unrestricted rights to the digital copies and the associated metadata transmitted to NARA by the partner, including the right to give or sell digital copies in whole or part to other entities, if NARA so chooses. If resources permit, we will try to make the digital materials available in our online catalog within the same year they are no longer in the embargo period.”
But in practice, this simply hasn’t happened. NARA has never posted online the vast majority of the records digitized through its partnership program, neither in its Catalog nor anywhere else where the public could freely access and download them. This remains the case today, even though the embargo periods for many of these record sets expired more than a decade ago, and in some cases two decades ago. A small number of the records are now finally online in the NARA Catalog, but even there, the data sets are not available to the general public as bulk image or bulk data downloads, and individual records are cumbersome to search or use.
Instead, literally billions of these historical American records remain solely in the hands of NARA’s primary digitization partner, Ancestry.com. Ancestry is a private corporation, previously co-owned by a private equity firm and the government of Singapore’s sovereign wealth fund, until it was sold to a different private equity firm for $4.7 billion in August 2020. Ancestry has purchased several smaller companies in the genealogy and family history space over the past decade, including Fold3.com and Archives.com, both of which had previously been independent participants in NARA’s digitization partnership program. Thus, the vast majority of the billions of records digitized through NARA’s partnership program are now available only behind Ancestry’s subscription paywall, or through Ancestry-owned companies with their own separate subscription paywalls. Subscriptions to these websites can cost hundreds of dollars per person per year.
NARA surely did not mean to create a de facto monopoly on nearly all digital copies of important American historical documents like the Census and immigration records and military files, all for the benefit of a single private corporation. But by not making the no-longer-embargoed documents available to the public anywhere else, not even on NARA’s own website, and leaving them solely in the hands of their mostly-commercial partner organizations, that is exactly what has happened.
NARA’s own “Principles for Partnerships to Digitize Archival Materials”, as referenced above, clearly states in item number seven that:
“Public access to publicly owned resources will remain free. Partners may develop and charge for value-added features, but access to the digital copies ultimately should be readily accessible and free…NARA will have unrestricted ownership of these copies, including the right to make these copies freely available online for download.”
However, in practice, NARA has also repeatedly denied independent requests for copies of even subsets of this voluminous partnership-created digital data. We are aware of at least three entities, two genealogy-related corporations and one non-profit organization, none of them NARA digitization partners, that independently asked NARA leadership for copies of this data through e-mails, phone calls, meetings, and other discussions. In all three cases, NARA denied the request, saying that it would put the records online itself, through its Catalog or API…eventually.
Thus, the end result of NARA’s digitization partnership program has been that billions of important American historical documents were successfully digitized and transcribed, but for decades most of them have been available to the public only through expensive annual subscriptions that benefit private corporations, primarily a single multi-billion-dollar conglomerate whose previous owners included a foreign government.
We at Reclaim The Records would now like to make an official request for open public access to these important American historical records.
PART II: OUR REQUEST
Under the Freedom of Information Act, we at Reclaim The Records request copies of the following:
1) We request every single record created under NARA’s public-private digitization partnership with the entities Ancestry.com, Fold3.com (formerly known as Footnote, now owned by Ancestry), Archives.com (now owned by Ancestry), and FamilySearch (a non-profit organization). We do not request any records that were created through NARA’s partnership with other smaller entities, such as the Daughters of the American Revolution (the DAR). Specifically:
1a) We request all of the digital images, in their original, full-size, uncompressed, and non-watermarked versions.
1b) We request all of the associated text metadata (names, dates, places, etc.) also created under the partnership agreement, which accompanies those images and makes them searchable. For example, a spreadsheet or database may have been created for each data set listing the name of each person referenced in each image, along with the date, the location, or other extracted information such as place of birth, marital status, volume number, census enumeration district, microfilm reel number, or any other text information relevant to that particular data set and/or each individual image.
1c) We request all copies of finding aids, training materials, handbooks, checklists, formatting guidelines, data dictionaries, data templates, data lists, or other internal documentation that explains more about the digitization of these images and the transcription and compilation of their associated text metadata, and how they relate to each individual data set.
2) We also request any records digitized under NARA’s partnership program that may not have been properly delivered or returned to NARA after their digitization was completed. We have heard reports of records that remain solely in the possession of certain partner corporations because NARA never collected the files after the image scanning and text metadata entry were completed. We therefore also request copies of all the partnership-created digital images, associated text metadata, and finding aids (or data dictionaries, documentation, templates, etc.) for those previously undelivered files. To be clear, we contend that NARA is required to collect these records from these companies and produce them to us in response to our request, and we are requesting that NARA do so.
PART III: FORMAT OF PRODUCTION
We request that all of these files (the images, text metadata, finding aids, data dictionaries, and so on) be turned over to us in their original digital formats, exactly as the partners delivered them to NARA. For any files that a partner never delivered to NARA as it should have, we request that they be turned over to us for the first time, also in their original digital formats.
We would like to receive our copies of this information on portable USB drives. We are willing to pay the costs of purchasing those drives and of their insured, trackable domestic shipping. However, some of this data may already be stored online in the Amazon Web Services (AWS) S3 Glacier system, which we believe NARA uses for its internal file storage. If so, then for any data sets that are already completely stored in AWS S3, we would accept receiving only the online versions of those specific data sets, copied directly from NARA’s AWS S3 bucket(s) into Reclaim The Records’ AWS S3 bucket(s); those data sets would then not need to be copied to a USB drive.
Please inform us of all fees before fulfilling our request.
PART IV: REQUEST FOR FEE WAIVER
We also request to be treated as a “media requester” for the purposes of calculating the fees for this FOIA request. We are a non-profit organization, not a commercial entity. We do not charge for copies of any of the tens of millions of records we have already acquired from government agencies and released to the public. We are one of the largest open records organizations in the United States. As of October 1, 2020, our e-mail newsletter, which has been published several times a year for the past six years, has a circulation of over 7,500 subscribers. Our Facebook page has more than 11,000 followers, and our Twitter account has more than 6,100 followers.
We have even created several free standalone websites to disseminate and discuss the data that we receive from government entities. As just one example, please see our website https://www.MissouriDeathIndex.com/ and the multiple associated newsletter issues linked from that website. We don’t just release data sets; we discuss them too, using our editorial skills and discretion, and then disseminate those discussions to our readers.
Therefore, under 45 CFR 1602.2, we believe that we meet the legal qualifications for a “media requester”, so we should be charged only duplication fees beyond the first 100 pages of material, and no search or review fees.
Thank you for your consideration, and we look forward to your timely response within twenty business days, as the statute requires.
We will provide updates when we learn how NARA chooses to respond to our request.
Documents related to this request are coming soon.
State or Vital Records Jurisdiction: Nationwide
Archive or Library: United States National Archives and Records Administration (NARA)
Government Agency: United States National Archives and Records Administration (NARA)
Record Years: Late seventeenth century to the present
Record Format: Images and Text Metadata
Record Physical Format: Digital images and digital text metadata
Number of Records (Estimated): Unknown Billions, with a B. (Yes, really.)