Data Resources & Tools

What is happening to federal datasets?

Since January 2025, many federal datasets, websites, and other previously accessible resources across agencies have been taken offline to comply with executive orders. In some cases, press releases or data documentation have been removed; in others, entire datasets have been taken down. Evidence is growing that even datasets that remain accessible on an agency’s website may contain scrubbed, corrupted, or otherwise altered information.


Learn more about missing, altered or restored federal data:

New York Times (02/11/25): Judge Orders C.D.C. to Temporarily Restore Deleted HHS, CDC & FDA Web Pages. The temporary restraining order was granted in response to a lawsuit filed against the federal government by Doctors for America (DFA), a progressive advocacy group representing physicians, and the nonprofit Public Citizen, a consumer advocacy group. 

Previously restored pages include the Atlas Tool, used by policymakers to track rates of infectious diseases such as HIV and STIs; pages that explained the Youth Risk Behavior Surveillance System, which monitors adolescent health; and the CDC's data site.

Silencing Science Tracker: joint initiative of the Sabin Center for Climate Change Law and the Climate Science Legal Defense Fund, tracking government attempts to restrict or prohibit scientific research since the November 2016 election

New York Times: initial summary of government web pages removed as of Feb. 3, 2025. *March 7 update: List of words targeted for removal from government websites.

The Journalist's Resource: overview of the current situation regarding federal health databases, including tips on preserving data; from the Shorenstein Center at the Kennedy School

Data Rescue Efforts

Data Rescue Project: an evolving list of crowd-sourced efforts to preserve data and keep it accessible. As of April 2025, the Data Rescue Project includes ERICA, a lightweight rescue catalog that lists all 500,000 Open Access, full-text PDFs hosted directly on ERIC, the research repository of the Education Department, along with their basic metadata, such as title, author, and publication year.

End of Term Archive: captures and saves U.S. Government websites at the end of presidential administrations. The EOT has thus far preserved websites from administration changes in 2008, 2012, 2016, 2020, and 2024.  

US Government Web Archive: from Webrecorder, US Government websites archived around the end of Biden’s and the start of Trump’s presidential administrations as part of the End of Term Web Archive. They are available as standalone mirrors hosted on subdomains, which aim to replicate each original site and its URL structure as closely as possible.

Harvard Library Innovation Lab: an effort from the Harvard Law School Library to provide access to major datasets from data.gov, PubMed, and federal GitHub repositories 

CDC Datasets on Internet Archive: CDC datasets uploaded before January 28th, 2025

Environmental Data & Governance Initiative: advocacy group for access to environmental data 

Public Environmental Data Partners: federal environmental datasets, including GitHub access to the Social Vulnerability Index and Environmental Justice Index

Policy Commons 2025 Open Collection: an initiative to rescue and preserve materials from government organizations facing the removal of public information and data—reports, blog posts, videos, and podcasts.

Sustainability Action Network: a database that centralizes climate research and data created by researchers at McGill University, Canada.

 

Finding Rescued Data

This guide will help you locate US federal government data that may have been removed or redacted following the Presidential Executive Orders that went into effect on January 31, 2025. Please note that this guide only covers how to find removed information. Use Data.gov to find existing federal data. 

Retrieving Rescued Federal Data and Websites
  1. Confirm the Data Has Actually Been Removed
    • Search Data.gov to ensure it's not still listed there.
    • Visit the agency's current website to see if the dataset or page has simply been relocated.

    If you have confirmed that the data or information is missing, move on to archival resources.

  2. Use the Internet Archive Wayback Machine

    The Internet Archive Wayback Machine is the largest web archive, capturing snapshots of websites across the internet over time. It allows you to view websites as they appeared on specific dates in the past.

    By entering a URL in the Wayback Machine site, you can see archived versions of that site from different dates, effectively allowing you to go back in time and recover content that might have been removed or changed.

    How to Search with Gov Wayback:

    Gov Wayback is a specialized tool that helps locate federal websites in the Internet Archive Wayback Machine. By appending wayback.org to the URL of a .gov website, you will be automatically directed to that webpage's record within the Internet Archive. While this tool works with many .gov domains, it is not comprehensive.

  3. Check with the Data Rescue Project's Data Rescue Tracker

    If you cannot find the data you need in the Wayback Machine, the Data Rescue Project may have archived it. They maintain the Data Rescue Tracker, which lists rescued datasets along with links to where they have been archived. The Data Rescue Tracker is continually being updated, but it is not comprehensive.

  4. Explore Other Archives

    The Boston University School of Public Health's Center for Health Data Science provides a Find Lost Data search tool that queries a collection of alternative databases at once.
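
The Wayback Machine lookup in step 2 can also be scripted. The following is a minimal sketch, assuming the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`) and its JSON response shape; the function names and the `example.gov` address are hypothetical placeholders, and the sample response is made up for illustration rather than taken from a real capture.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public availability endpoint of the Internet Archive Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL that asks whether page_url has been archived."""
    params = {"url": page_url}
    if timestamp:
        # Format YYYYMMDD (optionally followed by hhmmss); the service
        # reports the capture closest to this date.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return (snapshot_url, timestamp) from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def find_snapshot(page_url, timestamp=None):
    """Query the live service; requires network access."""
    query = build_availability_url(page_url, timestamp)
    with urlopen(query, timeout=30) as resp:
        return closest_snapshot(json.load(resp))


# Made-up response in the expected shape, so the parsing step can be
# tried without a network connection.
sample_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20250115000000/https://www.example.gov/data",
            "timestamp": "20250115000000",
            "status": "200",
        }
    }
}

snapshot = closest_snapshot(sample_response)
```

Calling `find_snapshot("https://www.example.gov/data", "20250120")` with a real agency address would return the capture nearest to January 20, 2025, if one exists. A result of `None` only means the Wayback Machine reports no capture for that exact address, so the Data Rescue Tracker and other archives in steps 3 and 4 are still worth checking.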

Locating Potentially Redacted Websites

If you suspect that a government webpage has been edited or partially redacted (rather than fully removed):

Confirming Undocumented Data Redactions

If you suspect that the data you have access to may have been changed or partially removed without any official notice, the general steps outlined above still apply but with a slightly different focus:

  1. Compare Snapshots of the Same Dataset or Webpage
    • Use the Internet Archive Wayback Machine to retrieve older versions of the webpage or dataset you're examining.
    • Compare the current version and the archived versions side by side to see if any variables, fields, or sections appear altered or missing.
    • Tools such as Diffchecker can help you systematically compare files and locate specific changes.
  2. Check for Metadata or File Size Discrepancies
    • Look at file sizes, timestamps, and metadata (like the date of last modification) to see if anything changed unexpectedly.
    • A smaller file size or missing metadata can indicate that parts of the data have been removed.
  3. Look for References to the Data in Other Sources

    If the data in question was cited by academic articles, reports, or news stories, see if the version they reference differs from what is now publicly available. This can help you confirm that a redaction or change has occurred.
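
Steps 1 and 2 above can be sketched in a few lines of Python once you have an archived copy and a current copy of the same file. This is an illustrative sketch only, assuming both versions are CSV text; the function names are hypothetical, and the two tiny datasets are invented for the example and do not come from any agency.

```python
import csv
import difflib
import hashlib
import io


def describe(text):
    """Summarise one version of a CSV file held as a string."""
    rows = list(csv.reader(io.StringIO(text)))
    data = text.encode("utf-8")
    return {
        "bytes": len(data),                          # file size
        "sha256": hashlib.sha256(data).hexdigest(),  # content fingerprint
        "columns": rows[0] if rows else [],          # header row
        "row_count": max(len(rows) - 1, 0),          # rows excluding header
    }


def compare_versions(archived_text, current_text):
    """Report size, row, column, and line-level differences."""
    old, new = describe(archived_text), describe(current_text)
    diff = list(difflib.unified_diff(
        archived_text.splitlines(),
        current_text.splitlines(),
        fromfile="archived",
        tofile="current",
        lineterm="",
    ))
    return {
        "identical": old["sha256"] == new["sha256"],
        "size_change_bytes": new["bytes"] - old["bytes"],
        "row_change": new["row_count"] - old["row_count"],
        "columns_removed": [c for c in old["columns"] if c not in new["columns"]],
        "columns_added": [c for c in new["columns"] if c not in old["columns"]],
        "diff": diff,
    }


# Invented data: the "current" version has lost one column and one row.
archived = "county,year,group,cases\nA,2023,x,10\nB,2023,y,12\n"
current = "county,year,cases\nA,2023,10\n"

report = compare_versions(archived, current)
for line in report["diff"]:
    print(line)
```

A matching SHA-256 value means the two files are byte-for-byte identical. When the values differ, the column and row counts show what kind of change occurred, and the printed diff shows where. A difference alone does not prove a redaction, since agencies also make routine corrections, so it is best read alongside the outside references described in step 3.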

Oxy Resources for Federal Data

These resources, available through the Library, have committed to maintaining access to data that has been scrubbed from federal agency websites.