
Data Resources & Tools

What is happening to federal datasets?

Beginning in January 2025, many federal datasets, websites, and other previously accessible resources have been taken offline to comply with executive orders, most notably data from the CDC, EPA, NIH, and NCES. Much of the targeted data relates to demographics, especially race/ethnicity, gender, and sexuality. Because these variables are important factors in research across many fields, including health, criminal justice, and the environment, many large, broad-scope datasets are affected. In some cases, press releases or data documentation have been removed; in others, entire datasets have been taken down. Evidence is growing that even datasets that remain accessible on an agency's website may contain scrubbed, corrupted, or otherwise altered information.


Learn more about missing, altered or restored federal data:

New York Times: Judge Orders C.D.C. to Temporarily Restore Deleted HHS, CDC & FDA Web Pages. The temporary restraining order was granted in response to a lawsuit filed against the federal government by Doctors for America (DFA), a progressive advocacy group representing physicians, and the nonprofit Public Citizen, a consumer advocacy group. 

Previously restored pages include the Atlas Tool, used by policymakers to track rates of infectious diseases such as HIV and STIs; pages that explained the Youth Risk Behavior Surveillance System, which monitors adolescent health; and the CDC's data site.

Silencing Science Tracker: joint initiative of the Sabin Center for Climate Change Law and the Climate Science Legal Defense Fund, tracking government attempts to restrict or prohibit scientific research since the November 2016 election

New York Times: initial summary of government web pages removed as of Feb. 3, 2025. *March 7 update: List of words targeted for removal from government websites.

The Journalist's Resource: overview of the current situation regarding federal health databases, including tips on preserving data; from the Shorenstein Center at the Kennedy School

Environmental Data & Governance Initiative: advocacy group for access to environmental data 

Data Rescue Efforts

Data Rescue Project: an evolving list of crowd-sourced efforts to preserve and maintain accessibility to data

End of Term Crawl: Internet Archive cache of government web sites prior to presidential inaugurations 

Harvard Library Innovation Lab: an effort from the Harvard Law School Library to provide access to major datasets from data.gov, PubMed, and federal GitHub repositories 

CDC Datasets on Internet Archive: CDC datasets uploaded before January 28th, 2025

Public Environmental Data Partners: federal environmental datasets, including GitHub access to the Social Vulnerability Index and Environmental Justice Index

Finding Rescued Data

Below is a concise guide to help you locate US federal government data that may have been removed or redacted following the Presidential Executive Orders that went into effect on January 31, 2025. Please note that this guide only covers how to find removed information. For current or active government data, Data.gov remains the best resource for discovering existing federal data. 

Retrieving Rescued Federal Data and Websites
  1. Confirm the Data Has Actually Been Removed

    Before you begin searching for rescued data, it's a good idea to double-check that the information is truly gone from official sources:

    • Search Data.gov to ensure it's not still listed there.
    • Visit the agency's current website to see if the dataset or page has simply been relocated.

    If you have confirmed that the data or information is missing, move on to archival resources.

  2. Use the Internet Archive Wayback Machine

    The Internet Archive Wayback Machine is the largest web archive, capturing snapshots of websites across the internet over time. It allows you to view websites as they appeared on specific dates in the past.

    By entering a URL in the Wayback Machine site, you can see archived versions of that site from different dates, effectively allowing you to go back in time and recover content that might have been removed or changed.

    How to Search with Gov Wayback

    Gov Wayback is a specialized tool that helps locate federal websites in the Internet Archive Wayback Machine. By appending wayback.org to the URL of a .gov website, you will be automatically directed to that webpage's record within the Internet Archive. Be aware that while this tool works with many .gov domains, it is not comprehensive.

  3. Check the Data Rescue Project's Data Rescue Tracker

    If you are looking for a dataset and cannot find the data you need in the Wayback Machine, the Data Rescue Project may have archived it. They maintain the Data Rescue Tracker, which lists rescued datasets along with links to where they have been archived. The Data Rescue Tracker is continually being updated, but it is not comprehensive.

    If your dataset or information is not listed, proceed to check other archives.

  4. Explore Other Archives

    If the Data Rescue Tracker does not lead you to what you need, there are additional archives that may have captured government websites or data. This guide includes links to some of these archives on the Archives of Government Data and Archives of Government Websites pages linked to the left.

    The Boston University School of Public Health's Center for Health Data Science provides a Find Lost Data search tool that queries a collection of alternative databases at once.
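The Wayback Machine and Gov Wayback lookups in step 2 can be scripted when you have many pages to check. The sketch below only builds the lookup URLs; it does not fetch anything. The function names are hypothetical. The Gov Wayback helper assumes that "wayback.org" is appended directly after ".gov" in the hostname and that the rest of the path is kept, which is one reading of the description above and may not match how the tool handles every URL.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def wayback_snapshot_url(page_url: str, timestamp: str) -> str:
    """Build a Wayback Machine URL for the capture closest to `timestamp`.

    `timestamp` uses the YYYYMMDD or YYYYMMDDhhmmss form.
    """
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def availability_query_url(page_url: str, timestamp: str) -> str:
    """Build a query URL for the Wayback availability JSON API."""
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return f"{AVAILABILITY_API}?{query}"


def gov_wayback_url(page_url: str) -> str:
    """Append 'wayback.org' to a .gov hostname (assumed Gov Wayback pattern)."""
    parts = urlsplit(page_url)
    host = parts.hostname or ""
    if not host.endswith(".gov"):
        raise ValueError(f"not a .gov address: {page_url!r}")
    # Keep the scheme, path, and query; only the hostname changes.
    return urlunsplit(
        (parts.scheme, host + "wayback.org", parts.path, parts.query, "")
    )


if __name__ == "__main__":
    page = "https://www.cdc.gov/nchs/index.htm"
    print(wayback_snapshot_url(page, "20250101"))
    print(availability_query_url(page, "20250101"))
    print(gov_wayback_url(page))
```

Paste any of the printed addresses into a browser. If a page was never captured, the Wayback Machine will say so, and you can move on to the Data Rescue Tracker and the other archives in steps 3 and 4.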

Locating Potentially Redacted Websites

If you suspect that a government webpage has been edited or partially redacted (rather than fully removed):

  • Compare archived versions of the same page using the Internet Archive Wayback Machine. Look for differences between older snapshots and more recent ones.
  • Check the End of Term Archive. The End of Term (EOT) Archive is a collaborative project that systematically saves U.S. government websites during the transition between administrations. Since it focuses on capturing a broad swath of federal web content at key points, it might have a version that predates any redactions.
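To find which archived snapshots are worth comparing, one option is the Wayback Machine's CDX server, which lists the captures of a page together with a content digest for each. The sketch below builds such a query and, given the decoded JSON response, reports pairs of consecutive captures whose digests differ. The helper names are hypothetical and the sample rows are made up for illustration; no network request is made here.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url: str, start: str, end: str) -> str:
    """Build a CDX query listing captures of `page_url` between two dates.

    `start` and `end` use the YYYYMMDD form. `collapse=digest` asks the
    server to drop adjacent captures whose content digest is identical.
    """
    params = {
        "url": page_url,
        "output": "json",
        "from": start,
        "to": end,
        "fl": "timestamp,original,digest",
        "collapse": "digest",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows):
    """Turn the decoded JSON (header row first) into a list of dicts."""
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, capture)) for capture in captures]


def content_changes(captures):
    """Return (earlier, later) timestamp pairs where the digest changed."""
    changes = []
    for prev, curr in zip(captures, captures[1:]):
        if prev["digest"] != curr["digest"]:
            changes.append((prev["timestamp"], curr["timestamp"]))
    return changes


if __name__ == "__main__":
    print(cdx_query_url("cdc.gov/nchs/index.htm", "20241201", "20250301"))
    # Made-up sample shaped like a CDX JSON response.
    sample = [
        ["timestamp", "original", "digest"],
        ["20241215000000", "https://www.cdc.gov/nchs/index.htm", "AAA"],
        ["20250120000000", "https://www.cdc.gov/nchs/index.htm", "AAA"],
        ["20250205000000", "https://www.cdc.gov/nchs/index.htm", "BBB"],
    ]
    print(content_changes(parse_cdx_rows(sample)))
```

A changed digest only shows that the captured bytes differ. Routine changes such as updated dates or page scripts also alter it, so open both snapshots and read them before concluding that content was redacted.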

Confirming Undocumented Data Redactions

If you suspect that the data you have access to may have been changed or partially removed without any official notice, the general steps outlined above still apply but with a slightly different focus:

  1. Compare Snapshots of the Same Dataset or Webpage
    • Use the Internet Archive Wayback Machine to retrieve older versions of the webpage or dataset you're examining.
    • Compare the current version and the archived versions side by side to see if any variables, fields, or sections appear altered or missing.
    • Tools like Diffchecker and others can help you systematically compare files to locate specific changes.
  2. Check for Metadata or File Size Discrepancies
    • Look at file sizes, timestamps, and metadata (like the date of last modification) to see if anything changed unexpectedly.
    • Reductions in file size or missing metadata can be indicators that parts of the data might have been removed.
  3. Look for References to the Data in Other Sources

    If the data in question was cited by academic articles, reports, or news stories, see if the version they reference differs from what is now publicly available. This can help you confirm that a redaction or change has occurred.
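The first two checks above can be sketched in a few lines for datasets distributed as CSV files. The example compares an archived copy with a current copy: it lists columns that disappeared, reports the change in row count, produces a line-level diff, and records each file's size and checksum. The function names and the two tiny sample datasets are hypothetical and exist only to show the shape of the output.

```python
import csv
import difflib
import hashlib
import io


def read_csv(text: str):
    """Return (column names, rows) for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def compare_csv_versions(old_text: str, new_text: str) -> dict:
    """Summarize column and row differences between two versions."""
    old_cols, old_rows = read_csv(old_text)
    new_cols, new_rows = read_csv(new_text)
    return {
        "removed_columns": [c for c in old_cols if c not in new_cols],
        "added_columns": [c for c in new_cols if c not in old_cols],
        "row_count_change": len(new_rows) - len(old_rows),
    }


def line_diff(old_text: str, new_text: str) -> list:
    """Unified diff of the two versions, one string per output line."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="archived",
            tofile="current",
            lineterm="",
        )
    )


def file_fingerprint(data: bytes) -> dict:
    """Size and SHA-256 checksum, for spotting silent changes to a file."""
    return {"size_bytes": len(data), "sha256": hashlib.sha256(data).hexdigest()}


if __name__ == "__main__":
    # Made-up sample data, not taken from any real federal dataset.
    archived = "county,year,race_ethnicity,cases\nA,2023,Hispanic,10\nA,2023,White,12\n"
    current = "county,year,cases\nA,2023,22\n"
    print(compare_csv_versions(archived, current))
    print("\n".join(line_diff(archived, current)))
    print(file_fingerprint(archived.encode("utf-8")))
    print(file_fingerprint(current.encode("utf-8")))
```

A missing column or a smaller file is a prompt for closer inspection rather than proof of redaction, since agencies also restructure files for ordinary reasons. Checking the result against the data documentation and any published citations, as described in step 3, helps confirm what changed.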

Conclusion

While Presidential Executive Orders may have led to the removal or redaction of certain data, there are numerous archived sources that can help you recover or compare older versions of government websites and datasets. Always begin by confirming that the information has truly been removed or altered. If it has, work through the tools below in this general order:

  • Data.gov - Confirm the data is not active or relocated.
  • Internet Archive Wayback Machine - Locate past versions of web pages.
  • Data Rescue Project's Data Rescue Tracker - Check for rescued datasets.
  • Other Archives - Consult relevant domain-specific or institutional repositories.

Oxy Resources for Federal Data

These resources, available through the Library, have committed to maintaining access to data now scrubbed from federal agencies.