Agency Dataset Publication in Data.gov

Jun 20, 2014

Not sure how to get your datasets into Data.gov? We’ve put together an overview to show you how the process works.

Agencies prepare their enterprise data inventories in data.json format and post them on their websites (agency.gov/data.json), as required by the Open Data Policy, using the guidance and tools available on Project Open Data. Data.gov also offers inventory.data.gov, a tool that helps agencies create their data inventories.
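As a rough illustration of what an inventory file might contain, here is a minimal sketch that builds a one-record data.json in Python. The record is entirely hypothetical, and the field names are assumptions modeled on common Project Open Data metadata fields; the schema published on Project Open Data is the authority on what is actually required.

```python
import json

# Hypothetical inventory holding a single dataset record.
# Field names are assumptions; check them against the Project Open Data schema.
inventory = [
    {
        "title": "Example Agency Budget Data",
        "description": "Illustrative record only; not a real dataset.",
        "keyword": ["budget", "example"],
        "modified": "2014-06-20",
        "publisher": "Example Agency",
        "contactPoint": "Jane Doe",
        "mbox": "jane.doe@agency.gov",
        "identifier": "example-agency-budget-001",
        "accessLevel": "public",
    }
]


def write_inventory(records, path="data.json"):
    """Write the inventory to disk so it can be posted at agency.gov/data.json."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2)


# Serialized form of the inventory, as it would appear in the posted file.
data_json_text = json.dumps(inventory, indent=2)
```

Calling write_inventory(inventory) would produce the file an agency posts on its website; a real inventory would list every dataset the agency manages.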

Harvesting Datasets

The Data.gov team at GSA works with each agency to validate its data.json and identify any errors that the agency needs to fix. Once the data.json is revised and revalidated, the agency posts the corrected file and directs the Data.gov team to begin “harvesting” it into the Data.gov catalog.

The Data.gov team sets up the harvest of the agency data.json and sets its frequency (normally daily). Metadata records (datasets) that do not validate against the Project Open Data schema are rejected; all valid datasets are published in the Data.gov catalog, where they replace the existing dataset records for that agency. The Data.gov team and the agency continue to collaborate to fix errors.
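The harvest behavior described above can be sketched roughly as follows. This is not the Data.gov harvester itself: the function names, the in-memory catalog dictionary, and the short list of required fields are all assumptions made for illustration, and a real harvest would validate against the full Project Open Data schema.

```python
# Assumed subset of required fields, for illustration only.
REQUIRED_FIELDS = ("title", "description", "identifier", "accessLevel")


def validate_record(record):
    """Return a list of error messages; an empty list means the record is valid."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    errors = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "":
            errors.append(f"missing required field: {field}")
    return errors


def harvest(agency, records, catalog):
    """Publish valid records for an agency and return the rejected ones.

    `catalog` is a hypothetical dict mapping agency name to its dataset records.
    """
    valid, rejected = [], []
    for record in records:
        errors = validate_record(record)
        if errors:
            rejected.append((record, errors))
        else:
            valid.append(record)
    # Valid records replace the agency's existing dataset records.
    catalog[agency] = valid
    return rejected
```

The returned list of rejected records and their error messages stands in for the feedback the Data.gov team and the agency would work through together.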

The harvesting process then repeats daily, so the agency’s catalog listings on Data.gov stay current.

Data.gov Topics

Data.gov currently has 21 Topics on issues such as Agriculture, Climate, Education and Public Safety. Each Topic has its own curated collection of datasets. Adding datasets to Topic collections is not yet automated: the community leaders of each Topic area identify the datasets to be tagged, and the Data.gov team updates those datasets with the relevant tags so that they appear in the desired Topic. Work is under way to streamline this process.

Accessing Datasets

Once agency datasets are published in the Data.gov catalog, end users can browse the catalog through its user interface, which is built on CKAN, an open source platform for data catalogs. The Data.gov catalog is also available through the Data.gov CKAN API.
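For readers who want to try the API, here is a minimal sketch of a catalog search using only the Python standard library. It relies on CKAN's package_search action; the base URL below is an assumption about where the Data.gov catalog API is served, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for the Data.gov CKAN action API.
CATALOG_API = "https://catalog.data.gov/api/3/action"


def build_search_url(query, rows=5):
    """Build a CKAN package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{CATALOG_API}/package_search?{params}"


def extract_titles(response):
    """Pull dataset titles out of a decoded package_search response."""
    if not response.get("success"):
        raise ValueError("CKAN API reported a failure")
    return [pkg.get("title", "") for pkg in response["result"]["results"]]


def search_titles(query, rows=5):
    """Query the catalog over the network and return matching dataset titles."""
    with urlopen(build_search_url(query, rows), timeout=30) as resp:
        return extract_titles(json.load(resp))
```

For example, search_titles("climate") would return the titles of up to five matching datasets, assuming the endpoint is reachable at the URL above.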

Hyon Kim is the Deputy Program Director for Data.gov at the U.S. General Services Administration (GSA).