How LMGTDFY works
Let Me Get That Data For You searches any website for data in machine-readable formats and provides a list. Here is U.S. Open Data’s background reasoning for creating this tool:
When government agencies create an open data repository, they need to start by inventorying the data that the agency is already publishing on their website. This is a laborious process. It means searching their own site with a query like this:
site:example.gov filetype:csv OR filetype:xls OR filetype:json
Then they have to read through all of the results, download all of the files, and create a spreadsheet that they can load into their repository. It’s a lot of work, and as a result it too often goes undone, resulting in a data repository that doesn’t actually contain all of that agency’s data.
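As a rough illustration, the manual query shown above can be assembled programmatically. This is a minimal sketch, not code from LMGTDFY itself; the function name build_inventory_query and its parameters are assumptions made for this example.

```python
def build_inventory_query(domain, extensions):
    """Build a search query restricted to one domain and a set of file types."""
    if not extensions:
        raise ValueError("at least one file extension is required")
    # Each extension becomes a "filetype:" term; the terms are joined with OR.
    filetype_terms = " OR ".join(
        f"filetype:{ext.lstrip('.').lower()}" for ext in extensions
    )
    return f"site:{domain} {filetype_terms}"


# Reproduces the example query from the text.
print(build_inventory_query("example.gov", ["csv", "xls", "json"]))
```

Passing a longer extension list widens the search without the query having to be rewritten by hand.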
Realizing that this was a common problem, we hired Silicon Valley Software Group to create a tool to automate the inventorying process. We worked with Dan Schultz and Ted Han, who created a system built on Django and Celery, using Microsoft’s great Bing Search API as its data source. The result is a free, installable tool, which produces a CSV file that lists all CSV, XML, JSON, XLS, XLSX, and Shapefile files found on a given domain name.
The results were formerly limited to 300 files due to the cost of querying the Bing API, but thanks to a donation by Microsoft on March 23, 2015, the tool can currently return 2,000 files. Moreover, should an agency want to expand the query parameters, the code behind the site is all open source.
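The inventory step described above, turning a list of found URLs into a CSV file of data files, might look roughly like the following. This is a hedged sketch using only the Python standard library, not the tool's actual Django and Celery code. The function inventory_csv, the DATA_FORMATS mapping, the column names, and the use of ".shp" to recognize Shapefiles are all assumptions for illustration. The default limit of 2,000 simply mirrors the cap mentioned in the text, and the search step that would supply the URLs is left out entirely.

```python
import csv
import io
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed mapping from file extension to format label (".shp" stands in for Shapefiles).
DATA_FORMATS = {
    ".csv": "CSV",
    ".xml": "XML",
    ".json": "JSON",
    ".xls": "XLS",
    ".xlsx": "XLSX",
    ".shp": "Shapefile",
}


def inventory_csv(urls, limit=2000):
    """Return CSV text listing the URLs whose path ends in a known data extension."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "format"])
    written = 0
    for url in urls:
        if written >= limit:
            break
        # Look only at the URL path, so query strings do not hide the extension.
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        if suffix in DATA_FORMATS:
            writer.writerow([url, DATA_FORMATS[suffix]])
            written += 1
    return buffer.getvalue()


# Hypothetical search results; only the data files end up in the inventory.
sample = [
    "https://example.gov/data/budget.csv",
    "https://example.gov/about.html",
    "https://example.gov/gis/parcels.SHP?v=2",
]
print(inventory_csv(sample))
```

The resulting text can be written to a file and loaded into a repository or opened as a spreadsheet.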
Use LMGTDFY to vet your Public Data Listing
The US Federal Open Data Policy requires CFO-Act agencies to catalog all of their data in Enterprise Data Inventories. Tools like Let Me Get That Data For You and Google’s Advanced Search (e.g. searching by domain and by file type), together with searches across shared drives by file type, can help agencies make sure that their required Enterprise Data Inventories are truly comprehensive.
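Searching a shared drive by file type can be scripted in a few lines. The sketch below is one possible approach using Python's pathlib. The function name find_data_files and the DATA_SUFFIXES set are assumptions for this example, and the list of extensions should be adjusted to whatever formats an agency actually holds.

```python
from pathlib import Path

# Assumed set of extensions that count as machine-readable data.
DATA_SUFFIXES = {".csv", ".xml", ".json", ".xls", ".xlsx", ".shp"}


def find_data_files(root):
    """Recursively list files under root whose extension is in DATA_SUFFIXES."""
    root_path = Path(root)
    return sorted(
        path
        for path in root_path.rglob("*")
        if path.is_file() and path.suffix.lower() in DATA_SUFFIXES
    )
```

Calling find_data_files on the mount point of a shared drive returns a sorted list of paths that can be compared against the agency's existing inventory.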