A recent study of big data initiatives in 65 cities offers useful guidance for Federal big data initiatives. The researchers examined how cities collect data and then use it for decision making, organizing their findings into what they call “the framework for Big Data initiatives.” The framework has two major cycles:
“The data cycle governs the tools and processes used to collect, verify, and integrate data from multiple sources. Because of the variety of data sources involved, data teams in this cycle are [sic] often composed of representatives from multiple departments to leverage their field expertise and insider understanding of the data. In addition, new technologies—such as Hadoop, Hadoop-like technologies, stream analytics, massive parallel processing data warehouses, machine learning, and real-time analytics—are used in this cycle to process large and diverse types of complex data.
The decision-making cycle starts after the data are cleaned, integrated, and analyzed. The results are interpreted and transformed by data teams into performance indicators or dashboards. In this cycle, data analytics results are provided to the decision-making units at the program, departmental, and enterprise levels, and evaluation results are used to inform policy goals and priority setting, budgeting, program management and resource allocation, and public reporting.”
In the data cycle, cities collect data through various channels: government website traffic, social media, mobile apps, connected sensors, and video cameras. Once collected, the data must be cleaned and transformed into a format suitable for decision making. That sentence makes data cleaning sound far easier than it actually is. I can tell you from personal experience that much of a data scientist’s job is extracting, transforming, and loading (ETL) data for analysis. ETL becomes even more vital and more complicated as vast amounts of diverse data stream into the databases.
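To make the ETL steps concrete, here is a minimal sketch in Python using only the standard library. It assumes two hypothetical city data channels (a mobile-app report export and a sensor feed) with made-up field names and values; a real pipeline would pull from live systems and apply many more checks. The extract step reads both sources, the transform step normalizes categories, drops records with unparseable timestamps or missing categories, and removes repeated IDs, and the load step writes the result to an in-memory SQLite table.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw extracts from two city data channels.
# Field names, formats, and values are invented for illustration.
APP_REPORTS_CSV = """report_id,reported_at,category
101,2016-03-01 08:15,Pothole
102,2016-03-01 09:40,pothole
102,2016-03-01 09:40,pothole
103,not-a-date,Streetlight
"""

SENSOR_READINGS = [
    {"id": "S-7", "ts": "2016-03-01T10:05:00", "type": "NOISE"},
    {"id": "S-8", "ts": "2016-03-01T10:07:00", "type": ""},
]


def extract():
    """Extract: read both sources into one list of raw records with shared keys."""
    app_rows = [
        {"source": "app", "key": r["report_id"], "when": r["reported_at"],
         "category": r["category"]}
        for r in csv.DictReader(io.StringIO(APP_REPORTS_CSV))
    ]
    sensor_rows = [
        {"source": "sensor", "key": r["id"], "when": r["ts"], "category": r["type"]}
        for r in SENSOR_READINGS
    ]
    return app_rows + sensor_rows


def parse_time(text):
    """Accept either source's timestamp format; return None if neither fits."""
    for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def transform(rows):
    """Transform: normalize categories, drop invalid rows, drop repeated IDs."""
    clean, seen = [], set()
    for row in rows:
        when = parse_time(row["when"])
        category = row["category"].strip().lower()
        if when is None or not category:
            continue  # fails a basic quality check
        key = (row["source"], row["key"])
        if key in seen:
            continue  # same record ID already seen from this source
        seen.add(key)
        clean.append((row["source"], row["key"], when.isoformat(), category))
    return clean


def load(rows):
    """Load: write the cleaned rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (source TEXT, key TEXT, occurred_at TEXT, category TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = load(transform(extract()))
query = "SELECT category, COUNT(*) FROM events GROUP BY category ORDER BY category"
for row in conn.execute(query):
    print(row)  # prints ('noise', 1) then ('pothole', 2)
```

Even in this toy example, three of the six raw records are dropped before any analysis happens (one duplicate, one bad timestamp, one blank category), which hints at why cleaning consumes so much of an analyst’s time.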
Data analysis tools have also become more sophisticated. As the report explains, descriptive statistics used to be the preferred analysis method: analysts would calculate means and percentiles and present the data in simple bar charts and line charts. Now, analysts have tools such as “classification analysis, association and cluster analyses, anomaly detection, neural network analysis, dimensionality reduction, and various types of regression models.” Also new is that some cities let citizens conduct their own analyses by releasing the data and providing online analytical tools.
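To illustrate the contrast, here is a minimal Python sketch that computes the kind of descriptive summary the report describes (a mean and percentiles) and then applies one simple anomaly-detection rule, Tukey’s interquartile-range fences. The daily request counts are invented, and the function names and the choice of Tukey’s rule are illustrative assumptions; the report does not say which anomaly-detection methods the cities used.

```python
import statistics

# Hypothetical daily counts of service requests for one neighborhood (invented data).
daily_requests = [42, 38, 45, 40, 41, 39, 44, 43, 120, 40]


def describe(values):
    """Classic descriptive summary: the mean plus the 25th/50th/75th percentiles."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"mean": statistics.mean(values), "q1": q1, "median": median, "q3": q3}


def tukey_outliers(values, k=1.5):
    """Return (index, value) pairs outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


summary = describe(daily_requests)
print(summary)
print(tukey_outliers(daily_requests))  # → [(8, 120)]
```

The single spike pulls the mean above every ordinary day while the median barely moves. A rule built on quartiles rather than on the mean and standard deviation is less easily distorted by the very outlier it is trying to find.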
However, technology and tools alone are not driving the cities’ effective use of big data. The cities also had to overcome major challenges. There are human resources challenges, such as training staff in data-driven decision making and finding enough data scientists. There are technical constraints, such as outdated IT systems, poor data quality, and data locked away in silos. Beyond these lie even bigger cultural challenges: getting departments to work together on data projects and accept the results of data analysis, and gaining leadership support. Federal agencies face the same challenges, so it may be useful to see how the cities have handled them.
This report is rich with insights and lessons learned. It would also be useful to see a similar report that uses the framework for Big Data initiatives to analyze how Federal agencies are implementing their big data initiatives.
Each week, The Data Briefing showcases the latest federal data news and trends. Visit this blog every week to learn how data is transforming government and improving government services for the American people. If you have ideas for a topic or have questions about government data, please contact me via email. Dr. William Brantley is the Training Administrator for the U.S. Patent and Trademark Office’s Global Intellectual Property Academy. You can find out more about his personal work in open data, analytics, and related topics at BillBrantley.com. All opinions are his own and do not reflect the opinions of the USPTO or GSA.