1.2 Big Data and You

Find Your Inner Data Geek

Now that you know the characteristics of big data and some of its applications to the public sector, it’s time to consider how it serves your field of interest. Whether you plan to be a data scientist or work with a team of them, you need to understand how big data is collected, analyzed, and communicated. This module starts with an overview of data science, in which you learn the steps in the life cycle of a big data project. It ends with you developing a plan to earn a data certification that exposes you to the tools and techniques of the life cycle stages that fit your skill needs.

Read

  1. The Data Science Process: A Visual Guide to Standard Procedures in Data Science by Chanin Nantasenamat

  2. Hilary: the most poisoned baby name in US history by Hilary Parker

Watch

  1. How I Would Learn Data Science (If I Had to Start Over) by Ken Jee

Complete

This week you’ll spend most of your time creating a plan to earn a data certification by the end of the semester. This assignment requires that you identify a research question and develop a plan to complete workshops over the semester that will expose you to the data science knowledge and skills needed to develop a big data project. By the end of the semester, you’ll submit a proposal that describes how you can answer the research question using big data and data science techniques.

You may be a novice to big data terms and sources at this point in the semester, but you probably have questions in your field that can be explored with data. For example, you may be interested in pollution levels of major cities and what conditions lead to poor air quality days. While structured data like measures of ozone levels, particulate matter, and nitrogen oxides can be used to identify poor air quality, big data generated from the sources of pollutants can be used to predict poor air quality days. Vehicles, traffic lights, navigation apps, satellites, and similar sources generate high-velocity, high-volume, unstructured data that can be harnessed for these predictions. With these data, you can explore questions like, “What impact could limiting semi-trucks driving on I-75 between 6am and 6pm have on air quality in Atlanta?” The purpose of the data certification is to learn about the types of big data and analysis techniques needed to answer the question that you formulate from your field of interest.

Formulating a good research question takes time and is iterative as you learn about types of data and techniques. This exercise gets you started on crafting a question and finding tutorials that can help you develop a project plan to answer it. It requires that you identify five 90-minute tutorials that tailor your learning about big data to the research question that you develop. To complete the assignment for this week:

  • Read the instructions for the data certification plan.

  • If you need some inspiration, Google “big data and X (your field).” It’s also a good idea to look at some academic journals in your field to see how the techniques of data science are being applied.

  • Write me an email or set up a time to talk if you need help.

  • Submit your data certification plan to the corresponding assignment folder.

Due by: 9/3 at 11:59pm