4.1 Data Quality

Garbage In, Garbage Out

Data quality is the foundation of good data analysis. This week we’ll explore the challenges big data poses for ensuring data quality and the consequences of poor data quality.

Read

  1. Hillman, Jane, Data quality and AI safety: 4 ways bad data affects AI and how to avoid it. Prolific. 2023. **Note: Prolific is a survey-based application that claims to use ethical, careful techniques for selecting its pool of potential respondents to academic surveys. It has an agenda, but this article is largely a good overview of why quality data is important in AI environments.**

  2. Government Data Quality Hub, The Government Data Quality Framework. Data quality principles. December 2020. (Also read one of the case studies at the end of the section.)

Post

Address the following in the 4.1 Data Quality discussion board:

  • Poor data quality and data practices can lead to downstream bias that impacts individuals and decision-making. Which examples from the readings (or your own reading) do you consider most harmful? Which seem overblown?

  • Governments around the world are developing frameworks, such as the United Kingdom framework in the readings, to improve data integrity and reduce bias. Based on your own experience as a user and/or analyst of data, which elements of the framework do you think are least understood or practiced by researchers or government agencies? How might they improve?

  • Find an example of a recent data quality problem in the news that resulted in bias from big data. What was the origin of the bias and did the media address how to prevent it? Describe what you find surprising about the problem. (Make sure to cite the article.)

Discussion posts are the primary means of assessing your understanding and critical evaluation of the readings. You must cite the readings you analyze in your posts using in-text APA style. Posts should be between 400 and 500 words.

Due by: 10/15 11:59 pm EST