From CS 725/825 Spring 2018

CS725-S18: Homework 2

Due: January 24, 2018 before 9:30am

The goal of this week's assignment is to gain experience using OpenRefine for data cleaning. Later in the semester, you will work on assignments using real-world data. This data will be messy. Learning how to use this tool now will save you a ton of time later in the semester.

Setup

Create your project

Tasks

Install

Download and install OpenRefine

Tutorial

Work through the tutorial at http://enipedia.tudelft.nl/wiki/OpenRefine_Tutorial through "Export Data". Put the answers to the questions I ask below in your project README.md.

Exercise

The last part of the tutorial is the section "More Data Sets - Is the 27 Club Real?". Use OpenRefine to determine how many musicians in the dataset died at age 27. Use only OpenRefine for this -- creating a chart in Excel or another tool is not necessary. Export your final data file as CSV and add it to your GitLab project.

In your project README.md, explain the steps you took to clean and analyze the data to reach your conclusion.
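
Once the CSV has been exported, one optional way to sanity-check the count you obtained in OpenRefine is to recount the rows in the file. This does not replace the OpenRefine work or the write-up. The sketch below is only an illustration: the column name `age` and the sample rows are assumptions made up for the example, so adjust them to match the headers in your own export.

```python
import csv
import io


def count_rows_with_age(csv_file, age_column="age", target_age=27):
    """Count rows whose age column parses as a number equal to target_age.

    csv_file is any open text file (or file-like object) containing CSV data
    with a header row. The column name "age" is a hypothetical default.
    """
    reader = csv.DictReader(csv_file)
    count = 0
    for row in reader:
        value = (row.get(age_column) or "").strip()
        try:
            if float(value) == target_age:
                count += 1
        except ValueError:
            # Blank or non-numeric cells are skipped; they may point to
            # values that still need cleaning in OpenRefine.
            continue
    return count


# Made-up rows standing in for an exported file; not real data.
sample = io.StringIO("name,age\nA,27\nB,30\nC, 27 \nD,unknown\n")
print(count_rows_with_age(sample))
```

To run it against a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`. If the result differs from the count in OpenRefine, that usually means some cells were not cleaned consistently.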

Important: Your write-up for this section is the most important part of this assignment. You need to include enough detail to convince me that you understand how to use OpenRefine. In addition, you will lose points if there are many spelling or grammatical errors, or if your write-up does not use appropriate Markdown markup for clarity and neatness.

Submission

Submit the URL of your solution GitLab project in Blackboard.

Retrieved from http://www.cs.odu.edu/~mweigle/CS725-S18/HW2
Page last modified on January 11, 2018, at 02:58 PM