CS 795/895 - Information Visualization
Fall 2012: Tues/Thurs 3-4:15pm, E&CS 2120

HW5-voterdata

Here's a bit more information on the layout of the voter data (voter-data.tar.gz).

There are 6 CSV files, with the naming convention RLNNNNERM.csv, where

  • RLNNNN = Record Linkage date, where NNNN is 0709 or 0829
  • ER = Total % error introduced in the data (two digits, e.g., 00 or 10)
  • M = Multiple of ER giving the % error introduced in duplicates (one digit)

Examples:

  • RL0709000.csv - Dataset with no error
  • RL0829103.csv - Dataset with 10% total error and 30% error in duplicates (hence M = 3, since 3 x 10% = 30%)
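
The naming convention above can be decoded mechanically. Here is a minimal Python sketch, assuming ER is always two digits and M is always one digit (both examples fit this, but the other file names have not been checked). The names parse_dataset_name and DatasetInfo are made up for illustration and are not part of the assignment.

```python
import re
from typing import NamedTuple


class DatasetInfo(NamedTuple):
    linkage_date: str         # NNNN, e.g. "0709"
    total_error_pct: int      # ER
    multiple: int             # M
    duplicate_error_pct: int  # ER * M


# Assumed layout: "RL" + 4-digit date + 2-digit ER + 1-digit M + ".csv"
_NAME_RE = re.compile(r"^RL(\d{4})(\d{2})(\d)\.csv$")


def parse_dataset_name(filename: str) -> DatasetInfo:
    """Split a file name such as RL0829103.csv into its parts."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    linkage_date = match.group(1)
    er = int(match.group(2))
    m = int(match.group(3))
    return DatasetInfo(linkage_date, er, m, er * m)


if __name__ == "__main__":
    for name in ("RL0709000.csv", "RL0829103.csv"):
        print(name, parse_dataset_name(name))
```

For RL0829103.csv this gives a total error of 10 and a duplicate error of 30, matching the example above.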

Description of Columns:

  1. lname: Last name
  2. fname: First name
  3. race: Race (A, B, I, M, O, U, W or missing)
  4. gender: Gender (M, F or missing)
  5. house_num: House number
  6. street: Street name for house
  7. strt_tp: Street type (e.g., Ave, Ct, St, ...)
  8. reg_num: Voter registration number
  9. dob: Date of Birth
  10. mi: Middle name initial
  11. reg_dt: Voter registration date
  12. unit_num: House unit number (Apt no.)
  13. suffix: Name suffix (Jr., Sr., III, ...)
  14. mname: Middle name, if available
  15. lsound: Soundex code for Last name
  16. fsound: Soundex code for First name
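
For reference, here is a minimal sketch of the standard American Soundex algorithm in Python. It is not confirmed that the lsound and fsound columns were generated with exactly this variant, but it does reproduce the six codes in the sample rows on this page. The function name soundex is just a convenient label.

```python
# Letter groups of standard American Soundex; vowels, H, W and Y have no code.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of name ("" if it has no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]  # pad with zeros or truncate to 4 characters


if __name__ == "__main__":
    for name in ("ALPHIN", "LINDA", "CHEEK", "DELORES", "DEVINE", "JOSEPH"):
        print(name, soundex(name))
```

Comparing a recomputed code against the stored lsound or fsound is one quick way to spot records where error was introduced into a name, assuming the stored codes were computed before the error was added.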

Sample data:

 lname   fname    race  gender  house_num  street    strt_tp  reg_num  dob      mi  reg_dt    unit_num  suffix  mname  lsound  fsound
 ALPHIN  LINDA    W     F       2510       STAFFORD  AVE      161090   2/11/43  B   19680101                    BEST   A415    L530
 CHEEK   DELORES  B     F       2314       VAN DYKE  AVE      202640   5/18/49  A   19720101                    A      C200    D462
 DEVINE  JOSEPH   W     M       616        BROOKS    AVE      2203321  5/18/43  D   19720101            JR      D      D150    J210
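
A minimal sketch of reading rows like these in Python, under several assumptions that should be checked against the actual files: that they are comma-separated with a header row using the column names above, that dob is month/day/two-digit year with every birth year in the 1900s, and that reg_dt is YYYYMMDD. The SAMPLE text and the helper names (parse_dob, parse_reg_dt, load_voters) are made up for illustration.

```python
import csv
import io
from datetime import date, datetime

# Two of the sample rows, rewritten as comma-separated text (assumed format).
SAMPLE = """\
lname,fname,race,gender,house_num,street,strt_tp,reg_num,dob,mi,reg_dt,unit_num,suffix,mname,lsound,fsound
ALPHIN,LINDA,W,F,2510,STAFFORD,AVE,161090,2/11/43,B,19680101,,,BEST,A415,L530
DEVINE,JOSEPH,W,M,616,BROOKS,AVE,2203321,5/18/43,D,19720101,,JR,D,D150,J210
"""


def parse_dob(text: str) -> date:
    """Parse M/D/YY, assuming the year is 19YY.

    strptime's %y would map 43 to 2043, so the century is added by hand.
    """
    month, day, year = (int(part) for part in text.split("/"))
    return date(1900 + year, month, day)


def parse_reg_dt(text: str) -> date:
    """Parse a registration date written as YYYYMMDD (assumed)."""
    return datetime.strptime(text, "%Y%m%d").date()


def load_voters(handle):
    """Read voter rows as dicts, converting the two date columns."""
    rows = []
    for row in csv.DictReader(handle):
        # Missing values are kept as None rather than guessed.
        row["dob"] = parse_dob(row["dob"]) if row["dob"] else None
        row["reg_dt"] = parse_reg_dt(row["reg_dt"]) if row["reg_dt"] else None
        rows.append(row)
    return rows


voters = load_voters(io.StringIO(SAMPLE))

if __name__ == "__main__":
    for voter in voters:
        print(voter["lname"], voter["dob"], voter["reg_dt"], repr(voter["suffix"]))
```

To read a real file, pass an open file handle (opened with newline="") in place of the io.StringIO object. Empty fields such as unit_num come back as empty strings.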