Here's a bit more information on the layout of the voter data (voter-data.tar.gz).
There are 6 CSV files, with the naming convention RLNNNNERM.csv, where
RLNNNN = Record Linkage Date (0709 or 0829)
ER = Total % error introduced in data
M = Multiplier applied to ER to give the % error introduced in duplicates (duplicate error = ER × M)
Examples:
RL0709000.csv - Dataset with no error
RL0829103.csv - Dataset from 0829 with 10% total error (ER=10) and 30% error in duplicates (hence M=3)
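The naming convention can be decoded mechanically. The following is a minimal sketch, not part of the dataset tooling: it assumes ER is always two digits and M is always one digit, which is inferred only from the two example names above. The names DatasetInfo and parse_dataset_name are hypothetical.

```python
import re
from typing import NamedTuple


class DatasetInfo(NamedTuple):
    linkage_date: str         # NNNN, e.g. "0709" or "0829"
    total_error_pct: int      # ER
    multiple: int             # M
    duplicate_error_pct: int  # ER * M


# Assumption: NNNN is 4 digits, ER is 2 digits, M is 1 digit.
_NAME_PATTERN = re.compile(r"^RL(\d{4})(\d{2})(\d)\.csv$")


def parse_dataset_name(filename: str) -> DatasetInfo:
    """Split a name such as 'RL0829103.csv' into its documented parts."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not an RLNNNNERM.csv name: {filename!r}")
    linkage_date, er, m = match.groups()
    return DatasetInfo(linkage_date, int(er), int(m), int(er) * int(m))


if __name__ == "__main__":
    for name in ("RL0709000.csv", "RL0829103.csv"):
        print(name, parse_dataset_name(name))
```

The linkage date is kept as a string so that the leading zero in 0709 and 0829 is preserved.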
Description of Columns:
lname: Last name
fname: First name
race: Race (A, B, I, M, O, U, W or missing)
gender: Gender (M, F or missing)
house_num: House number
street: Street name for house
strt_tp: Street type (e.g., Ave, Ct, St)
reg_num: Voter registration number
dob: Date of Birth
mi: Middle name initial
reg_dt: Voter registration date
unit_num: House unit number (Apt no.)
suffix: Name suffix (e.g., Jr., Sr., III)
mname: Middle name, if available
lsound: Soundex code for Last name
fsound: Soundex code for First name
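For readers who want to recompute lsound and fsound (for example, after error has been introduced into the names), here is a sketch of the standard American Soundex rules. Whether the dataset was built with exactly this variant is an assumption; it does reproduce the codes for the names in the sample rows. The function name soundex is illustrative.

```python
# Map each consonant to its Soundex digit; vowels, H, W and Y have no digit.
_CODES = {}
for _digit, _letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or '' for an empty name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
        if len(result) == 4:
            break
    return result.ljust(4, "0")  # pad short codes with zeros


if __name__ == "__main__":
    for last, first in (("ALPHIN", "LINDA"), ("CHEEK", "DELORES"), ("DEVINE", "JOSEPH")):
        print(last, soundex(last), first, soundex(first))
```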
Sample data (empty fields are missing values):
lname,fname,race,gender,house_num,street,strt_tp,reg_num,dob,mi,reg_dt,unit_num,suffix,mname,lsound,fsound
ALPHIN,LINDA,W,F,2510,STAFFORD,AVE,161090,2/11/43,B,19680101,,,BEST,A415,L530
CHEEK,DELORES,B,F,2314,VAN DYKE,AVE,202640,5/18/49,A,19720101,,,A,C200,D462
DEVINE,JOSEPH,W,M,616,BROOKS,AVE,2203321,5/18/43,D,19720101,,JR,D,D150,J210
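A sketch of how the records might be loaded with Python's standard csv module. Several things here are assumptions rather than facts about the files: that they are comma-delimited with a header row, that dob is M/D/YY with a 19xx century, that reg_dt is YYYYMMDD, and that the blank fields in the sample rows fall under unit_num and suffix. The helpers parse_dob, parse_reg_dt and load_voters are hypothetical names.

```python
import csv
import io
from datetime import date, datetime
from typing import Optional

# Two sample rows rewritten as CSV; the placement of empty fields is an assumption.
SAMPLE = """\
lname,fname,race,gender,house_num,street,strt_tp,reg_num,dob,mi,reg_dt,unit_num,suffix,mname,lsound,fsound
ALPHIN,LINDA,W,F,2510,STAFFORD,AVE,161090,2/11/43,B,19680101,,,BEST,A415,L530
DEVINE,JOSEPH,W,M,616,BROOKS,AVE,2203321,5/18/43,D,19720101,,JR,D,D150,J210
"""


def parse_dob(text: str) -> Optional[date]:
    """Parse 'M/D/YY'. Assumption: two-digit years belong to the 1900s."""
    if not text:
        return None
    month, day, year = (int(part) for part in text.split("/"))
    if year < 100:
        year += 1900
    return date(year, month, day)


def parse_reg_dt(text: str) -> Optional[date]:
    """Parse 'YYYYMMDD' (assumed format of reg_dt)."""
    if not text:
        return None
    return datetime.strptime(text, "%Y%m%d").date()


def load_voters(handle) -> list:
    """Read rows as dicts, turning empty strings into None and parsing dates."""
    voters = []
    for row in csv.DictReader(handle):
        record = {key: (value if value != "" else None) for key, value in row.items()}
        record["dob"] = parse_dob(row["dob"])
        record["reg_dt"] = parse_reg_dt(row["reg_dt"])
        voters.append(record)
    return voters


if __name__ == "__main__":
    for voter in load_voters(io.StringIO(SAMPLE)):
        print(voter["lname"], voter["dob"], voter["reg_dt"], voter["suffix"])
```

The century is set by hand because strptime's %y directive would map a two-digit year such as 43 to 2043, which cannot be right for a voter registered in 1968. For a real file, pass an open file object (opened with newline="") to load_voters instead of the in-memory sample.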