CS 495/595 Introduction to Web Science
Fall 2014
http://www.cs.odu.edu/~mln/teaching/cs595-f14/

Assignment #11
Due: 11:59pm Dec 11 2014

NOTE: Assignment #11 is for extra credit only; you do not have to do this assignment if you do not want to. Each question is worth up to 3 points (for a total of 6 possible points).

Support your answer: include all relevant discussion, assumptions, examples, etc.

1. Using the data from A9:

   - Consider each row in the blog-term matrix as a 500-dimension vector, corresponding to a blog.
   - From chapter 8, replace numpredict.euclidean() with cosine as the distance metric. In other words, you'll be computing the cosine between vectors of 500 dimensions.
   - Use knnestimate() to compute the nearest neighbors for both:

       http://f-measure.blogspot.com/
       http://ws-dl.blogspot.com/

     for k={1,2,5,10,20}.

2. Rerun A10, Q2, but this time using LIBSVM. If you have n categories, you'll have to run it n times. For example, if you're classifying music and have the categories:

     metal, electronic, ambient, folk, hip-hop, pop

   you'll have to classify things as:

     metal / not-metal
     electronic / not-electronic
     ambient / not-ambient
     etc.

   Use the 500-term vectors describing each blog as the features, and your manually assigned classifications as the true values. Use 10-fold cross-validation (as per slide 46, which shows 4-fold cross-validation) and report the percentage correct for each of your categories.
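For Question 1, here is a minimal sketch of swapping cosine in for Euclidean distance. It assumes each row of the A9 blog-term matrix is stored the way chapter 8's numpredict stores data: a dict whose 'input' key holds the 500-term vector. The 'name' key in the usage data is hypothetical. Cosine is a similarity, cos = (v1 . v2) / (|v1| |v2|), where larger means closer. numpredict.euclidean() is a distance, where smaller means closer. Because the chapter's neighbor search sorts in ascending order, the sketch returns 1 - cos so it drops in without changing the sort. Without that conversion, the search would pick the least similar blogs.

The getdistances() below is modeled loosely on the chapter's function and is not copied from the book. The helper nearest_neighbors() is hypothetical. It returns neighbor indices rather than knnestimate()'s averaged 'result'. It also skips the query row: if the two blogs are themselves rows of the matrix, each would otherwise be its own 1-nearest neighbor.

```python
import math


def cosine_distance(v1, v2):
    """Return 1 - cosine similarity, so smaller means more similar
    (the same ordering convention as numpredict.euclidean())."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        # Assumed convention: an all-zero vector is treated as orthogonal to everything.
        return 1.0
    return 1.0 - dot / (norm1 * norm2)


def getdistances(data, vec1, distance=cosine_distance):
    """Return (distance, row index) pairs sorted closest first, with a pluggable metric."""
    distancelist = [(distance(vec1, row['input']), i) for i, row in enumerate(data)]
    distancelist.sort()
    return distancelist


def nearest_neighbors(data, query_index, k):
    """Hypothetical helper: indices of the k rows closest to data[query_index],
    excluding the query row itself."""
    dlist = getdistances(data, data[query_index]['input'])
    return [i for _, i in dlist if i != query_index][:k]


if __name__ == "__main__":
    # Toy 3-term "blogs" standing in for the 500-term A9 rows.
    blogs = [
        {'name': 'blog-a', 'input': [3, 0, 1]},
        {'name': 'blog-b', 'input': [6, 0, 2]},  # same direction as blog-a
        {'name': 'blog-c', 'input': [0, 5, 0]},  # orthogonal to blog-a
        {'name': 'blog-d', 'input': [1, 1, 1]},
    ]
    for k in (1, 2, 3):
        print(k, [blogs[i]['name'] for i in nearest_neighbors(blogs, 0, k)])
```

Here blog-b comes first for blog-a even though its raw counts are twice as large. Cosine ignores vector length, which is the usual reason to prefer it over Euclidean distance for term vectors from blogs of different sizes. For the assignment, loop k over (1, 2, 5, 10, 20) using the row indices of the two blogs.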
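For Question 2, the one-vs-rest relabeling and the 10-fold cross-validation can be sketched independently of the classifier. This minimal sketch makes several assumptions:

- Your manual labels are a list aligned with the rows of the blog-term matrix.
- Folds are assigned round-robin without shuffling.
- train_majority() and predict_majority() are a hypothetical majority-class stand-in, not an SVM. They exist only so the harness runs with the standard library.

To use LIBSVM, you would pass functions that wrap its svm_train() and svm_predict(). Alternatively, LIBSVM's Python interface documents that svm_train(y, x, '-v 10') runs 10-fold cross-validation itself and returns the accuracy. Check the README for your installed version.

```python
def one_vs_rest_labels(labels, category):
    """Map multi-class labels to +1 (category) / -1 (not-category), as LIBSVM expects."""
    return [1 if lab == category else -1 for lab in labels]


def kfold_indices(n, k=10):
    """Split row indices 0..n-1 into k interleaved folds (round-robin, no shuffling)."""
    return [list(range(f, n, k)) for f in range(k)]


def cross_validate(x, y, train_fn, predict_fn, k=10):
    """Return the percentage of rows predicted correctly. Each row is tested
    exactly once, by a model trained on the other k-1 folds."""
    correct = 0
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train_x = [x[i] for i in range(len(y)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        model = train_fn(train_x, train_y)
        correct += sum(1 for i in test_idx if predict_fn(model, x[i]) == y[i])
    return 100.0 * correct / len(y)


def train_majority(train_x, train_y):
    """Hypothetical stand-in for an SVM: remember the more common label (ties go to -1)."""
    return 1 if train_y.count(1) > train_y.count(-1) else -1


def predict_majority(model, row):
    """Always predict the remembered majority label, ignoring the features."""
    return model


if __name__ == "__main__":
    # Hypothetical toy data: 20 "blogs" with dummy 3-term vectors and manual labels.
    labels = ['tech'] * 4 + ['personal'] * 16
    x = [[0, 0, 0] for _ in labels]
    for category in sorted(set(labels)):
        y = one_vs_rest_labels(labels, category)
        pct = cross_validate(x, y, train_majority, predict_majority, k=10)
        print(f"{category} / not-{category}: {pct:.1f}% correct")
```

On this toy data, the majority stand-in scores 80.0% for both categories simply by always guessing the larger side. Each one-vs-rest split is usually imbalanced, so a baseline like this is worth reporting next to the SVM's percentage correct for each category.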