Detailed Schedule
Notes:
- This schedule is subject to change and will be updated throughout the semester.
- Homework (HW) is individual work. All work must be your own. You may use resources on the Internet for reference, but you must not copy large sections of code. If you use online resources, you must cite your sources (including the URL).
- Group work on HW assignments is not acceptable. Do not start with someone else's solution and make changes -- this is easy to detect.
- ODU Spring Academic Calendar, Exam Schedule
This Week
Week 1: Web Science, Web Architecture - Jan 14, 16
Due before Tuesday's class
- sign up for Piazza (if not already added)
- explore the course website
- if you do not already have an account at github.com, register for an account with a username that incorporates your name (for example, my GitHub username is weiglemc)
- Reading:
Assignment
- HW0 (due Jan 21) - Course Setup
You must accept the invitation via GitHub Classroom to complete this assignment; see Piazza or Blackboard for the invite link.
- HW1 (due Jan 28) - Web Science Intro
You must accept the invitation via GitHub Classroom to complete this assignment; see Piazza or Blackboard for the invite link.
References
- COMP 4750 - Introduction to Web Science, Harding University, Dr. Frank McCown
- All things HTTP: https://cs531-f19.github.io, http://www.cs.odu.edu/~mln/teaching/cs595-s12/
- Information retrieval, metadata: https://phonedude.github.io/cs834-f16/
- Visualization, Analytics: https://www.cs.odu.edu/~mweigle/CS625-F19/, https://www.cs.odu.edu/~mweigle/CS725-S18/
- Web programming, LAMP: http://www.cs.odu.edu/~jbrunelle/cs518/, http://www.cs.odu.edu/~mkelly/semester/2015_spring/cs418/
- James Hendler, Nigel Shadbolt, Wendy Hall, Tim Berners-Lee, and Daniel Weitzner, "Web Science: An Interdisciplinary Approach to Understanding the Web", Communications of the ACM, July 2008, Vol. 51 No. 7, Pages 60--69
- Kieron O'Hara and Wendy Hall, "Web Science", ALT Online Newsletter, May 2008
- Andrei Broder et al., Graph structure in the Web, Computer Networks, June 2000, Vol 33, No 1-6, Pages 309--320, doi: 10.1016/S1389-1286(00)00083-9
- Google, "We knew the web was big...", July 25, 2008
- https://en.wikipedia.org/wiki/Web_crawler
- Bin He et al., "Accessing the Deep Web", Communications of the ACM, May 2007, Vol. 50 No. 5, Pages 94-101
- Dennis Fetterly, Mark Manasse, and Marc Najork, On the Evolution of Clusters of Near-Duplicate Web Pages, Journal of Web Engineering, October 2004, Vol 2, pp. 228-246
- Alexandros Ntoulas, Marc Najork, Mark Manasse, and Dennis Fetterly, Detecting Spam Web Pages Through Content Analysis, Proceedings of the 15th International World Wide Web Conference (WWW), May 2006
- Steve Lawrence and C. Lee Giles, Searching the World Wide Web, Science, Apr 1998, Vol. 280, Issue 5360, pp. 98-100.
- https://www.worldwidewebsize.com
- Web Science Trust, What is Web Science?, 2016, video (5:55)
- http://en.wikipedia.org/wiki/Internet
- https://developer.mozilla.org/en-US/docs/Web/HTTP
- http://en.wikipedia.org/wiki/Domain_Name_System
- http://en.wikipedia.org/wiki/DNS_cache_poisoning
- Mark Nottingham, RFC2616 is Dead, 2014
- Architecture of the World Wide Web, Volume One, W3C, 2004
- http://en.wikipedia.org/wiki/URI_scheme
- cURL man page
- GNU Wget 1.20 Manual
- RFC 7231 - Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content, 2014
- Michael Nelson, "REST, HATEOAS, and Follow Your Nose", WS-DL blog, Nov 2013.
Week 2: Introduction to Python - Jan 21, 23
Due before Tuesday's class
References
Week 3: Introduction to R - Jan 28, 30
Due before Tuesday's class
References
Week 4: Measuring and Archiving the Web - Feb 4, 6
Due before Tuesday's class
Assignment
Feb 4
- Week-04 Measure-Archive slides (1-65)
Feb 6
- Week-04 Measure-Archive slides (66-144)
References
- Feb 4: The Missing Semester of Your CS Education - covers all the topics we consider crucial to be an effective computer scientist and programmer (and things that often aren't directly taught in classes)
- Feb 4: Why Containers?, https://twitter.com/b0rk/status/1224500774450929664/photo/1 - related to HW2
- W3C, Web Characterization Terminology & Definitions Sheet, http://www.w3.org/1999/05/WCA-terms/
- Pitkow, Summary of WWW Characterizations, Journal of the World Wide Web, 1999
- O'Neill et al., Trends in the Evolution of the Public Web, D-Lib Magazine, Apr 2003
- Baeza-Yates et al., Characterization of national Web domains, ACM Trans. Internet Technol., May 2007
- Fetterly et al., A large-scale study of the evolution of Web pages, Software Practice & Experience, 2004
- Ntoulas et al., What's new on the web?: The evolution of the web from a search engine perspective, Proc WWW 2004
- WS-DL's Celebration of 20 years of the Internet Archive, 2016, http://ws-dl.blogspot.com/2016/11/2016-11-21-ws-dl-celebration-of-ia20.html
- WARC tools
- warc-tools - https://github.com/internetarchive/warctools
- webrecorder - http://webrecorder.io/
- WARCreate & WAIL
- WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy, WS-DL blog post, July 2013
- Electric WAILs and Ham, WS-DL blog post, February 2017
- Replacing Heritrix with Chrome in WAIL, and the release of node-warc, node-cdxj, and Squidwarc, WS-DL blog post, July 2017
- Memento 101
- Carbon Dating the Web
- Memento Chrome Extension
- Mink for Google Chrome
Week 5: Searching the Web - Feb 11, 13
Due before Tuesday's class
Assignment
Feb 11
- Week-05 Searching slides (1-58)
Feb 13
References
- Levene (2010), An Introduction to Search Engines and Web Page Navigation
- Croft et al. (2010), Search Engines: Information Retrieval in Practice
- http://www.robotstxt.org/
- web crawler animations - see Table 3 of Smith and Nelson, Site Design Impact on Robots: An Examination of Search Engine Crawler Behavior at Deep and Wide Websites, 2003.
- https://en.wikipedia.org/wiki/Precision_and_recall
- http://en.wikipedia.org/wiki/Stop_words
- Zobel & Moffat (2006), Inverted files for text search engines, ACM Computing Surveys, 38(2)
- Index partitioning schemes, from Introduction to Information Retrieval
- Google bombs: http://en.wikipedia.org/wiki/Google_bomb, https://searchengineland.com/google-kills-bushs-miserable-failure-search-other-google-bombs-10363
- PageRank: http://en.wikipedia.org/wiki/PageRank
- Kleinberg, Authoritative sources in a hyperlinked environment, J. ACM, 1999
- Google Webmaster Guidelines
- my chance to name-drop -- this page features a video explanation from my grad school friend, Matt Cutts, who was the web spam guru at Google for a while
Week 6: Social Networks - Feb 18, 20
Due before Tuesday's class
Assignment
Feb 18
- Week-06 Social Networks slides (1-83)
Feb 20
- Week-06 Social Networks slides (84-94)
References
- Chapter 2: Graphs (pdf) and Chapter 3: Strong and Weak Ties (pdf) from Networks, Crowds, and Markets: Reasoning About a Highly Connected World
- http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon
- Erdos number
- Anatomy of Facebook
- Feld, Why Your Friends Have More Friends Than You Do, American Journal of Sociology, Vol 96, No. 6, May 1991, pp. 1464--1477.
- http://en.wikipedia.org/wiki/Friendship_paradox
- Marlow et al., Maintained relationships on Facebook, 2009
- Huberman et al., Social networks that matter: Twitter under the microscope, First Monday, Jan 2009
- Zachary W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33, 452-473, https://en.wikipedia.org/wiki/Zachary's_karate_club
- Girvan M. and Newman M. E. J., Community structure in social and biological networks, Proc. Natl. Acad. Sci. USA 99, 7821–7826 (2002), https://arxiv.org/abs/cond-mat/0112110
- http://en.wikipedia.org/wiki/Centrality
- Bibliography of Research on Social Network Sites, http://www.danah.org/researchBibs/sns.php
- Christakis and Fowler, Connected, 2009
Week 7: Selection and Social Influence - Feb 25, 27
Due before Tuesday's class
Assignment
- HW5 (due Thurs, Mar 19) - Graph Partitioning -- refers back to Week 6 material, with extra credit from Week 8
- now due Tues, Mar 24 at 11:59pm
References
- Chapter 4: Networks in Their Surrounding Contexts (pdf) from Networks, Crowds, and Markets: Reasoning About a Highly Connected World
- The hidden influence of social networks (18:44), TED Talk by Nicholas Christakis, May 10, 2010
- Papachristos, Braga, Hureau, “Social Networks and the Risk of Gunshot Injury”, 2012, http://dx.doi.org/10.1007/s11524-012-9703-9
- Study: Odds Of Being Murdered Closely Tied To Social Networks, NPR, Nov 15, 2013
- Papachristos and Wildeman, “Network Exposure and Homicide Victimization in an African American Community”, 2013, http://dx.doi.org/10.2105/AJPH.2013.301441
- Schelling simulators
Week 8: Visualizing Social Networks - Mar 3, 5
Due before Tuesday's class
- HW4 - due Thu, Mar 5
- Reading:
Mar 3
- Week-08 Visualization slides
- Force-directed layout walk-through
Mar 5
References
- Brendan Griffen, "Graphs of Wikipedia: Influential Thinkers", https://www.brendangriffen.com/post/wikipedia-thinkers/
- Kate Starbird, “Information Wars: A Window into the Alternative Media Ecosystem”, https://medium.com/hci-design-at-uw/information-wars-a-window-into-the-alternative-media-ecosystem-a1347f32fd8f
- Brandes et al., 2013, Handbook of Graph Drawing and Visualization, Ch. 26 Social Networks (pdf)
- Freeman, 2000, Visualizing Social Networks, Journal of Social Structure
- more network visualization examples: http://flowingdata.com/category/visualization/network-visualization/
- graph data formats, https://gephi.org/users/supported-graph-formats/
- https://en.wikipedia.org/wiki/Force-directed_graph_drawing
- Stand-Alone Software
- Python libraries
- JavaScript-based libraries
- More D3
- Force-Directed Layouts in D3
NO CLASS - SPRING BREAK - Mar 10, 12
NO CLASS - SPRING BREAK (extended due to COVID-19) - Mar 17, 19
https://www.odu.edu/emergency/news/2020/2/novel_coronavirus_co/update-4
ONLINE INSTRUCTION BEGINS
Week 9: Collective Intelligence and Recommender Systems - Mar 24, 26
Due Tuesday
Assignment
Mar 24
- All videos accessible via Media Gallery in Blackboard
- Intro to Online (2:59)
- part 1 - Collective Intelligence - slides 1-27 (16:51)
- part 2 - Intro to Recommender Systems - slides 28-36 (2:18)
- part 3 - Recommending a Movie - slides 37-53 (17:04)
- part 4 - Challenges for Collab Filtering - slides 55-64 (4:48)
- part 5 - HW6 intro (3:41)
Mar 26
References
- Toby Segaran, Programming Collective Intelligence, Chs 1-2
- Quinn and Bederson, “Human computation: A survey and taxonomy of a growing field”, CHI 2011
- Google Flu Trends, archived at http://web.archive.org/web/20150107200557/https://www.google.org/flutrends/about/how.html
- Coronavirus Google Trends
- Google Trends
- Jeff Howe, "The Rise of Crowdsourcing", Wired, Jun 2006
- Google's New Street View Image Recognition Algorithm Can Beat Most Captchas, TechCrunch, 2014
- How Google Cracked House Number Identification in Street View, 2014
- Google ImageLabeler
- FoldIt
- Parameswaran and Whinston, Social Computing: An Overview, CAIS 19:37, 2007
- Fayyad, Piatetsky-Shapiro, and Smyth, Knowledge Discovery and Data Mining: Towards a Unifying Framework, Proc. KDD, 1996
- Bollen et al., Twitter mood predicts the stock market, 2011
- Zeynep Tufekci, YouTube's Recommendation Algorithm Has a Dark Side, The Atlantic, April 1, 2019
- Pearson's r in Python, http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html
- Other similarity measures: Cosine similarity, Jaccard coefficient, Manhattan (taxicab) distance
- Lam and Riedl, "Shilling Recommender Systems for Fun and Profit", WWW 2004
- Hidden Industry Dupes Social Media Users, 2011
- Buskirk, How the Netflix Prize Was Won, Wired, 2009
- Narayanan and Shmatikov, Robust De-anonymization of Large Sparse Datasets, 2008
- Bruce Schneier, Why Anonymous Data Sometimes Isn't, 2007
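The similarity measures named in the references above (Pearson's r, cosine similarity, Jaccard coefficient, Manhattan distance) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the course's or any library's implementation: the function names and the sample ratings are hypothetical, and it assumes two users' ratings are given as equal-length lists over the same items.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length rating lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant rater as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(x, y):
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two item sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def manhattan(x, y):
    """Sum of absolute per-item differences (a distance: smaller means closer)."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical ratings of the same four movies by two users.
alice = [5, 3, 4, 4]
bob = [3, 1, 2, 2]

print(pearson_r(alice, bob))
print(cosine_similarity(alice, bob))
print(manhattan(alice, bob))
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))
```

In this made-up example bob rates every movie exactly two points lower than alice, so the Manhattan distance is large even though the two users rank the movies identically. Pearson's r corrects for that kind of offset, which is one reason it is often preferred for comparing raters who use the scale differently.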
Week 10: Clustering Algorithms - Mar 31, Apr 2
Due Tuesday
Assignment
Mar 31
Apr 2
- Q&A via Zoom
(see Apr 1 Piazza post for Zoom information)
References
Week 11: Document Filtering (Classification) - Apr 7, 9
Due Tuesday
Due Thursday
Assignment
Apr 7
- All videos accessible via Media Gallery in Blackboard
- part 1 - Intro to Classifiers, slides 1-14 (6:53)
- part 2 - Classifiers and Probabilities, slides 15-28 (10:33)
- part 3 - Bayesian Classifier, slides 29-42 (9:17)
- part 4 - Implementing a Bayesian Classifier (23:28)
- part 5 - HW8 intro (3:14)
Apr 9
- Q&A via Zoom
(see Apr 1 Piazza post for Zoom information)
References
Week 12: kNN and Algorithm Summary - Apr 14, 16
Due Tuesday
Due Thursday
Assignment
Apr 14
- All videos accessible via Media Gallery in Blackboard
- part 1 - kNN, slides 1-18 (13:54)
- part 2 - Validating and Optimizing kNN, slides 19-32 (15:05)
- part 3 - Algorithm Summary, slides 33-52 (10:50)
- part 4 - HW9 intro (3:28)
Apr 16
- Q&A via Zoom
(see Apr 1 Piazza post for Zoom information)
References
Week 13: Disinformation - Apr 21, 23
Due Tuesday
Due Thursday
Assignment
Apr 21
Apr 23
- Q&A via Zoom
(see Apr 1 Piazza post for Zoom information)
References
- Cherilyn Ireton and Julie Posetti, Journalism, 'Fake News' and Disinformation: A Handbook for Journalism Education and Training, UNESCO, 2018, https://en.unesco.org/fightfakenews
- Fake News examples
- 2019 IPR Disinformation in Society Report, https://instituteforpr.org/ipr-disinformation-study/
- “These Fake Local News Sites Have Confused People For Years. We Found Out Who Created Them”, BuzzFeed News, Feb 2020
- “Exposing the ‘pink slime’ journalism of Journatic”
- Fake Local News
- Local Memory Project
- Manipulating Social Media
- “Inside the hate factory: how Facebook fuels far-right profit”, The Guardian, Dec 5, 2019
- “I Found Election Interference And No One Cared”: One US Veteran’s Fight To Protect His Compatriots Online, BuzzFeed News, Dec 30, 2019
- “Sanders supporters have weaponized Facebook to spread angry memes about his Democratic rivals”, The Washington Post, Jan 24, 2020
- “How conservatives learned to wield power inside Facebook”, The Washington Post, Feb 20, 2020
- Woolley and Guilbeault, “Computational Propaganda in the United States of America: Manufacturing Consensus Online”
- "A white nationalist created a hoax about gun confiscation that is leading to calls for violence on social media and message boards", Media Matters, Dec 2019
- Paul and Matthews, “The Russian ‘Firehose of Falsehood’ Propaganda Model: Why It Might Work and Options to Counter It”, The Rand Corporation, 2016
- Wilson and Starbird, “Cross-platform disinformation campaigns: lessons learned and next steps”, Harvard Misinformation Review, Jan 14, 2020
- “Putin’s Long War Against American Science”, NY Times, Apr 13, 2020
- "How Russia’s Troll Farm Is Changing Tactics Before the Fall Election", NY Times, Mar 29, 2020
- “Social media hosted a lot of fake health news this year. Here's what went most viral.”, NBC News, Dec 29, 2019
- “Firehosing: the systemic strategy that anti-vaxxers are using to spread misinformation”, The Guardian, Nov 2019
- “How Coronavirus Fears Have Amplified a Baseless But Dangerous 5G Conspiracy Theory”, Time, Apr 9, 2020
- "Facebook to warn users who 'liked' coronavirus hoaxes", Associated Press report, Apr 16, 2020
- “What We Pretend to Know About the Coronavirus Could Kill Us”, NY Times, Apr 3, 2020
- Armchair Epidemiology
- Starbird, “How a Crisis Researcher Makes Sense of Covid-19 Misinformation”, Mar 9, 2020
- Shu and Liu, Detecting Fake News on Social Media, https://github.com/mdepak/fake-news-detection-resources
- Starbird, “Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter”, ICWSM 2017, blog summary: “Information Wars: A Window into the Alternative Media Ecosystem”, March 2017
- Starbird et al., “Ecosystem or Echo-System? Exploring Content Sharing across Alternative Media Domains”, ICWSM 2018, blog summary: “Content Sharing within the Alternative Media Echo-System: The Case of the White Helmets”, May 2018
- Tracking Coronavirus Misinformation, https://www.newsguardtech.com/coronavirus-misinformation-tracking-center/