CS 745 / 845 - Introduction to Digital Libraries
Mid-Term Exam
Michael L. Nelson
Due: 3/19/98

Answer 6 of the 10 questions. The mid-term is worth 30% of your grade, so each question will count 5 points. Contact me if you need clarification on some of the questions. I prefer soft-copy submissions, ASCII or PS. Hard copy will do in a pinch. Use the readings, class notes, other materials, and me, but not each other. I estimate that each question will take 30-40 minutes apiece and will probably require no less than 1/2 page each. References to support your claims (where appropriate) are helpful. You are not limited to the readings we have covered in class.

1. What is a digital library (DL)? How is a DL different from an SQL database? From traditional text retrieval systems? From the World Wide Web (WWW)? How is a DL different from a traditional library?

2. Give an account of the economic situation that may force many scholarly journals, especially the more esoteric ones, to go to strictly digital format. That is, what is likely to cause the publishers to cease production of hard copy journals?

3. Discuss the implications of a distributed DL architecture, where each authoring institution can be its own publisher of STI. What new functions and capabilities could be possible? What current functions and capabilities would be lost? Are peer reviewed journals still possible under this system? What are the implications for central STI organizations?

4. What are some of the benefits of SGML in publishing? What are some of the drawbacks? Under what circumstances is SGML use convenient and desirable? Under what circumstances is SGML not convenient or even possible?

5. Discuss precision and recall and the tradeoff between them. Briefly discuss inverted files, signature files, and flat files, and the space/time tradeoffs involved when using these data structures for indexing.

6. Discuss some of the salient features of the Kahn/Wilensky framework (KWF). Pick 2 different DLs that we have discussed in class that are sufficiently different from each other and map their features and functions to concepts in the KWF where applicable. Note any features in the KWF that are not present or defined in the DLs.

7. Give a brief discussion of Dienst and why it is significant. Discuss what the services do. Discuss the significant changes from the Dienst server 4.0 to 5.0.

8. Compare and contrast handles with bibcodes within a DL application. Provide scenarios where each has utility over the other. Provide 2 real, live, working URLs: 1 using a handle, and 1 using a bibcode.

9. Summarize the functional highlights of buckets. What role do they play? Compare and contrast "traditional" archives with "bucket-oriented" archives. Are there applications for which buckets might not be well suited?

10. Consider the following 2 records:

    Record 1: "Compare and contrast handles with bibcodes within a DL application. Provide scenarios where each has utility over the other. Provide 2 real, live, working URLs: 1 using a handle, and 1 using a bibcode."

    Record 2: "Summarize the functional highlights of buckets. What role do they play? Compare and contrast "traditional" archives with "bucket-oriented" archives. Are there applications for which buckets might not be well suited?"

    For both records, build an inverted file using stemming and frequency count. Suggest and use a reasonable stop-list (emphasizing brevity is acceptable; list only the parts of the stop-list that are needed). Also, list the frequency weighted vector for each record. Rank the following queries (Boolean operations are not supported):

    query 1: "a bucket of cfd applications"
    query 2: "archived data and archived buckets"
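
As an illustration of the mechanics named in question 10 only, here is a minimal sketch that builds an inverted file with frequency counts and scores a query against two made-up records (deliberately not the exam records). The stop list, the crude suffix-stripping `stem` function, the sample records, and the dot-product scoring are all assumptions chosen for illustration; a proper stemmer (such as Porter's) and a different weighting scheme may give different results.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop list, chosen only to cover the sample text below.
STOP_WORDS = {"a", "an", "and", "are", "in", "of", "the"}


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter's."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lowercase, tokenize, drop stop words, and stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_inverted_file(records):
    """Map each stemmed term to {record id: frequency in that record}."""
    index = defaultdict(dict)
    for rec_id, text in records.items():
        for term, freq in Counter(terms(text)).items():
            index[term][rec_id] = freq
    return dict(index)


def score(query, index):
    """Dot product of the query and record frequency vectors (no Boolean logic)."""
    scores = Counter()
    for term, q_freq in Counter(terms(query)).items():
        for rec_id, d_freq in index.get(term, {}).items():
            scores[rec_id] += q_freq * d_freq
    return dict(scores)


# Made-up sample records, not the records from question 10.
records = {
    1: "Indexing the libraries and indexing the archives",
    2: "An archive of digital libraries",
}
index = build_inverted_file(records)
ranking = score("digital archives", index)

for term in sorted(index):
    print(term, index[term])
print("scores:", ranking)
```

Each row of the printed inverted file is one term with its per-record frequency, and reading the frequencies for a single record across all terms gives that record's frequency weighted vector. A higher score means the record shares more (and more frequent) stemmed terms with the query.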