- Metasearching: historical perspective (Archie, Veronica, etc.), issues
- Current trends: commercial systems (Yahoo, Lycos, etc.), robots, STARTS, Lyceum
- Ming-Hokng Maa, Sandra L. Esler and Michael L. Nelson, "Lyceum:
A Multi-Protocol Digital Library Gateway," NASA TM-112871, July 1997.
This week we look at metasearching, robots and directories. The common
requirement for these applications is that the information to be indexed
and served to the user is "out there" and may not be known a priori.
Lyceum is a proof-of-concept meta-DL constructed by searching
individual DL nodes. The nodes in Lyceum use different protocols,
and Lyceum converts queries to the protocols of the
target DLs. Lyceum does "HTML-scraping" to present the search results to
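The query-conversion step described for Lyceum can be pictured with a minimal sketch. It is not Lyceum's code: the node names (`node-a`, `node-b`) and both target query syntaxes are hypothetical, chosen only to show one user query being rewritten per node.

```python
from urllib.parse import urlencode

# Hypothetical per-node translators: each turns a list of keywords into
# the query string that one (made-up) target protocol expects.
def to_cgi_query(terms):
    # A node reached through an HTTP/CGI form with a single "q" parameter.
    return urlencode({"q": " ".join(terms)})

def to_boolean_query(terms):
    # A node whose search engine wants explicit boolean operators.
    return " AND ".join(terms)

# Registry of nodes and the converter each one needs (assumed names).
NODES = {
    "node-a": to_cgi_query,
    "node-b": to_boolean_query,
}

def translate(terms):
    """Return the node-specific query string for every registered node."""
    return {name: convert(terms) for name, convert in NODES.items()}

queries = translate(["digital", "library"])
```

Adding a node in this arrangement means writing one more converter and registering it, which is the maintenance cost a gateway pays for doing protocol conversion itself.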
- L. Gravano, C.-C. K. Chang, H. Garcia-Molina, A. Paepcke, "STARTS:
Stanford Proposal for Internet Meta-Searching," Proc. of the 1997 ACM
SIGMOD International Conference On Management of Data, 1997.
STARTS approaches metasearching by defining an interoperability protocol to
be implemented by the different search engines. STARTS defines the
mechanism by which proxies can query indices (of differing protocols) and
have enough standard meta-information to filter, rank, and display the
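To illustrate why standard meta-information matters for ranking, here is a rough sketch of merging hits from two sources whose raw scores are on different scales. The record layout (a score range published alongside each source's hits) and the source names are assumptions made for this example, not the field names defined by STARTS.

```python
def merge_results(per_source):
    """Merge hits from several sources into one ranked list.

    per_source maps a source name to (score_min, score_max, hits), where
    hits is a list of (document_id, raw_score). The score range is the
    assumed metadata that makes scores comparable across sources.
    """
    merged = []
    for source, (lo, hi, hits) in per_source.items():
        span = (hi - lo) or 1.0  # avoid dividing by zero on a flat range
        for doc, raw in hits:
            # Rescale every raw score into [0, 1] before comparing.
            merged.append((doc, source, (raw - lo) / span))
    merged.sort(key=lambda record: record[2], reverse=True)
    return merged

results = merge_results({
    "source-a": (0.0, 1.0, [("a1", 0.9), ("a2", 0.4)]),
    "source-b": (0, 1000, [("b1", 700), ("b2", 100)]),
})
```

Without the published score range, a raw score of 700 from one engine and 0.9 from another could not be ordered sensibly; linear rescaling is only one simple choice of normalization.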
- C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F.
Schwartz, and Duane P. Wessels. "Harvest: A Scalable, Customizable
Discovery and Access System," Technical Report CU-CS-732-94, Department of
Computer Science, University of Colorado, Boulder, August 1994 (revised
If you don't want to do protocol conversion for different indices,
or you cannot rely on the indices to comply with a protocol such as
STARTS, then for some applications it is reasonable to gather the remote
information yourself. The architecture of most commercial systems
(AltaVista, Lycos, etc.) is proprietary, but Harvest is a
freely available and popular system for gathering and serving remote
information that incorporates all the general components of its commercial
brethren. It has a clean, modular design and can arrange different
Harvest servers hierarchically.
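The gather-then-serve idea, including the hierarchical arrangement of servers, can be sketched in a few lines. The class names borrow Harvest's gatherer and broker terminology, but everything else is an assumption for illustration: the documents are in-memory strings, the summary is just a keyword list, and none of this reflects Harvest's actual interfaces or record format.

```python
from collections import defaultdict

class Gatherer:
    """Extracts a small summary record from each document at one provider."""
    def __init__(self, documents):
        self.documents = documents  # url -> text (stand-in for fetched pages)

    def summaries(self):
        for url, text in self.documents.items():
            yield url, sorted(set(text.lower().split()))

class Broker:
    """Indexes summaries collected from gatherers or from other brokers."""
    def __init__(self):
        self.index = defaultdict(set)  # keyword -> set of urls

    def collect(self, source):
        # Any object with a summaries() method can feed this broker.
        for url, keywords in source.summaries():
            for word in keywords:
                self.index[word].add(url)

    def summaries(self):
        # Re-export indexed records so a higher-level broker can collect them.
        by_url = defaultdict(set)
        for word, urls in self.index.items():
            for url in urls:
                by_url[url].add(word)
        for url, words in by_url.items():
            yield url, sorted(words)

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))

# Two site-level brokers feed one top-level broker (hypothetical URLs).
site_a = Broker()
site_a.collect(Gatherer({"http://a.example/1": "digital library gateway"}))
site_b = Broker()
site_b.collect(Gatherer({"http://b.example/1": "library robots"}))
top = Broker()
top.collect(site_a)
top.collect(site_b)
hits = top.search("library")
```

Because a broker exposes the same `summaries()` method as a gatherer, brokers can be stacked to any depth, which is the property the hierarchical arrangement relies on.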