OAIster: What's with the Weird Name?
Kat Hagedorn
UM Library Information Technology
November 28, 2005

What is OAIster?
- Is/was a means for UM to test the OAI protocol (hence the name)
- A method for sharing metadata among institutions and groups of people
- A means of developing a search service for end-users worldwide

Basics of OAI

What does OAIster collect?
- Harvests all metadata from all OAI data providers (within reason)
- Only keeps metadata that points to digital objects, e.g., articles, photographs, datasets, etc. in digitized form
- All available via search service

Searching OAIster
- Time to show
  off OAIster: http://www.oaister.org/

A little history
- Service is now 3.5 years old
- Started with 66 data providers and a little over 200K records
- Now have 572 data providers and “a little” over 6 million records
- 37% US, 63% international

Visibility of OAI
- Surprising who hasn't made their metadata shareable through OAI
  - Harvard, Yale, Stanford: the big ones
- Initially perplexing, but now clearer:
  - always done at the end
  - only recently thought of at initiation of projects
  - truthfully, many institutions not collaborative

Examples of data providers
- Many data providers are huge, e.g.,
  - arXiv: physics preprint
    and postprint articles
  - pubmed: medical articles, although restricted
  - pictureaustralia: images from govt and academic institutions in Australia
  - lcoa: Library of Congress digital archives
  - usc: University of Southern California census data

Examples of data providers
- Most are small, though
- Many around 100 records
- Value of making their records available
  - increased visibility
  - inclusion in bigger search service than theirs
  - incorporation in Yahoo! Search

Yahoo! Search
- Two years ago, collaborated with team at Yahoo! Search to send our metadata to them for indexing
- e.g., “gardens at albury” in Yahoo! Search
  - know it's not static HTML roboting
  - IsPartOf Victorian Railways collection.
- Many, many more hits
- Also send metadata to Google

System design
[Diagram components: UM harvester; Record storage; XSLT transformation tool; BibClass indexes; OAI-enabled DC records; Non-OAI-enabled DC records; XSL stylesheets (per source type); Search interface (XPAT)]

Transformation of metadata
- Most metadata needs to be brushed off
  - adding an http:// to the front of URLs
- Or raked
  - removing instances of <![CDATA[
- Or wrung out
  - instead of “Where's Waldo,” it's “Where's the incorrect UTF-8 character?”
- And should be normalized

Why normalize?
- Sample date values
  - 2-12-01
  - 2002-01-01
  - 0000-00-00
  - 1822
  - between 1827 and 1833
  - 18-?
  - November 13, 1947
  - SEP 1958
  - 235 bce
  - Summer, 1948

Why use a CV?
- Sample subject values
  - 30,51,52
  - 1852, Apr. 22. Everitt Judson, letter to Philuta Judson.
  - Slavery-United States-Controversial literature
  - view of interior with John Henry sculpture
  - Particles (Nuclear physics) - Research.

Best practices
- Fixing more than half of the data providers is cumbersome
- Individuals at OAI-enabled institutions started a “Best Practices” group to inform data providers what they ought to do
- http://oai-m.nsdl.org/cgi-bin/wiki.pl?TableOfContents

2nd phase OAI
- “Best Practices” group sponsored by the Digital Library Federation, which also sponsors our latest grant
  - Better and more easily calculated statistics
  - Search interface improvements
  - Clustering / classification techniques
  - Using richer metadata

Clustering / classification
- Using automated means to take a selection of metadata and determine “what it's about”
- Working with Emory University (one of our grant partners) to test their tool
- Results will be integrated into search so users can search within a smaller group of OAIster records

Using richer metadata
- Data providers must use simple Dublin Core
- Very sparse schema for describing objects
  - dc:title must contain main title, sorted title and alternative titles
  - dc:subject doesn't distinguish between geographical, hierarchical, temporal

Using richer metadata
- Encouraging use of richer metadata, especially MODS (Metadata Object Description Schema) from LOC
- Developed testbed for grant deliverables
  - currently only shows MODS work
  - http://www.hti.umich.edu/m/mods/

Other stuff
- Well, make it smaller somehow
- Clean up Boolean interface
  - squinch fields together
  - include more normalization
- Make it available through federated search
- Proselytize sharing metadata
- Test, test, test

Contact me
Kat Hagedorn
UM Library Information Technology
khage@umich.edu
www.oaister.org
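
Sketch: brushing off, raking and normalizing
The cleanup steps named on the “Transformation of metadata” and “Why normalize?” slides can be illustrated with a short Python sketch. This is not OAIster's actual pipeline (the system design slide lists XSLT stylesheets for that job); the function names brush_off_url, rake_cdata and normalize_year are hypothetical, and reducing a date to its first non-zero four-digit year is an assumed rule chosen only to show why the sample values are hard to sort.

```python
import re

# Hypothetical helpers for illustration only; not OAIster's real code.

SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")
YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")  # exactly four digits in a row


def brush_off_url(url: str) -> str:
    """'Brush off': add http:// when a harvested URL has no scheme."""
    url = url.strip()
    if SCHEME.match(url):
        return url
    return "http://" + url


def rake_cdata(text: str) -> str:
    """'Rake': drop CDATA wrappers but keep the text inside them."""
    return text.replace("<![CDATA[", "").replace("]]>", "")


def normalize_year(raw: str):
    """Return the first non-zero four-digit year in a date string, else None.

    Assumed rule: values such as '2-12-01', '18-?' or '235 bce' carry no
    unambiguous four-digit year, so they are left unresolved (None).
    """
    for match in YEAR.finditer(raw):
        year = int(match.group(1))
        if year != 0:  # skip placeholders like 0000-00-00
            return year
    return None


if __name__ == "__main__":
    samples = [
        "2-12-01", "2002-01-01", "0000-00-00", "1822",
        "between 1827 and 1833", "18-?", "November 13, 1947",
        "SEP 1958", "235 bce", "Summer, 1948",
    ]
    for value in samples:
        print(f"{value!r} -> {normalize_year(value)}")
```

Under this assumed rule, seven of the ten sample dates resolve to a year and three stay unresolved, which is the kind of leftover a real normalizer would have to handle case by case.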