Web Indexing, Metadata, Interoperability and Ontologies

1. Web Indexing

A web index is a browsable tool for locating information quickly within a single website or a collection of web pages; a sitemap is a common example. Web indexing builds a conceptual map of the content, which helps searchers find relevant material and helps authors spot inconsistencies in how topics are treated.

Types of Web Indexes

  • Hyperlinked A-Z Indexes: Similar to a back-of-the-book index, but arranged alphabetically on a webpage where entries are hyperlinked directly to named anchor tags or resource sections. It is highly user-friendly and helps overcome spelling variations.
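  • Meta-tag Keyword Indexing: Stores page descriptions and keywords in <meta> tags inside the HTML <head>. Search engine robots (spiders or crawlers such as Googlebot or Yahoo's Slurp) can read these tags when indexing the page, which gives the indexer author-supplied context for the content. Major search engines now give the keywords tag little or no weight in ranking, largely because it is easy to abuse.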
  • Search Engine Optimisation (SEO): Webmasters adjust keywords to improve page ranking and visibility in search results.
    • White Hat SEO: Follows search engine guidelines to promote accessibility.
    • Black Hat SEO: Uses deceptive practices such as "spamdexing" (for example, stuffing a page with hidden keywords) or "cloaking" (serving crawlers different content from what human visitors see) to manipulate search algorithms.
  • Taxonomies/Categories: A subject-based classification that displays a hierarchical structure of disciplines, grouping like terms together for easy discovery.
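  • Thesauri: A highly structured controlled vocabulary (like AGROVOC) that connects terms hierarchically and associatively using Broader Terms (BT), Narrower Terms (NT), and Related Terms (RT). Synonyms are handled through USE and Used For (UF) references that point from non-preferred to preferred terms, while Scope Notes (SN) clarify how a term is meant to be applied.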
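
As an illustration of the meta-tag approach in the list above, the following minimal sketch shows how a crawler-like program might pull descriptions and keywords out of a page's <meta> tags. It uses only Python's standard html.parser module; the class name, helper function, and sample page are hypothetical and chosen for illustration. Real crawlers do considerably more (fetching, robots.txt handling, link extraction).

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html_text):
    """Return a dict of meta tag names mapped to their content values."""
    parser = MetaTagExtractor()
    parser.feed(html_text)
    return parser.meta


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """
<html><head>
<title>Rice cultivation</title>
<meta name="description" content="Notes on rice cultivation methods.">
<meta name="keywords" content="rice, paddy, irrigation">
</head><body></body></html>
"""

meta = extract_meta(SAMPLE_PAGE)
# Split the comma-separated keyword string into individual index terms.
keywords = [k.strip() for k in meta.get("keywords", "").split(",") if k.strip()]
print(meta["description"])
print(keywords)
```

The extracted keywords could then be used as entry points in an index, although how much weight to give them is a design decision for the indexer.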

2. Metadata

Metadata is "data about an object." In digital ecosystems, it records the properties of a resource (e.g., Title, Author, Format). A widely recognized schema on the Internet is the Dublin Core Metadata Element Set, maintained by the Dublin Core Metadata Initiative (DCMI), which comprises 15 core elements.
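
To make the 15 elements concrete, here is a minimal sketch that serialises a small Dublin Core description as XML using Python's standard library. The element names and the namespace URI come from the Dublin Core element set; the function name, the "record" wrapper element, and the sample values are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(fields):
    """Serialise a dict of Dublin Core fields to an XML string."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        if name in fields:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = fields[name]
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration.
record_xml = build_dc_record({
    "title": "Web Indexing Notes",
    "creator": "A. Author",
    "format": "text/html",
})
print(record_xml)
```

All Dublin Core elements are optional and repeatable; this sketch allows at most one value per element purely to keep the example short.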

Types of Metadata

  • Administrative Metadata: Supports management of the document throughout its life cycle. Includes details on ordering, acquisition, licensing, rights, and digital provenance.
  • Technical Metadata: Stores information on the file type, byte structure, file size, and the specifications required for rendering or playing the digital file.
  • Structural Metadata: Explains the different components of an object, outlining sections, sub-sections, and their relationships to map the document's structure.
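  • Descriptive Metadata: Used to describe and locate the document. Stores bibliographic data like title, author, and publisher. AACR2 (a cataloguing code that governs how the description is written) and MARC21 (a format for encoding it in machine-readable form) are prime examples of standards used here.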
  • Preservation Metadata: Essential for digital preservation to track and secure a document's original features against physical deterioration, technological shifts, and media transfers over time.
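
Of the types listed above, technical metadata is the easiest to derive automatically. The sketch below, which assumes nothing beyond Python's standard library, collects a file's name, size, and guessed MIME type. The function name and the dictionary keys are hypothetical, and the MIME type is inferred from the file extension only, not from the file's byte structure.

```python
import mimetypes
import os
import tempfile


def technical_metadata(path):
    """Return basic technical metadata for a file on disk."""
    # guess_type looks only at the file name, not at the file's contents.
    mime_type, _encoding = mimetypes.guess_type(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "mime_type": mime_type,
    }


# Create a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "notes.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("hello")
    info = technical_metadata(sample)

print(info)
```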

3. Ontologies

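In philosophy, ontology studies the existence of entities and the relationships formed by grouping them according to shared characteristics or attributes. In information science, an ontology is a formal description of the concepts in a domain and the relationships among them, typically depicted through hierarchies.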

  • Web Ontology: Because HTML is limited in its ability to represent complex concepts, web ontologies are expressed in the Resource Description Framework (RDF), often serialised as XML. Standard vocabularies such as RDF Schema (RDFS) and the Web Ontology Language (OWL) are used to define classes, properties, and the relationships among web resources.
  • Generic Ontologies: "Umbrella" ontologies covering broad domains of knowledge. Created from classification schemes or thesauri, they define large-scale concepts and facilitate interoperability among more specific, domain-level ontologies.
  • Task-Oriented Ontology: Designed for specific problem domains, built structurally at the lexical level and above.
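
The hierarchical relationships that RDFS expresses can be sketched without any RDF library. In the example below, statements are plain (subject, predicate, object) tuples, and a resource's classes are inferred by following rdfs:subClassOf links transitively. The "ex:" class names are hypothetical and the prefixes are kept as plain strings rather than resolved to full IRIs; a real application would more likely use a dedicated RDF toolkit.

```python
# Each statement is a (subject, predicate, object) triple.
# The "ex:" names are invented for illustration.
TRIPLES = {
    ("ex:Thesaurus", "rdfs:subClassOf", "ex:ControlledVocabulary"),
    ("ex:ControlledVocabulary", "rdfs:subClassOf", "ex:IndexingTool"),
    ("ex:AGROVOC", "rdf:type", "ex:Thesaurus"),
}


def superclasses(cls, triples):
    """All classes reachable from cls via rdfs:subClassOf (transitive)."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(resource, triples):
    """Asserted rdf:type classes plus those inferred through subclassing."""
    asserted = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return asserted | inferred


print(sorted(types_of("ex:AGROVOC", TRIPLES)))
```

Only one triple states a type for ex:AGROVOC directly; the other two classes follow from the subclass hierarchy, which is the kind of inference an ontology makes possible.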

4. Interoperability and Protocols

Because web resources use disparate metadata standards and formats, interoperability protocols provide a cross-platform mechanism that lets users query multiple distributed systems with a single search.

Key Protocols

  • Z39.50 and ZING: Z39.50 is one of the oldest and most widely accepted protocols in library services. Originally developed to search distinct OPACs (Online Public Access Catalogues), it facilitates real-time information retrieval using a standard client-server architecture (e.g., querying a Zebra server via a YAZ client). ZING (Z39.50 International: Next Generation) is the follow-on initiative that adapts Z39.50 to web technologies.
  • OAI-PMH: The Open Archives Initiative Protocol for Metadata Harvesting is used primarily to collect metadata from different digital repositories and archives into a centralized searchable database.
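  • SRW/U: Search/Retrieve Web Service (SRW) and Search/Retrieve via URL (SRU). SRW carries requests as web service messages, while SRU encodes them as URL parameters; both bridge the gap between traditional Z39.50 systems and modern web-based querying.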
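
Both OAI-PMH and SRU requests are ordinary HTTP URLs with standard parameters, so they are easy to sketch. The example below only builds the request URLs and performs no network access. The parameter names (verb, metadataPrefix, set, operation, version, query, maximumRecords) come from the respective protocols; the function names and the example.org base URLs are hypothetical.

```python
from urllib.parse import urlencode


def oai_pmh_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request for harvesting metadata."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


def sru_url(base_url, cql_query, maximum_records=10, version="1.2"):
    """Build an SRU searchRetrieve request carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoints used only for illustration.
harvest = oai_pmh_url("https://repository.example.org/oai")
search = sru_url("https://catalogue.example.org/sru", 'dc.title = "web indexing"')
print(harvest)
print(search)
```

The contrast mirrors the roles described above: the OAI-PMH request asks a repository to hand over its records in bulk for central indexing, while the SRU request sends a live query and expects matching records back.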