U.S. Geological Survey

Organizing USGS information with consistent vocabulary

Peter N. Schweitzer (U.S. Geological Survey, Reston VA 20192)

This presentation is included as part of the short course "Designing information for the worldwide web" presented at USGS Publications 2003.

Audience:

People engaged in the USGS publication process who are learning how to design information for the web. They will have learned the conceptual method of Information Mapping, Inc., including principles of relevance, chunking, and labeling.

Goal:

This presentation is intended to enhance people's understanding of labeling by relating it to the design and use of controlled vocabularies in catalogs and indexes within USGS.

Contents

  1. Understand indexing with controlled vocabularies
    1. What are controlled vocabularies?

      A consistent collection of terms chosen for specific purposes, with explicitly stated, logical constraints on their intended meanings and relationships. Examples include classifications, formal thesauri, topic maps, subject headings, authority files, gazetteers, and ontologies.

    2. Why indexing isn't the same thing as labeling

      Labels shown on web pages need to be quickly recognizable with minimal additional study by the user; index terms specify how information chunks are related to other chunks and to other index terms. Labels might need to identify combinations of concepts that are conceptually distinct.

      Artificial example:
      Label: Fish in Green Lake
      Index terms: fish, lakes, lake fish, Green Lake
      
      Example from USGS web:
      http://toxics.usgs.gov/topics/attenuation.html
      Labels:
        Toxics Program Projects on Natural Attenuation
          science programs
            and
          contamination and pollution
            and
          biodegradation
            and
          contaminant transport
            and
          geochemical processes
        Headlines
          news services
        Natural Attenuation Remediation Related Activities
          contamination and pollution
            and
          remediation
        Fact Sheets
          reports
        Bibliographies
          bibliographies
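
The difference between labels and index terms in the example above can be sketched in code. This is only an illustration, not how the USGS catalog is built: the function names `build_inverted_index` and `chunks_matching_all` are hypothetical, and the data is copied from the example page. Labels are what a reader sees; index terms are what a search uses to relate chunks to one another.

```python
# Display labels from the example page, each paired with the index terms
# that characterize the chunk it heads (data copied from the outline above).
page_chunks = {
    "Toxics Program Projects on Natural Attenuation": [
        "science programs",
        "contamination and pollution",
        "biodegradation",
        "contaminant transport",
        "geochemical processes",
    ],
    "Headlines": ["news services"],
    "Natural Attenuation Remediation Related Activities": [
        "contamination and pollution",
        "remediation",
    ],
    "Fact Sheets": ["reports"],
    "Bibliographies": ["bibliographies"],
}


def build_inverted_index(chunks):
    """Map each index term to the labels of the chunks it describes."""
    index = {}
    for label, terms in chunks.items():
        for term in terms:
            index.setdefault(term, []).append(label)
    return index


def chunks_matching_all(index, *terms):
    """Return labels of chunks indexed under every one of the given terms."""
    label_sets = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*label_sets)) if label_sets else []


inverted_index = build_inverted_index(page_chunks)
print(inverted_index["contamination and pollution"])
print(chunks_matching_all(inverted_index, "contamination and pollution", "remediation"))
```

Two chunks with quite different labels share the index term "contamination and pollution", which is the kind of relationship a label alone does not reveal.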
      
    3. How well-constructed vocabularies help label relevant chunks well

      Labels can be drawn from controlled vocabularies to increase consistency and familiarity. By recognizing in your web site the concepts represented in the vocabulary, you may be less likely to present those concepts in a confusing way.

  2. How controlled vocabularies fit into the USGS enterprise web
    1. Design of the USGS thesaurus
      1. Concepts, not words, constitute terms
        1. Arguments about specific terms go on forever
        2. Instead, focus on concepts
      2. Term relationships (BT, NT, RT, UF, SN)
        BT: Broader Term
        NT: Narrower Term

        Terms in a hierarchy must have an "is a" relationship:

        NT is a type of BT
        NT is a part of BT
        NT is an instance of BT
        
        RT: Related Term
        • We do not describe the nature of the relationship beyond the notion "you might be interested in this"
        • Ontologies and topic maps specify the nature of these relationships in detail and focus on them
        UF: Use For (relates non-preferred to preferred terms).

        Not strict synonymy; the terms are put in the same bucket for search purposes.

        Non-preferred terms are also called lead-in terms.
        SN: Scope Note (explains our use of the term; may include a definition)

        Examples

        geologic contacts
          SN:   Plane or irregular surface between two types
                or ages of rock; examples are faults, intrusive
                borders, bedding planes separating distinct
                strata, and unconformities. [Glossary of Geology, 4th ed.]
          BT:   stratigraphic sections
          NT:   unconformities
          RT:   stratigraphy
          UF:   contacts (geologic)
        
        geologic history
          SN:   Record (and inferred reconstruction) of the origin
                and development of the Earth since its formation.
          BT:   Earth characteristics
          NT:   biostratigraphy
                Earth history
                lithostratigraphy
          RT:   geologic time scales
                geology
                paleontology
                paleoseismology
                stratigraphy
          UF:   chronostratigraphy
                geohistory
        
        ecological processes
          SN:   Dynamic biogeochemical interactions that occur
                among and between biotic and abiotic components
                of the biosphere.
          BT:   biological and physical processes
          NT:   algal blooms
                bioaccumulation
                biogeochemical cycling
                biological productivity
                contaminant transport
                dispersal (organisms)
                ecological competition
                ecosystem functions
                eutrophication
                extinction and extirpation
                habitat alteration
                migration (organisms)
                pollination
                succession (biological)
          RT:   ecology
                population and community ecology
          UF:   environmental processes
          UF+:  ecological models
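
One way to hold entries like these in a program is a small record per preferred term, plus a lookup that sends lead-in (UF) terms to their preferred term. The sketch below is an assumption about representation, not the thesaurus's actual implementation: the `Term` class, its field names, and the case-insensitive matching are all hypothetical, and the scope note is abbreviated.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term and its relationships (field names are illustrative)."""
    name: str
    scope_note: str = ""                           # SN
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT
    use_for: list = field(default_factory=list)    # UF (lead-in terms)


# Two entries copied from the examples above (scope note abbreviated).
terms = [
    Term(
        name="geologic contacts",
        scope_note="Plane or irregular surface between two types or ages of rock.",
        broader=["stratigraphic sections"],
        narrower=["unconformities"],
        related=["stratigraphy"],
        use_for=["contacts (geologic)"],
    ),
    Term(
        name="geologic history",
        broader=["Earth characteristics"],
        narrower=["biostratigraphy", "Earth history", "lithostratigraphy"],
        related=["geologic time scales", "geology", "paleontology",
                 "paleoseismology", "stratigraphy"],
        use_for=["chronostratigraphy", "geohistory"],
    ),
]


def build_lookup(term_list):
    """Index preferred names and lead-in terms, all pointing at the preferred Term."""
    lookup = {}
    for term in term_list:
        lookup[term.name.lower()] = term
        for lead_in in term.use_for:
            lookup[lead_in.lower()] = term
    return lookup


def resolve(lookup, text):
    """Return the preferred Term for a preferred or lead-in term, else None."""
    return lookup.get(text.strip().lower())


lookup = build_lookup(terms)
print(resolve(lookup, "geohistory").name)
```

A search for the lead-in term "geohistory" lands on the preferred term "geologic history", which is the "same bucket for search purposes" behavior described for UF.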
        
      3. Scope: broad and shallow
        1. Covers topics to which USGS contributes useful information directly
        2. Includes a few topics of collateral interest to our users if information on those topics is supplied by our partners
    2. Catalog as an application of the thesaurus
      1. Scope and design
        1. Important resources not part of publications warehouse
          Web pages dealing with
          • subjects
          • geographic areas
          • programs and projects
          • organizational units
          • software
          • services (IMS, database search)
        2. Head page of a document indexed, not lower pages
        3. Bibliographic information, index terms, description, contact
        4. Implemented in relational database
      2. Purposes
        1. Test the thesaurus as an indexing tool
          • Find concepts not represented in thesaurus
          • Clarify ambiguous terms
          • Identify lead-in terms
        2. Provide a way to find information by topic, science, method, place
          • Pick terms from list rather than type in
          • Show scope and context for each term
          • Show documents matching given term or more specific terms
      3. Relationship to pubs warehouse
        1. Catalog demonstrates hierarchical browse strategy
        2. Includes some non-pubs resources
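
The browse behavior described above (show documents matching a given term or more specific terms, from a catalog held in a relational database) can be sketched with SQLite. The table and column names here are assumptions made for illustration and do not describe the actual catalog schema; the document titles are hypothetical, while the term hierarchy is taken from the "ecological processes" example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    -- One row per BT/NT link: narrower_id is a narrower term of broader_id.
    CREATE TABLE hierarchy (
        broader_id  INTEGER NOT NULL REFERENCES term(id),
        narrower_id INTEGER NOT NULL REFERENCES term(id)
    );
    CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE document_term (
        document_id INTEGER NOT NULL REFERENCES document(id),
        term_id     INTEGER NOT NULL REFERENCES term(id)
    );
""")

conn.executemany("INSERT INTO term (id, name) VALUES (?, ?)", [
    (1, "biological and physical processes"),
    (2, "ecological processes"),
    (3, "eutrophication"),
    (4, "pollination"),
])
conn.executemany("INSERT INTO hierarchy VALUES (?, ?)", [(1, 2), (2, 3), (2, 4)])
conn.executemany("INSERT INTO document (id, title) VALUES (?, ?)", [
    (10, "Hypothetical page on eutrophication"),
    (11, "Hypothetical page on pollination"),
    (12, "Hypothetical page on ecological processes"),
])
conn.executemany("INSERT INTO document_term VALUES (?, ?)", [(10, 3), (11, 4), (12, 2)])


def documents_for(conn, term_name):
    """Titles of documents indexed under the term or any of its narrower terms."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM term WHERE name = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so cycles cannot loop forever
            SELECT h.narrower_id
            FROM hierarchy h JOIN subtree s ON h.broader_id = s.id
        )
        SELECT DISTINCT d.title
        FROM document d
        JOIN document_term dt ON dt.document_id = d.id
        JOIN subtree s ON s.id = dt.term_id
        ORDER BY d.title
    """, (term_name,)).fetchall()
    return [title for (title,) in rows]


print(documents_for(conn, "ecological processes"))
print(documents_for(conn, "pollination"))
```

Searching on a broad term returns documents indexed under its narrower terms as well, while searching on a narrow term returns only its own documents.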
    3. Metadata within HTML documents
      1. USGS web guidelines specify elements

        http://internal.usgs.gov/guidelines/usgs.webguide.html#meta

      2. Search engines may or may not see these elements
        • Most commonly seen: description, keywords
        • USGS search engine can see more (customized by USGS)
        • External search systems don't know our thesaurus
      3. HTML syntax cannot show term relationships

        <meta name="keywords" content="ecological processes">

        No place to put BT, NT, RT, UF
      4. Consistency with catalog?
        1. Catalog can generate meta tags
        2. Harvesting meta tags from pages is also possible
        3. Different purposes for terms: categorize vs characterize
          Example: Natural attenuation
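
Both directions mentioned above (the catalog generating meta tags, and meta tags being harvested from pages) can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the USGS tooling: the function names, the `MetaHarvester` class, and the sample description are hypothetical, and the sketch assumes keywords are comma-separated and that no index term itself contains a comma.

```python
from html import escape
from html.parser import HTMLParser


def make_meta_tags(description, index_terms):
    """Render description and keywords meta tags from a catalog record."""
    keywords = ", ".join(index_terms)
    return "\n".join([
        f'<meta name="description" content="{escape(description, quote=True)}">',
        f'<meta name="keywords" content="{escape(keywords, quote=True)}">',
    ])


class MetaHarvester(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def harvest_keywords(html_text):
    """Return the list of keywords found in a page's keywords meta tag."""
    harvester = MetaHarvester()
    harvester.feed(html_text)
    harvester.close()
    raw = harvester.meta.get("keywords", "")
    return [kw.strip() for kw in raw.split(",") if kw.strip()]


tags = make_meta_tags(
    "Natural attenuation projects",                  # hypothetical description
    ["contamination and pollution", "remediation"],  # index terms from the outline
)
print(tags)
print(harvest_keywords(tags))
```

The harvested list carries only the term strings. As noted above, the BT, NT, RT, and UF relationships are not in the HTML, so a harvester would still need the thesaurus to recover them.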
    4. Use of other vocabularies
      1. Facets not included in thesaurus:
        1. Place names
        2. Geographic feature types
        3. Geologic time
        4. Stratigraphic names
      2. More details for included facets:
        1. Rock types and mineral names
        2. Biological taxonomy
      3. Authority lists:
        1. Publication series names
        2. Organizations
        3. People


This page is <http://geo-nsdi.er.usgs.gov/talk/thesaurus/outline.html>
Maintained by Peter Schweitzer
Last updated 24-Feb-2004