Substantive review of metadata

The metadata should enable other people to use the data. A record that merely passes mp (the metadata parser) without errors is not necessarily of sufficient quality or usability for end users. A good metadata review involves examining the data themselves, and our goal is to improve the integrity and quality of the research results produced by USGS for the Nation.

Good metadata is good scientific writing, lively and engaging. It should never be stilted or overly complex.

Key metadata elements to examine

Here is an annotated list of elements that reviewers should pay special attention to, because they are used by software that carries out further processing of the metadata, or because they are sometimes entered incorrectly or awkwardly.

Title

The single most important text in the metadata record is the title given in the Citation_Information of the Citation element. It is the main thing people will see in the Science Data Catalog, on data.gov, and in web pages created from the metadata. It should therefore follow a pattern like "(measurement) of (phenomenon) in (geographic feature) at (geographic location)".

The Title must never be a file name or a UUID.
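Titles that are file names or UUIDs are easy to catch mechanically. The following Python sketch shows one way to screen for them; the function name `title_problems` and the list of file extensions are illustrative assumptions, not part of mp or any USGS tool.

```python
import re
import uuid

# Illustrative list of data-file extensions; extend as needed.
FILE_EXTENSIONS = (".csv", ".txt", ".xml", ".shp", ".tif", ".zip", ".xlsx", ".json")

def title_problems(title: str) -> list[str]:
    """Return reasons a title looks like a file name or a UUID (empty list if none)."""
    text = title.strip()
    problems = []
    try:
        uuid.UUID(text)  # raises ValueError unless the text is a well-formed UUID
        problems.append("title is a UUID")
    except ValueError:
        pass
    if text.lower().endswith(FILE_EXTENSIONS):
        problems.append("title ends with a file extension")
    # A single token joined by underscores or dots is more likely a file name than a title.
    if re.fullmatch(r"[\w.-]+", text) and ("_" in text or "." in text):
        problems.append("title is a single token joined by underscores or dots")
    return problems
```

An empty result only means the title is not obviously a file name or UUID; it still needs to be read against the (measurement) of (phenomenon) pattern.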

Links

URLs must work, and they must point to a pertinent destination or document. Some metadata catalogs will reject records with links that fail.

Assume links are case sensitive, and get the letter case right: Windows file systems are not case sensitive, but Web servers usually are.
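Link checking can be scripted with the Python standard library. This is a minimal sketch, assuming the servers involved accept HTTP HEAD requests; `check_link` is a hypothetical helper name. A URL with the wrong letter case will typically come back from a case-sensitive Web server as HTTP 404.

```python
import http.client
import urllib.error
import urllib.request

def check_link(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Request a URL with HEAD and return (ok, detail)."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as err:  # server answered with an error status (caught first)
        return False, f"HTTP {err.code}"
    except (ValueError, OSError, http.client.HTTPException) as err:
        # ValueError: malformed URL; OSError covers URLError, DNS failures, and timeouts
        return False, str(err)
```

Some servers answer HEAD requests with HTTP 405 (Method Not Allowed) even when the resource exists; for those, retrying with a GET request is a reasonable fallback.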

Be on the lookout for incomplete links that are still awaiting information such as a publication number or data release identifier.
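Placeholder text left in a URL often follows recognizable patterns. The sketch below flags a few of them; the patterns and the helper name `looks_incomplete` are assumptions to adjust to local conventions, and a match means "look closer", not "definitely wrong".

```python
import re

# Hypothetical patterns for URLs still awaiting information.
PLACEHOLDER_PATTERNS = [
    re.compile(r"x{4,}", re.IGNORECASE),           # e.g. .../P9XXXXXX
    re.compile(r"\b(TBD|TODO)\b", re.IGNORECASE),  # e.g. .../ofr/TBD
    re.compile(r"\.\.\.|\u2026"),                  # literal ellipsis
    re.compile(r"[<{\[][^>}\]]*[>}\]]"),           # e.g. <release-id>, {doi}
]

def looks_incomplete(url: str) -> bool:
    """True if the URL contains any of the placeholder patterns above."""
    return any(pattern.search(url) for pattern in PLACEHOLDER_PATTERNS)
```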

Larger_Work_Citation

When datasets were integral to publications, it was common for the Citation to indicate the data that the metadata describe, and the Larger_Work_Citation would specify the publication to which those data were attached. With the new data release policy, it is more appropriate for the Larger_Work_Citation to point to a collection of data of which these data are a part, and publications referring to the data can be specified using Cross_Reference sections. The arrangement of information in these elements should be logical, but it may be constrained by the conventions adopted by a project, science center, or program.

Larger_Work_Citation sections can be nested indefinitely. If you see more than two levels of Citation_Information within the Citation, there should be a really good reason for it.
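Nesting depth can be counted directly from the XML form of the record. This sketch assumes the CSDGM XML short names used in mp output (`citation`, `citeinfo`, `lworkcit`); `citation_depth` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def citation_depth(xml_text: str) -> int:
    """Count levels of Citation_Information, following each Larger_Work_Citation."""
    root = ET.fromstring(xml_text)  # root element is <metadata>
    citeinfo = root.find("idinfo/citation/citeinfo")
    depth = 0
    while citeinfo is not None:
        depth += 1
        citeinfo = citeinfo.find("lworkcit/citeinfo")
    return depth

SAMPLE = """<metadata><idinfo><citation><citeinfo>
  <title>Streamflow data</title>
  <lworkcit><citeinfo>
    <title>Data collection</title>
    <lworkcit><citeinfo><title>Program archive</title></citeinfo></lworkcit>
  </citeinfo></lworkcit>
</citeinfo></citation></idinfo></metadata>"""

depth = citation_depth(SAMPLE)
if depth > 2:
    print(f"{depth} levels of Citation_Information; is there a good reason?")
```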

Abstract

Both the Science Data Catalog and data.gov now show a brief summary of the metadata first, including the first few hundred characters of the Abstract. So it's going to look better and help end users more if the first part of the Abstract is a tweet-sized paragraph that gives a succinct, plain-language summary of the dataset. After that short paragraph, the text of the Abstract may include details of the data collection effort that, in years past, would have been written in a publication.

Don't let the abstract begin with the mission statement of the project, program, science center, or the USGS as a whole.
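Both Abstract checks can be roughed out in code. In this sketch, "tweet-sized" is assumed to mean 280 characters, paragraphs are assumed to be separated by blank lines, and the list of mission-statement openings is a hypothetical starting point, not a complete test.

```python
import re

TWEET_LENGTH = 280  # assumption: "tweet-sized" read as 280 characters
# Hypothetical openings that suggest a mission statement rather than a summary.
MISSION_OPENERS = ("the mission of", "our mission", "the usgs mission")

def abstract_problems(abstract: str) -> list[str]:
    """Check the first paragraph of an Abstract for length and a mission-statement opening."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", abstract.strip()) if p.strip()]
    if not paragraphs:
        return ["abstract is empty"]
    first = paragraphs[0]
    problems = []
    if len(first) > TWEET_LENGTH:
        problems.append(f"first paragraph has {len(first)} characters; aim for {TWEET_LENGTH} or fewer")
    if first.lower().startswith(MISSION_OPENERS):
        problems.append("abstract appears to open with a mission statement")
    return problems
```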

Purpose

First and foremost, this should say what we wanted to do with the data. What caused us to make the data the way we did? Why do the data have the level of detail they do? Map scale, if there is one, should inform the statement of purpose.

The purpose of these data should be more specific than the mission statement of the project, program, science center, or the USGS as a whole.

Spatial_Domain

In rare cases, data may describe things that are not pertinent to a location on Earth. For example, analyses of chemicals that were synthesized in a laboratory do not describe conditions anywhere on Earth. Similarly, measurements of standard materials used to calibrate instruments have no meaningful geospatial reference. However, the Spatial_Domain is required by the standard and must be provided for the metadata to be valid. In these rare cases we recommend you indicate a global extent (West_Bounding_Coordinate -180, East_Bounding_Coordinate 180, North_Bounding_Coordinate 90, South_Bounding_Coordinate -90) AND add to the Supplemental_Information a statement like "These data have no geospatial reference, so the bounding coordinates given are for the whole earth."
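For records in this situation, the global extent and the explanatory note can be set together. This is a sketch using Python's `xml.etree.ElementTree`, assuming CSDGM XML short names (`spdom`, `bounding`, `westbc`, `eastbc`, `northbc`, `southbc`, `descript`, `supplinf`) and that the `bounding` and `descript` elements already exist; `set_global_extent` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

GLOBAL_NOTE = ("These data have no geospatial reference, so the bounding "
               "coordinates given are for the whole earth.")

# Assumed CSDGM XML short names for the bounding coordinates, in standard order.
GLOBAL_BOUNDS = (("westbc", "-180"), ("eastbc", "180"), ("northbc", "90"), ("southbc", "-90"))

def set_global_extent(root: ET.Element) -> None:
    """Set a whole-Earth bounding box and note it in Supplemental_Information."""
    bounding = root.find("idinfo/spdom/bounding")
    descript = root.find("idinfo/descript")
    if bounding is None or descript is None:
        raise ValueError("expected idinfo/spdom/bounding and idinfo/descript")
    for tag, value in GLOBAL_BOUNDS:
        element = bounding.find(tag)
        if element is None:
            element = ET.SubElement(bounding, tag)  # appended at the end of bounding
        element.text = value
    supplinf = descript.find("supplinf")
    if supplinf is None:
        supplinf = ET.SubElement(descript, "supplinf")  # last child of descript
    existing = (supplinf.text or "").strip()
    if GLOBAL_NOTE not in existing:  # avoid adding the note twice
        supplinf.text = f"{existing}\n\n{GLOBAL_NOTE}" if existing else GLOBAL_NOTE
```

Because Supplemental_Information is the last element of Description, appending it keeps the elements in the order the standard lists them.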

Bounding coordinates must be latitude and longitude in decimal degrees, not projected coordinates.
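Bounding coordinates that fall outside the valid ranges of latitude and longitude, or that don't parse as plain decimal numbers, usually mean projected or degree-minute-second values slipped in. A minimal sketch, assuming CSDGM XML short names; `bounding_problems` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Maximum absolute value for each bounding coordinate (assumed CSDGM XML short names).
LIMITS = {"westbc": 180.0, "eastbc": 180.0, "northbc": 90.0, "southbc": 90.0}

def bounding_problems(root: ET.Element) -> list[str]:
    """Flag bounding coordinates that are missing, non-decimal, or out of range."""
    problems = []
    values = {}
    for tag, limit in LIMITS.items():
        text = root.findtext(f"idinfo/spdom/bounding/{tag}")
        try:
            value = float(text)
        except (TypeError, ValueError):  # missing element, or not a decimal number
            problems.append(f"{tag}: {text!r} is not a decimal-degree value")
            continue
        values[tag] = value
        if abs(value) > limit:
            problems.append(f"{tag}: {value} is outside +/-{limit:g}; projected coordinates?")
    if values.get("northbc", 90.0) < values.get("southbc", -90.0):
        problems.append("northbc is less than southbc")
    return problems
```

The sketch deliberately does not flag a West_Bounding_Coordinate greater than the East_Bounding_Coordinate, because that can be legitimate for data that cross the 180th meridian.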

Logical_Consistency_Report

This should specify any reasons why any parts of the data set aren't directly comparable to other parts of the data set.

This should be about more than geometrical topology.

Completeness_Report

What's missing? Where? How much or what proportion of information is missing? In a database, how many values of each field are NULL or empty? On a map, which parts of the overall map extent were not mapped?

Be alert for statements that don't say anything useful.
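One way to put numbers into a Completeness_Report for tabular data is to count empty values in each field. This sketch reads CSV text with Python's `csv` module; the set of strings treated as empty is an assumption and should be extended with the dataset's own no-data codes, and `empty_counts` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Assumption: strings treated as empty or NULL; add the dataset's own no-data codes.
EMPTY_VALUES = {"", "NA", "NULL", "null"}

def empty_counts(csv_text: str) -> dict[str, tuple[int, int]]:
    """Return {field: (number of empty values, number of rows)} for a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    missing = Counter()
    rows = 0
    for row in reader:
        rows += 1
        for field in fields:
            value = row.get(field)  # None when a row is shorter than the header
            if value is None or value.strip() in EMPTY_VALUES:
                missing[field] += 1
    return {field: (missing[field], rows) for field in fields}
```

For a file on disk, pass the result of `open(path, newline="").read()`. Counts like these, reported field by field, say something a bare statement such as "The data are complete" does not.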

Overview_Description

Yes, it's free text. But it should make sense to somebody who isn't a specialist in these data, and it should explain where the data are, what the components of the data package are, and what the data mean.

Entity_and_Attribute_Detail_Citation is a good place for a URL to a methods report.

Attribute_Definition

This is one of the most critical elements in the entire standard, so its values should be written clearly and correctly. While units of measure and the meaning of special "no data" values can be explained in other elements, it is helpful to include that information in the definition of the attribute as well.

This should not be the same as the Attribute_Label. It should be possible to make a clearer statement.
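Definitions that merely restate the label can be found mechanically. This sketch assumes CSDGM XML short names (`attr`, `attrlabl`, `attrdef`); `weak_definitions` is a hypothetical helper, and it only catches the most obvious cases, such as a definition of "Depth m" for the label `depth_m`.

```python
import re
import xml.etree.ElementTree as ET

def _normalize(text: str) -> str:
    """Lowercase and collapse punctuation and underscores to single spaces."""
    return re.sub(r"[\W_]+", " ", text).strip().lower()

def weak_definitions(xml_text: str) -> list[str]:
    """Labels whose Attribute_Definition is empty or just restates the label."""
    root = ET.fromstring(xml_text)
    flagged = []
    for attr in root.iter("attr"):
        label = attr.findtext("attrlabl", default="")
        definition = attr.findtext("attrdef", default="")
        if not definition.strip() or _normalize(definition) == _normalize(label):
            flagged.append(label)
    return flagged
```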

Attribute_Domain_Values

Look carefully to ensure that the right type of description was written.
  • Enumerated_Domain is for explaining abbreviations and special values like "no data" values
  • Range_Domain is for measurements and other numerical attributes
  • Codeset_Domain is for abbreviations that were defined by somebody else and published on the web.
  • Unrepresentable_Domain can be used simply to explain values that don't fit one of the other types of description, for example unabbreviated place names, descriptive text, or sample identifiers without intrinsic scientific significance.
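As a first pass, the values in a column can suggest which kind of description fits. The thresholds here are assumptions: all-numeric values suggest Range_Domain, at most 20 distinct values suggest Enumerated_Domain, and anything else suggests Unrepresentable_Domain. Codeset_Domain cannot be inferred from the values alone. `suggest_domains` is a hypothetical helper.

```python
def _is_number(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True

def suggest_domains(values, no_data=frozenset(), max_enumerated=20) -> list[str]:
    """Suggest Attribute_Domain_Values types for a column of string values."""
    present = [v.strip() for v in values if v.strip()]
    measured = [v for v in present if v not in no_data]
    if measured and all(_is_number(v) for v in measured):
        primary = "Range_Domain"
    elif len(set(measured)) <= max_enumerated:
        primary = "Enumerated_Domain"
    else:
        primary = "Unrepresentable_Domain"
    suggestions = [primary]
    # CSDGM allows more than one Attribute_Domain_Values per attribute, so a numeric or
    # free-text column that uses a no-data code also gets an Enumerated_Domain for it.
    if primary != "Enumerated_Domain" and any(v in no_data for v in present):
        suggestions.append("Enumerated_Domain")
    return suggestions
```

Treat the result as a prompt for judgment: a column of 15 distinct sample identifiers would be suggested as Enumerated_Domain, yet it belongs in Unrepresentable_Domain.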

Distributor

Ideally an organizational contact rather than a person. This section isn't as crucial as it was before the web. If a science center is the distributor, use the center's switchboard number rather than a single person's number.

Resource_Description

The standard says this is the identifier by which the distributor knows the data set. It's a name or number, not a category or type. This is a good place to put the digital object identifier (DOI).

Some software writes "Downloadable data" here. That is incorrect: it names a category, not an identifier.
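A quick screen for Resource_Description values, sketched under assumptions: the set of generic phrases is illustrative, and the DOI pattern only recognizes the `doi:` and `https://doi.org/` forms. `classify_resdesc` is a hypothetical helper.

```python
import re

# Hypothetical examples of generic text that is a category, not an identifier.
GENERIC = {"downloadable data", "data", "dataset", "data set"}
DOI = re.compile(r"^(doi:|https?://(dx\.)?doi\.org/)10\.\d{4,9}/\S+$", re.IGNORECASE)

def classify_resdesc(text: str) -> str:
    """Return 'generic', 'doi', or 'other' for a Resource_Description value."""
    normalized = " ".join(text.lower().split())
    if normalized in GENERIC:
        return "generic"
    if DOI.match(text.strip()):
        return "doi"
    return "other"
```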

Distribution_Liability

Text here should not conflict with the USGS Fundamental Science Practices (FSP) Guidance on Disclaimer Statements Allowed in USGS Science Information Products.

Be alert for text that demands more of our users than is permissible.

Format_Name

This should specify the format of the data files, not the format of the package containing them. "Zip file" and "gzipped tar file" are not data formats; they're package formats.

We are working to provide controlled vocabulary for this field, but our work is not yet complete.
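Until a controlled vocabulary is available, a short local list of package formats can catch the most common mistake. The list below is a hypothetical example, not an official vocabulary, and the sketch assumes the CSDGM XML short name `formname` for Format_Name; `package_format_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of package (container) formats; not an official vocabulary.
PACKAGE_FORMATS = {"zip", "zip file", "gzip", "gzipped tar file", "tar", "tar.gz", "tgz", "7z"}

def package_format_names(xml_text: str) -> list[str]:
    """Return Format_Name values that name a package format rather than a data format."""
    root = ET.fromstring(xml_text)
    flagged = []
    for formname in root.iter("formname"):
        text = formname.text or ""
        # Lowercase, drop a leading dot (".zip"), and collapse internal whitespace.
        normalized = " ".join(text.lower().lstrip(". ").split())
        if normalized in PACKAGE_FORMATS:
            flagged.append(text.strip())
    return flagged
```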

Network_Resource_Name

Wherever possible, this should be a link to a file, not a link to a directory. At the moment, the data.gov system wants only one Network_Resource_Name in each Digital_Form. That is contrary to the Standard, and we will argue against that restriction. However, downstream processing of the metadata is likely to be smoother if there is only one URL specified in each Digital_Form.
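Both recommendations can be checked across every Digital_Form in a record. This sketch assumes CSDGM XML short names (`digform`, `networkr`) and treats a trailing slash as a sign of a directory link, which is only a heuristic; `digital_form_problems` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def digital_form_problems(xml_text: str) -> list[str]:
    """Flag Digital_Forms without exactly one URL, and URLs that look like directories."""
    root = ET.fromstring(xml_text)
    problems = []
    for index, digform in enumerate(root.iter("digform"), start=1):
        urls = [(n.text or "").strip() for n in digform.iter("networkr")]
        if len(urls) != 1:
            problems.append(f"Digital_Form {index}: {len(urls)} Network_Resource_Name values")
        for url in urls:
            if url.endswith("/"):  # heuristic: a trailing slash usually means a directory
                problems.append(f"Digital_Form {index}: {url} looks like a directory")
    return problems
```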

Test the links to make sure they work. Be alert to incomplete URLs awaiting information.

Metadata_Date

This needs to be kept current. The Science Data Catalog uses it to determine whether its copy of this record needs to be updated.

Remember that in CSDGM all dates are written YYYYMMDD, like 20160811. No dashes, no slashes.
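A format check for Metadata_Date, and a way to refresh it, are sketched below under the assumption that the element is `metainfo/metd` in the XML form of the record. The regular expression rejects dashes and slashes, and `strptime` rejects impossible dates such as month 13. Both helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date, datetime

def is_csdgm_full_date(text: str) -> bool:
    """True for an eight-digit YYYYMMDD string that is a real calendar date."""
    if not re.fullmatch(r"\d{8}", text):
        return False
    try:
        datetime.strptime(text, "%Y%m%d")
    except ValueError:
        return False
    return True

def touch_metadata_date(root: ET.Element) -> str:
    """Set Metadata_Date (assumed short name metainfo/metd) to today's date."""
    metd = root.find("metainfo/metd")
    if metd is None:
        raise ValueError("expected metainfo/metd")
    metd.text = date.today().strftime("%Y%m%d")
    return metd.text
```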

Metadata_Review_Date, Metadata_Future_Review_Date

These are about the process by which a given office maintains, reviews, and updates metadata, and aren't particularly important for explaining the data to potential users. We recommend that you not include these elements in your metadata.
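If existing records contain these elements, they can be removed programmatically. This is a sketch assuming CSDGM XML short names (`metainfo`, `metrd`, `metfrd`); `remove_review_dates` is a hypothetical helper. After editing, the tree can be written back out with `ET.ElementTree(root).write(...)`.

```python
import xml.etree.ElementTree as ET

def remove_review_dates(root: ET.Element) -> int:
    """Delete Metadata_Review_Date and Metadata_Future_Review_Date; return how many were removed."""
    metainfo = root.find("metainfo")
    if metainfo is None:
        return 0
    removed = 0
    for tag in ("metrd", "metfrd"):
        for element in metainfo.findall(tag):  # findall returns a list, so removal is safe
            metainfo.remove(element)
            removed += 1
    return removed
```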