The backbone of this CD is a series of DBase (DBF) files, each containing the point data for a
single element in a set of solid (sediment) samples from the NURE HSSR program. All of the images
and map coverages on the CD are derived from these DBF files. This section outlines the steps
used in creating these files. The starting point for data processing on this CD is the set of quadrangle-by-quadrangle DBF files
of NURE HSSR data found in Hoffman and Buttleman (1996). Note that these files are not the raw
NURE data, but are themselves processed from the original digital files (on tape) produced by DOE.
Indeed, the DOE tapes are also not the true raw data from the program, as there was a manual
data-processing step to transfer data from paper reports. A total of 308 quadrangle files (covering the
continental U.S.) from Hoffman and Buttleman (1996) contained data for stream, lake, or spring
sediments, and a subset of 43 of these files also contained data for soils (Table 1). Records
covering these sample media were selected for inclusion in this CD.
Most of the record selection from the original DBF files, and other primary data-extraction
tasks, were done with the Paradox database program. The steps in this procedure were as follows:
Records were extracted from the quadrangle DBF files for the appropriate
sample media using one or more of the field codes listed in Table 2. (See Hoffman and Buttleman,
1994, for explanation of codes.) After surveying each file (through a series of Paradox queries),
a new query was constructed that extracted all records for stream sediments (wet and dry),
lake and pond sediments (including dry lakes), spring sediments, and soils.
Data fields were chosen from the selected records for further processing.
These included several label fields, the sample-type fields listed in Table 2, the geographic
coordinates, fields for the 54 chemical elements appropriate for solid samples (Ag, Al, As, Au, B,
Ba, Be, Bi, Ca, Cd, Ce, Co, Cr, Cs, Cu, Dy, Eu, Fe, Hf, Ho, K, La, Li, Lu, Mg, Mn, Mo, Na, Nb, Nd,
Ni, P, Pb, Pt, Rb, Ru, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Th, Ti, U, V, W, Y, Yb, Zn, Zr), and 5
miscellaneous fields that contain chemical data (CONCN01 through CONCN05). A Paradox query extracted
these fields, and all other data were discarded (including things like stream characteristics,
contamination codes, various labels, and fields not used for solid sample media).
Most chemical data in the quadrangle DBF files are stored in parts-per-billion (ppb).
Paradox was used to convert each field into a more appropriate unit: parts-per-million (ppm) for
trace elements, and wt.% for major elements (Al, Ca, Fe, K, Mg, and Na).
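The unit conversion is simple arithmetic: 1 ppm = 1,000 ppb, and 1 wt.% = 10,000 ppm = 10,000,000 ppb. The original conversion was done in Paradox; the Python sketch below only illustrates the arithmetic, and the function name `convert_from_ppb` is hypothetical. Negative values (upper limits) keep their sign.

```python
from typing import Optional

# Conversion constants for values stored in ppb in the quadrangle DBF files.
PPB_PER_PPM = 1_000               # 1 ppm = 1,000 ppb
PPB_PER_WT_PERCENT = 10_000_000   # 1 wt.% = 10,000 ppm = 10,000,000 ppb

# Major elements are reported in wt.%; all other elements in ppm.
MAJOR_ELEMENTS = {"Al", "Ca", "Fe", "K", "Mg", "Na"}


def convert_from_ppb(element: str, value_ppb: Optional[float]) -> Optional[float]:
    """Convert a ppb value to wt.% (major elements) or ppm (trace elements).

    Missing values (None) pass through; a negative value, which marks an
    upper limit, keeps its sign.
    """
    if value_ppb is None:
        return None
    if element in MAJOR_ELEMENTS:
        return value_ppb / PPB_PER_WT_PERCENT
    return value_ppb / PPB_PER_PPM


print(convert_from_ppb("Cu", 25_000))      # 25.0 ppm
print(convert_from_ppb("Fe", 35_000_000))  # 3.5 wt.%
print(convert_from_ppb("As", -10_000))     # -10.0, i.e. <10 ppm
```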
Many samples were analyzed by more than one laboratory, or by more than one
method. In these cases, there are multiple records in the quadrangle DBF files for an individual
sample location, each with analyses for different elements. These records were found and combined
into a single record. Paradox was used to sort the records by latitude and longitude. A temporary
DBF file was generated, and read by a DOS FORTRAN program, ECLEAN, written by the author (unpublished).
This program searched for consecutive records that had identical or nearly identical geographic
coordinates (within 0.0005 degrees, or ~50 m, of each other). These were assumed to be the same
sample, as round-off errors sometimes affected the 4th decimal place. ECLEAN then combined these
records, element by element, into a single new record. In the few cases where data for the same
element was present in two or more records, the highest value was arbitrarily chosen. This process
also had the effect of consolidating samples actually collected as duplicates at a single location
into single records. ECLEAN also eliminated records with no chemical data (and there were many of
these). The program then created a new DBF file with the consolidated data.
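ECLEAN itself is unpublished, so the following is only a minimal Python sketch of the consolidation logic described above. It assumes each record is a dictionary with hypothetical `lat` and `lon` keys plus one key per element, with missing data stored as `None`. Records are sorted by latitude and then longitude, each record is compared with the one before it, matching records are merged element by element (keeping the higher value), and merged records that carry no chemical data are dropped.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: "lat", "lon", and one key per element (None = no data).
Record = Dict[str, Optional[float]]

TOLERANCE_DEG = 0.0005  # about 50 m, the matching tolerance given in the text


def consolidate(records: List[Record], elements: List[str]) -> List[Record]:
    """Merge consecutive records with (nearly) identical coordinates."""
    ordered = sorted(records, key=lambda r: (r["lat"], r["lon"]))
    merged: List[Record] = []
    current: Optional[Record] = None
    prev: Optional[Record] = None  # previous record in sorted order
    for rec in ordered:
        same_site = (
            prev is not None
            and abs(rec["lat"] - prev["lat"]) <= TOLERANCE_DEG
            and abs(rec["lon"] - prev["lon"]) <= TOLERANCE_DEG
        )
        if same_site and current is not None:
            # Combine element by element; if both records have a value,
            # keep the higher one (so an upper limit stored as a negative
            # number loses to any measured value).
            for el in elements:
                new, old = rec.get(el), current.get(el)
                if new is not None and (old is None or new > old):
                    current[el] = new
        else:
            if current is not None:
                merged.append(current)
            current = dict(rec)  # new site; keeps this record's coordinates
        prev = rec
    if current is not None:
        merged.append(current)
    # Drop records that carry no chemical data at all.
    return [r for r in merged if any(r.get(el) is not None for el in elements)]
```

Because only neighbors in the sorted order are compared, two records at the same site can be missed if a third record sorts between them; the text describes ECLEAN as comparing consecutive records, and the sketch keeps that behavior.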
At this stage of processing, the 308 original quadrangle DBF files have been reduced to
308 new DBF files containing only the geographic and chemical-element fields of the sediment and soil data, without any duplicate or blank
records. Major systematic problems in the original data have also been corrected. The following processing
steps were used to find and correct additional problems in the datasets, to search for regional inconsistencies
in the data, and to establish the usefulness of data reported as upper limits (e.g., <10 ppm).
The reduced DBF files were surveyed with a DOS FORTRAN program, also written by the author, called
GRIDPLOT. This program reads in multiple DBF files, and produces a simple, color, gridded map of the
data for one element on the computer screen. It is extremely efficient, and allows the rapid visualization
of the data (all 308 files can be read and a plot generated on a 200 MHz Pentium Pro PC in about 1
minute). Systematic errors that were not found during primary data processing appear visually as
discontinuities in the colored map. In some cases, these could be traced to systematic errors in the
quadrangle DBF files, especially errors in the position of decimal points. These were corrected by
repeating the primary processing for the affected quadrangle. Other discontinuities are caused by analytical
errors, and are handled in step 2.2.
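GRIDPLOT is also unpublished. As a rough illustration of the binning a quick gridded survey map requires, the Python sketch below assigns each point to a square latitude/longitude cell and summarizes each cell by its median. The cell size, the use of the median, and the exclusion of upper limits (negative values) are assumptions for illustration, not GRIDPLOT's documented behavior.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column) index of a grid cell


def grid_points(
    points: Iterable[Tuple[float, float, Optional[float]]],
    cell_deg: float = 0.05,
) -> Dict[Cell, float]:
    """Bin (lat, lon, value) points into square cells; return each cell's median."""
    cells: Dict[Cell, List[float]] = defaultdict(list)
    for lat, lon, value in points:
        # Assumption: skip missing data and upper limits (negative values).
        if value is None or value < 0:
            continue
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(value)
    return {key: median(vals) for key, vals in cells.items()}
```

Coloring the returned cell values by class and drawing them side by side is then enough to make a quadrangle-wide step in concentration stand out as a straight-edged discontinuity.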
In some areas, generally in the western U.S., one or more quadrangles, or parts of quadrangles, would appear
to be discontinuous with adjacent quadrangles for a given element, when viewed with GRIDPLOT. In many
such instances, a good case can be made that there is a systematic analytical error (i.e., an accuracy
problem, probably due to different analytical methods or interlaboratory calibration problems) across the
discontinuity. The best argument for the occurrence of this type of error is that regional chemical
trends are seen on both sides of the discontinuity, and the application of a simple correction factor can make the
data appear continuous. In these cases, a correction factor is supplied to GRIDPLOT for the affected areas,
and the factor is adjusted until the gridded map appears smooth and continuous. Such corrections can be
displayed graphically in ArcView, by examining the
Data Processing themes for each element (see below). In other cases, either no correction factor can
correct the discontinuity, or regional trends are absent in certain quadrangles and the data appear to
be random. Such data were deleted from this CD, and the Data Processing theme will show a correction
factor of zero (the hafnium data in ArcView are a good example).
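The correction factors were adjusted by eye, but applying them is mechanical. The sketch below assumes each correction is a multiplicative factor per quadrangle, with a factor of zero deleting the data as in the Data Processing themes. The helper `starting_factor`, which proposes a first guess from the ratio of medians on the two sides of a discontinuity, is hypothetical and not part of the original procedure.

```python
from statistics import median
from typing import Dict, List, Sequence


def apply_leveling(
    values_by_quad: Dict[str, List[float]],
    factors: Dict[str, float],
) -> Dict[str, List[float]]:
    """Scale each quadrangle's values by its correction factor.

    Quadrangles without an entry keep a factor of 1.0; a factor of 0
    marks data judged unusable, which are deleted rather than zeroed.
    """
    leveled: Dict[str, List[float]] = {}
    for quad, values in values_by_quad.items():
        factor = factors.get(quad, 1.0)
        leveled[quad] = [] if factor == 0 else [v * factor for v in values]
    return leveled


def starting_factor(reference: Sequence[float], affected: Sequence[float]) -> float:
    """Hypothetical first guess: ratio of medians of samples on each side of a boundary."""
    return median(reference) / median(affected)
```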
A negative concentration of an element in the quadrangle DBF files indicates that the value is
an upper limit (e.g., -10 implies <10). These values present a special problem in creating map
coverages of geochemical data. The philosophy adopted here is a simple one: steps are taken to ensure
that all upper limits fall within the lowest interval in the final
map legend, and thus are known to be correctly categorized. First, two histograms are prepared for each
element, one showing the concentration range of unqualified data, the other showing only upper limits.
For most elements, the vast majority of the data fall in the first histogram, and markers are inserted
into this plot showing the values of every 5th percentile (for reference). The second histogram is displayed
below the first, and the two are compared visually. The strategy is to select a cutoff value below which upper limits
are retained, such that they do not affect the accuracy of the map. Above this cutoff, upper limits are deleted from the final
dataset. In the case shown in Fig. 1, it would be possible to construct maps using a color legend whose lowest interval is the
lowest 5 percent of the data. Upper limits with values of <2 ppm fall unambiguously within this lowest color interval, and can
be merged into the final dataset without affecting the appearance or accuracy of the map; in practice, the < qualifier is dropped and the
value is multiplied by 0.5. However, upper limits with values of <6 ppm could have real values (had they been measured more
precisely) that fall anywhere within the lowest 30% of the concentration distribution. Such values cannot be assigned with certainty
to the correct color interval in the map legend, and are simply deleted. The graphical result of deletions of this type may be small holes
in the map where grid cells could not be assigned real values. Table 3 shows the values of these cutoffs for each of the elements
compiled on this CD.
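The upper-limit rule fits in a few lines. In this hypothetical Python sketch, the cutoff is taken as the top of the lowest legend interval, assumed here to be the 5th percentile of the unqualified data as in the Fig. 1 example; in the actual processing, cutoffs were chosen per element by inspecting the histograms (Table 3). Retained upper limits lose their qualifier and are halved; the rest are deleted.

```python
from statistics import quantiles
from typing import List, Optional, Sequence


def lowest_interval_top(unqualified: Sequence[float]) -> float:
    """5th percentile of the unqualified (positive) data.

    quantiles(..., n=20) returns the 19 cut points at 5% steps;
    the first one is the 5th percentile.
    """
    return quantiles(unqualified, n=20)[0]


def handle_upper_limits(values: Sequence[Optional[float]], cutoff: float) -> List[Optional[float]]:
    """Apply the upper-limit rule; negative values are upper limits (-10 means <10)."""
    result: List[Optional[float]] = []
    for v in values:
        if v is None or v >= 0:
            result.append(v)           # missing or unqualified: unchanged
        elif -v <= cutoff:
            result.append(-v * 0.5)    # limit inside lowest interval: drop "<", halve
        else:
            result.append(None)        # limit too high to classify: delete
    return result
```

With a cutoff of 2 ppm, as in the Fig. 1 example, a <2 ppm limit is kept as 1 ppm and a <6 ppm limit is deleted.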
Once the data are leveled, upper-limit cutoffs are established, and areas of bad data are identified,
the GRIDPLOT program is run again to use its secondary function, which is to extract values for a single element from all 308
processed quadrangle DBF files. For the special case of uranium, GRIDPLOT was programmed to choose which
data field supplies the final value. Uranium is typically stored in one of five fields in the original quadrangle DBF files: one labeled
CONU, the others CONCN01, CONCN02, CONCN05, and CONUDN. The CONCN05 field was given priority
over the CONU field if both were filled, and data in the CONCN01 and CONCN02 fields were used in the absence of data in the
first two fields. The CONUDN field (U by delayed neutron) was coded in only a few percent of the samples (in only 9 quadrangles),
and these data were not used here. The output from this data-processing step is a series of elemental DBF files of usable NURE data.
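The uranium field priority amounts to taking the first filled field in a fixed order. In the sketch below, records are assumed to be dictionaries with missing fields as `None`, and the relative order of CONCN01 and CONCN02 is an assumption, since the text only says they were used when the first two fields were empty.

```python
from typing import Mapping, Optional

# CONCN05 before CONU, as in the text; CONCN01 before CONCN02 is an assumption.
URANIUM_PRIORITY = ("CONCN05", "CONU", "CONCN01", "CONCN02")


def pick_uranium(record: Mapping[str, Optional[float]]) -> Optional[float]:
    """Return the first filled uranium field in priority order, or None."""
    for field in URANIUM_PRIORITY:
        value = record.get(field)
        if value is not None:
            return value
    # CONUDN (U by delayed neutron) is deliberately never consulted.
    return None
```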
Several major errors in the NURE HSSR data were identified and corrected during the above data-processing steps. These errors are
present in the original DBF files and composite database of Hoffman and Buttleman (1994; 1996). The errors will be corrected in
a new database (Smith, 1998), but at the time of writing that database covers only a small part of the United States.
The data survey conducted for each quadrangle DBF file in step 1.1 uncovered a block of stream-sediment
samples miscoded as stream water in seven quadrangles in the northeastern U.S. (Boston, Glens Falls, Lake Champlain, Lewiston, Newark,
Scranton, and Williamsport). These records were altered to give them the correct coding prior to any data processing.
In ~30,000 samples collected and analyzed by Oak Ridge Gaseous Diffusion Plant (ORGDP)
and tabulated in the quadrangle DBF files, major elements (Al, Ca, Fe, K, Mg, and Na) plus As and Se were all tabulated incorrectly
in units other than ppb. Over 70 quadrangles contain data affected by this problem. These records can be identified by the lack of
coding in the SAMPTYP field and a value of 4 coded in the SAMPMDC field. These problems were corrected as a group.
A group of ~15,000 records found in several dozen quadrangles in the western U.S. (samples
analyzed but not collected by ORGDP) also contain major element data in ppm instead of ppb, although trace elements are all coded
correctly. Most of these are coded as soils (SAMPTYP=59), talus (SAMPTYP=62), or uncoded in this field (SAMPTYP=blank),
and all have a value of M coded in the LTYPC field, which stands for sediment. These were also corrected by special handling.
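The selection rules for these two groups can be written as simple predicates. This sketch assumes the code fields are read as text; the field names come from the text, but the function names are hypothetical. Because the correction needed for the ORGDP records is not stated here, only the flag is shown for that group. For the western group, the sketch rescales the major elements from ppm to ppb (multiplying by 1,000) so they match the rest of the file; since only "most" of that group carries SAMPTYP 59, 62, or blank, the predicate is an approximation.

```python
from typing import Any, Dict, Mapping

MAJOR_ELEMENTS = ("Al", "Ca", "Fe", "K", "Mg", "Na")


def _code(rec: Mapping[str, Any], field: str) -> str:
    """Return a code field as stripped text ('' when blank or missing)."""
    value = rec.get(field)
    return "" if value is None else str(value).strip()


def is_orgdp_unit_problem(rec: Mapping[str, Any]) -> bool:
    """ORGDP-analyzed records: blank SAMPTYP and SAMPMDC coded 4."""
    return _code(rec, "SAMPTYP") == "" and _code(rec, "SAMPMDC") == "4"


def is_western_ppm_majors(rec: Mapping[str, Any]) -> bool:
    """Approximate flag for the western group: LTYPC = M and SAMPTYP 59, 62, or blank."""
    return _code(rec, "LTYPC") == "M" and _code(rec, "SAMPTYP") in {"59", "62", ""}


def fix_western_majors(rec: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy with major elements rescaled from ppm to ppb (x 1,000)."""
    fixed = dict(rec)
    for el in MAJOR_ELEMENTS:
        if fixed.get(el) is not None:
            fixed[el] = fixed[el] * 1_000
    return fixed
```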
Themes with names of the form "Grid: Cu" are elemental concentration maps, produced from a gridded
version of the point data. These bitmap files (TIFF) are based on grids made with the MINC program of Webring (1981), which
employs a minimum-curvature interpolation of the point data to create a smooth surface. The grid cells used were 2 km on each side.
Following the gridding operation, the program GCLR (unpublished, by R. W. Simpson, USGS, Menlo Park, Calif.) was used to
produce a color-shaded relief map. The color scheme of these maps is similar to that used in the point-data themes, as it is based
upon the distribution of the underlying point data. Here, seven intervals are used, corresponding to the 0-40th, 40th-80th,
80th-90th, 90th-95th, 95th-98th, 98th-99th, and 99th-100th percentiles. The legends for all these maps, showing
the actual concentration values corresponding to each color interval, are shown in a special view called Gridded Elemental Map Legends.
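The seven color intervals are percentile classes of the point data. The sketch below computes the six interior class boundaries with Python's `statistics.quantiles` and assigns a class index with `bisect`. The percentile method (`"inclusive"`) and the rule that a value equal to a boundary goes to the upper class are assumptions, since the text does not specify them.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence

# Upper bounds (in percent) of the six lower color intervals.
PERCENTILE_BREAKS = (40, 80, 90, 95, 98, 99)


def class_breaks(values: Sequence[float]) -> List[float]:
    """Concentrations at the 40th, 80th, 90th, 95th, 98th, and 99th percentiles."""
    cuts = quantiles(values, n=100, method="inclusive")  # cuts[k - 1] = k-th percentile
    return [cuts[p - 1] for p in PERCENTILE_BREAKS]


def color_class(value: float, breaks: Sequence[float]) -> int:
    """0 = lowest 40%, ..., 6 = top 1%; a value equal to a break goes to the upper class."""
    return bisect_right(breaks, value)
```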
Gridded elemental map legends. This view contains an image showing the concentrations of each element corresponding to each color interval in the gridded elemental maps.
Taking just the left side of this legend as an example: the gridded elemental map shown in the Arsenic Geochemistry view has its dark blue color corresponding to <2.4 ppm As, light blues
representing 2.4 to 5.2 ppm, greens representing 5.2 to 7.6 ppm As, etc., up to magenta representing >22 ppm As. Note that because
the grid is actually a shaded-relief rendition of the data, each color grades somewhat from high saturation (left side of color bar) to low saturation (right side of color bar).
Contact_Person: Jeffrey N. Grossman
Contact_Organization: U.S. Geological Survey
Address_Type: mailing and physical address
Address: 12201 Sunrise Valley Dr.