Data Sources and Conversion: Feeding the GIS

Like a teenager, a GIS can consume more data than you ever imagined!
- Discussion here focuses more on projects than on organization-wide implementation.
- Often, data collection is an end in itself.
- Almost invariably, it's the costliest element of any project and of most organizational implementations: 80% of the cost.

RECAP from Implementation Steps/db design

4. Design a Process for Obtaining and Converting Data from Source
- identify the source (document, map, digital file, etc.) for each and every entity and its attributes
- define the procedures for converting data from the source and into the database

We will talk tonight primarily about sources. In practice, identifying data sources and developing a conversion strategy is interwoven with the conceptual and physical data design process.

RECAP from db design: some steps/tasks in the process
- Identifying data
  - internal and external sources
  - checking for completeness and quality
  - new data via field or aerial surveys
- Fixing problems in the data source
  - map scrubbing
  - coding source documents with unique IDs
- Converting to digital form
  - scanning or digitizing
  - raster to vector conversion strategy
  - entry of attribute data
- Data conversion specifications
  - horizontal and vertical control
  - projection
  - coordinate system
  - accuracy requirements
- Document flow control
  - monitor flow of maps, documents and digital files through the conversion process
  - change control for changes to data that occur during this time period
- Quality control procedures
  - potentially highly complex
  - errors will occur
  - generally a combination of automated and manual procedures
  - requires comparing the digital version to the original source and checking internal consistency
  - problem resolution process and correction responsibilities need to be defined
- Final acceptance criteria
  - criteria data must meet before final loading into database

[Diagram labels: FIELD DATA, DIGITIZE, SCANNING]

Where do I get data? & What form is it in?
- Where?
  - Secondary: existing data
    - already published/available
    - special tabulation/contract
  - Administrative records: data as by-product
    - within your organization
    - other organizations
  - Primary data: from scratch
    - developed in-house (DIY)
    - contracted out
    - (field work is always slow and expensive!)
- What format?
  - machine readable (digital)
  - hardcopy (paper, maps)
- Spatial data in digital form is the most valuable, since this is generally the most expensive to obtain.

Don't forget to look in-house!
- Data may have been
  - collected by your organization as a by-product of normal agency operations
  - acquired for some other project
- Don't forget to look, especially if it's a large organization. There may already be a GIS project in existence or about to be launched!

Major GIS Data Sources
- Maps
- Drawings (sketch or engineering)
- Aerial (or other) Photographs
- Satellite Imagery
- CAD data bases
- Government & commercial spatial (GIS) data bases
- Government & commercial attribute data bases
- Paper records and documents

Pre-processing and Conversion: almost invariably required!
- Maps and Drawings
  - digitizing, or scanning then raster to vector conversion
- Aerial Photographs
  - photogrammetry/photo interpretation to extract features
  - digitizing or scanning to convert to digital
  - rectification and DTM (digital terrain model) to create digital orthos
- Satellite Imagery
  - rectification and DTM to create digital orthos (if desired)
- CAD Data Bases
  - translator software (pre-existing or custom-written) needed to convert to required GIS format
- GIS Data Bases
  - conversion between proprietary standards (ARC/INFO, Intergraph, AutoCAD, etc.)
  - Spatial Data Transfer Standard
- Attribute Databases
  - geocoding if micro data
  - conversion between geographic units (e.g. zip codes and census tracts)
  - conversion between different databases
- Records and Documents
  - OCR (optical character recognition) scanning
  - keyboarding
  - then, same as attribute data bases

Data Conversions: general comments
- Paper Maps to Digital
  - generally the most complex & expensive
  - automated extraction of layers is problematic and error prone
  - requires scanning, then raster to vector conversion
  - digitizing may be freehand with tablet, or “heads-up” on screen
- Digital to Digital Conversions
  - Safe Software's Feature Manipulation Engine (FME) product provides translation between different vendors' GIS formats (now ESRI's Data Interoperability Extension)
  - spreadsheet software (Excel) is a powerful beginning point for converting to the required database format (e.g. to .dbf for ArcView)
  - specialized packages for converting between different databases are also available, e.g. DBMS/Copy Plus, Data Junction
  - efforts at standardization, which reduces the need for conversions, have had limited success because of competitive pressures
    - FGDC's Spatial Data Transfer Standard (SDTS) is a federal standard
    - Open GIS Consortium, a vendor and user group, lobbies for standards and non-proprietary approaches to GIS database creation

Data Conversion: hints on the process
- NEVER CONVERT ON THE ORIGINAL FILE, ALWAYS A COPY.
- ALWAYS convert in an unrelated sub-directory.
- Document each new file that is made in the conversion process.
- Archive the original files on a readily available medium.
- Automate as many processes as possible
  - Projections
  - Many like files
  - Replication of data for output
- Record all your steps while converting data formats, in a journal or notebook. You WILL use that same conversion sometime in the future.

Data Sources: Table of Contents
- Overview
  - Federal Data Sources: Spatial Data
  - Federal & Non-profit Data Sources: Attribute data
  - Private Sector Data Resources: Spatial and Attribute
- Selected Sources in Detail
  - DIME
  - TIGER
  - USGS:
    - Overview
    - DEM detail
    - DLG Detail
    - DOQs and DLGs
  - Digital Chart of the World
  - Shuttle Radar Topography Mission (SRTM)
  - NAVSTAR: GPS
  - Remote Sensing
  - US Census Bureau Attribute Data
- Primary Data Collection: Some Issues
- Guides and sources for GIS data include:
  - cast.uark.edu/local/hunt/index.html
  - www.geospatial-
  - www.geospatial-
  - For others see: www.utdallas.edu/briggs/other_gis.html

Federal Data Sources: Spatial Data
- Federal Data Agencies:
  - USGS (Geological Survey, National Mapping Div., Interior): all kinds of mapping, not just geology!
  - NGS (National Geodetic Survey, Commerce, part of NOAA): geodetic surveying
  - Ordnance Survey (in U.K.) combines both functions.
- Federal Mission Agencies
  - USDA (Agriculture)
    - Natural Resources Conservation Service (formerly Soil Conservation Service)
    - US Forest Service
  - Interior
    - US Fish and Wildlife: wetlands
    - Bureau of Land Management
  - Environmental Protection Agency
    - TRI (toxic release inventory) sites
  - DoD (Defense)
    - National Geospatial-Intelligence Agency (NGA), formerly National Imagery and Mapping Agency (NIMA), originally Defense Mapping Agency (DMA): US and world terrain mappings
    - NAVSTAR: GPS satellites
    - US Army Corps of Engineers: flood control
  - NASA (National Aeronautics and Space Administration)
    - LANDSAT satellites
  - Commerce
    - Census Bureau: DIME & TIGER files
    - NOAA (National Oceanic and Atmospheric Administration): AVHRR (Advanced Very High Resolution Radiometer) weather satellites

Federal & Non-profit Data Sources: Attribute data
- Federal Data Agencies
  - CB (Census Bureau, Dept. of Commerce): population and industry data from surveys
  - BEA (Bureau of Economic Analysis, Dept. of Commerce): STAT-USA, national accounts
- Federal Mission Agencies
  - Most federal agencies now have a stat. dept.
    - Bureau of Labor Statistics
    - National Center for Health Statistics
    - National Center for Education Statistics
    - National Center for Criminal Justice Statistics
    - National Center for Transportation Statistics
    - Interstate Commerce Commission
    - Internal Revenue Service
- Non-profit interest groups:
  - Urban and Regional Information Systems Association (URISA)
  - National League of Cities
  - Population Reference Bureau
  - Transportation Assoc. of America
- Trade Associations:
  - American Public Transit Assoc.
  - see Encyclopedia of Associations
- Trade Publications
  - Progressive Grocer
  - see Business Periodicals Index
- University Research Centers
  - University of Michigan, National Institute for Social Research

Private Sector Data Resources
- Spatial data
  - GIS software vendors, e.g. ArcData Catalog
  - Satellite Data Sellers, e.g. Space Imaging Inc. See Remote Sensing slides for list.
  - Topological data (street networks and boundaries)
    - TeleAtlas (European, bought out Etak)
    - DeLorme
    - Geographic Data Technology (absorbed and disbanded Wessex; now owned by RL Polk)
    - Navtech: in-vehicle navigation system data
    - Maptech: navigation charts
  - Environmental
    - Earthinfo
    - Hydrosphere
    - Meteorlogix
  - Aerial Surveying/Engineers/Consultants
    - for primary data: legions of them
- Attribute Data
  - Wide array of companies and services:
    - pollsters and market surveyors
    - remarketeers/updaters of federal gov. data (census data, TIGER files, etc.)
    - data aggregators: collect admin. data from state and local gov. (e.g. building permits)
    - gap fillers in government offerings
  - Larger providers include:
    - Claritas (National Planning Data Corporation, SMI/Donnelly)
    - Equifax/National Decision Systems
    - ESRI BIS (Business Information Solutions), formerly CACI Marketing Services
  - Specialized providers include:
    - Dun and Bradstreet (company finances)
    - InfoUSA (business yellow pages)
    - TRW-REDI (property data)

Vector Data Implementations: DIME file (Dual Independent Map Encoding)
- introduced for the 1970 US Census and used again in 1980; replaced by TIGER in 1990
- pioneering early example of topological structure
- basic record was a line segment
- flat file structure with all info in one record (Star and Estes is misleading on this)
- segments defined between every intersection for all linear features in the landscape (streets, railroads, etc.)
- each segment record contained items such as:
  - segment ID
  - segment type
  - from node ID, to node ID
  - from node x,y, to node x,y
  - address range left, address range right
  - city left, city right
  - tract left, tract right
  - other left/right polygon ID info as needed, e.g. county, block
- prepared only for metropolitan areas (278 files covering about 2% of nation)
- some cities (very few) maintained and expanded them (e.g. added zoning) after the Census
- inconsistent with Metropolitan Map Series paper maps published for each census
- very compute-intensive to process into continuous streets or polygons

Vector Data Implementation: TIGER File (Topologically Integrated Geographic Encoding and Referencing file)
- introduced for the 1990 Census to eliminate inconsistencies between census products
- covers the entire country, and is released by county
- includes hydrography, roads, railroads, etc.
- uses relational data base model
- data derived from 3 sources:
  - scanned USGS 1:100,000 Map Series
  - address ranges from DIME file, originally updated to 1986/7
  - geographic area relationship files used by CB to process 1980 census
- problems with TIGER
  - accuracy limited by USGS base map and processing (100m horizontal)
  - one time only; many segments missing
  - many local gov. records are better
  - data only: requires software to process
- First version was TIGER/1992. Latest is TIGER/Line 1998, issued July 1999.
- comprises 6 record types (tables)
  - basic data record (type 1): line segment records similar to DIME file
  - shape coordinates (type 2): extra coords to define curved line segments
  - area codes (type 3): block records giving higher order geography (tract, city, etc.)
  - feature name index (type 4): line segment records with code for alternative names (used when a segment has two or more characteristics, e.g. both Main St and US 66)
  - feature name list (type 5): names associated with codes in Type 4
  - special address ranges (type 6): additional address ranges (e.g. if a zip code boundary splits a line segment)
- Minor differences exist in the layout of various versions of TIGER, which can lead to reading problems.

Vector/Raster Data Implementation: USGS (United States Geological Survey Digital Data)
- Digital Elevation Model (DEM) data and new (2000) National Elevation Dataset (NED)
  - raster elevation data available at 30m, 2 arc second, and 3 arc second spacing (1 arc second of latitude is approx. 100 ft)
- Digital Line Graph (DLG) data
  - digital representations of the cartographic line info. on main USGS map series
- National Hydrography Dataset
  - combines water data from DLG with EPA's Reach File Version 3
  - plans to update both through cooperative projects with local gov. agencies
- National Land Cover Dataset (NLCD)/Land Use and Land Cover (LULC) data
  - NLCD (release started 2000) updates LULC data of 1970/1980
  - NLCD: 30 meter resolution, 21 land use categories, derived from mid-1990s Landsat imagery
- Geographic Name Information System (GNIS) data
  - standardized place names and feature classification
- Digital Orthoquads (DOQ) and Digital Raster Graphics (DRG) raster data
  - DOQ: 1 meter resolution digital orthophotos for entire US (if locals cooperated!)
  - DRG: scanned USGS 7.5 minute quads
- Distribution of digital data by USGS began in the early 1980s. For details on early data see: USGS National Mapping Program, USGS Digital Cartographic Data Standards, Washington, D.C.: Geological Survey Circular 895A thru G, 1983.

USGS: Elevation Data Detail (Digital Elevation Model and National Elevation Dataset)
- DEM raster elevation data
  - 7.5 minute, 1:24,000 USGS quads (15 minutes in Alaska)
    - elevations at 30 meter spacing
    - UTM coords, NAD27 datum
    - accuracy: 15m RMSE (some 7m) (horizontal: 15m)
  - 30 minute, 1:100,000 USGS topo sheet
    - 2 arc second spacing
    - NAD27 datum
    - accuracy: 5-25m, 1/2 map contour interval (horizontal: 50m)
  - 1 by 2 degree, 1:250,000 USGS sheets from Defense Mapping Agency (DMA)
    - 3 arc second spacing
    - WGS72 datum
    - accuracy variable: 30-75m (horizontal: 100m)
- National Elevation Dataset (late 2000 availability)
  - derived from earlier 7.5-minute and 30-minute DEM data sources
  - seamless US coverage with consistent
    - Datum: NAD83
    - Projection: geographic (lat/long)
    - Units: meters
    - Spacing: 1 arc second (approx. 30 meters or 100 ft); 2 arc second for Alaska (interpolation used if source at lower res.)
- Each DEM file has three records:
  - Record A: descriptive information
  - Record B: elevation data
  - Record C: accuracy statistics
- Files classified into one of three levels depending on editing, etc.
  - Level 1: raw elevation data; only gross blunders corrected
  - Level 2: data edited and smoothed for consistency
  - Level 3: data modified for consistency with planimetric data such as hydrography and transportation
- DEM data has gaps, overlaps, holes and artifacts, hence the need for NED.

USGS DLG Data Detail (Digital Line Graph)
- Three products:
  - Large Scale (ls): generally 1:24,000; 7.5 minutes per file
  - Medium Scale (ms): 1:100,000; 30x30 minute files (half a map sheet)
  - Small Scale (ss): 1:2,000,000; 21 files for nation (one CD-ROM)
- Three formats:
  - Standard (no longer available): internal cartesian coords (saves storage); limited topological info
  - Optional (DLG-3) (use for GIS): UTM metric (Albers Equal-Area Conic for small scale); full topological info
  - Graphic (small scale only): GS-CAM compatible; no topological info; OK for display
- Layers (up to 9)
  - Hydrography: all flowing and standing water, and wetlands
  - Hypsography: contours and elevation
  - Transportation: roads, trails, railroads, pipelines, transmission lines
  - Boundaries: political & administrative
  - Public Land Survey System (PLSS): township, range, section (not ss)
  - Vegetative surfaces (ls only)
  - Non-veg surfaces (e.g. sand) (ls)
  - Survey control and markers (ls)
  - Manmade features (e.g. buildings) (ls)
- Horizontal Accuracy:
  - large scale (7.5 min.): 12-50m
  - medium (1:100,000): 50m
  - small: ?
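
The 12m and 50m horizontal accuracy figures for the large and medium scale products line up with the US National Map Accuracy Standards (NMAS) tolerance of 1/50 inch at publication scale, which applies to maps at 1:20,000 or smaller (1/30 inch applies to larger scales). The slides do not say the figures come from NMAS, so treat that link as an assumption. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
# Sketch only: assumes the DLG accuracy figures follow the NMAS rule.
INCH_TO_M = 0.0254  # exact by definition of the international inch


def nmas_horizontal_tolerance_m(scale_denominator: int) -> float:
    """Ground distance (meters) of the NMAS horizontal tolerance.

    NMAS allows 1/30 inch at publication scale for maps larger than
    1:20,000, and 1/50 inch for maps at 1:20,000 or smaller.
    """
    if scale_denominator <= 0:
        raise ValueError("scale denominator must be positive")
    map_error_in = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    return map_error_in * scale_denominator * INCH_TO_M


if __name__ == "__main__":
    for label, denom in [("large (1:24,000)", 24_000),
                         ("medium (1:100,000)", 100_000),
                         ("small (1:2,000,000)", 2_000_000)]:
        tol = nmas_horizontal_tolerance_m(denom)
        print(f"{label}: {tol:.1f} m")
```

For 1:24,000 this gives about 12 m and for 1:100,000 about 51 m, matching the lower bound and the medium-scale figure on the slide. The small-scale value is only what the same formula yields; it is not a documented accuracy figure for the 1:2,000,000 product.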