Chapter 12. Extra di PostGIS

Table of Contents

Questo capitolo documenta le funzioni presenti nella cartella extras dei tarball e del repository dei sorgenti di PostGIS. Queste funzioni non sono sempre incluse nelle versioni binarie di PostGIS, ma di solito sono basate su PL/pgSQL o su script di shell standard che possono essere eseguiti come tali.

12.1. Standardizzatore di indirizzi

Si tratta di un fork del PAGC standardizer (il codice originale per questa parte era PAGC PostgreSQL Address Standardizer).

Il normalizzatore di indirizzi è un parser di indirizzi a riga singola che prende un indirizzo in ingresso e lo normalizza in base a un insieme di regole memorizzate in una tabella e in tabelle helper lex e gaz.

Il codice è integrato in un'unica libreria di estensioni PostgreSQL chiamata address_standardizer che può essere installata con CREATE EXTENSION address_standardizer;. Oltre all'estensione address_standardizer, è stata creata un'estensione di esempio per i dati chiamata address_standardizer_data_us che contiene tabelle gaz, lex e regole per i dati degli Stati Uniti. Questa estensione può essere installata tramite: CREATE EXTENSION address_standardizer_data_us;

Il codice di questa estensione si trova in PostGIS extensions/address_standardizer ed è attualmente autonomo.

Per le istruzioni di installazione consultare: Section 2.3, “Installazione e utilizzo dello standardizzatore di indirizzi”.

12.1.1. Come funziona il parser

Il parser lavora da destra a sinistra, esaminando prima i macroelementi per il codice postale, lo stato/provincia, la città, quindi esamina i microelementi per determinare se si tratta di un numero civico, di una strada, di un incrocio o di un punto di riferimento. Attualmente non cerca il codice o il nome del paese, ma questo potrebbe essere introdotto in futuro.

Codice paese

Si presume che sia USA o CA in base a: codice postale come USA o Canada stato/provincia come USA o Canada altro USA

Codice postale

Questi vengono riconosciuti usando espressioni regolari compatibili con Perl. Queste regex sono attualmente contenute in parseaddress-api.c e sono relativamente semplici da modificare se necessario.

Stato/provincia

Questi vengono riconosciuti usando espressioni regolari compatibili con Perl. Queste regex sono attualmente nel file parseaddress-api.c, ma in futuro potrebbero essere spostate in includes per facilitare la manutenzione.

12.1.2. Tipi di standardizzatori di indirizzi

Abstract

Questa sezione contiene una lista dei tipi di dato PostgreSQL installati dalla estensione Address Standardizer. Nota che sono descritti i comportamenti di conversione di questi tipi di dato, informazione molto importante nella progettazione di funzioni proprie.

  • stdaddr — Un tipo composito che consiste negli elementi di un indirizzo. È il tipo di ritorno della funzione standardize_address.

12.1.3. Tabelle dello standardizzatore degli indirizzi

Abstract

Questa sezione elenca i formati delle tabelle PostgreSQL utilizzate da address_standardizer per normalizzare gli indirizzi. Si noti che non è necessario che queste tabelle abbiano lo stesso nome di quelle a cui si fa riferimento qui. È possibile avere tabelle lex, gaz e rules diverse per ogni Paese, ad esempio, o per il proprio geocoder personalizzato. I nomi di queste tabelle vengono passati nelle funzioni di normalizzazione degli indirizzi.

L'estensione confezionata address_standardizer_data_us contiene i dati per la standardizzazione degli indirizzi statunitensi.

  • rules table — La tabella delle regole contiene un insieme di regole che mappano i token della sequenza di input dell'indirizzo in una sequenza di output standardizzata. Una regola è definita come un insieme di token di ingresso seguiti da -1 (terminatore) seguito da un insieme di token di uscita seguiti da -1 seguito da un numero che denota il tipo di regola seguito da una classificazione della regola.
  • lex table — A lex table is used to classify alphanumeric input and associate that input with (a) input tokens ( See the section called “Gettoni di ingresso”) and (b) standardized representations.
  • gaz table — A gaz table is used to standardize place names and associate that input with (a) input tokens ( See the section called “Gettoni di ingresso”) and (b) standardized representations.

12.1.4. Address Standardizer Functions

12.2. Tiger Geocoder

Abstract

A plpgsql based geocoder written to work with the TIGER (Topologically Integrated Geographic Encoding and Referencing system ) / Line and Master Address database export released by the US Census Bureau.

There are four components to the geocoder: the data loader functions, the address normalizer, the address geocoder, and the reverse geocoder.

Although it is designed specifically for the US, a lot of the concepts and functions are applicable and can be adapted to work with other country address and road networks.

The script builds a schema called tiger to house all the tiger related functions, reusable lookup data such as road type prefixes, suffixes, states, various control tables for managing data load, and skeleton base tables from which all the tiger loaded tables inherit from.

Another schema called tiger_data is also created which houses all the census data for each state that the loader downloads from Census site and loads into the database. In the current model, each set of state tables is prefixed with the state code e.g ma_addr, ma_edges etc with constraints to enforce only that state data. Each of these tables inherits from the tables addr, faces, edges, etc located in the tiger schema.

All the geocode functions only reference the base tables, so there is no requirement that the data schema be called tiger_data or that data can't be further partitioned into other schemas -- e.g a different schema for each state, as long as all the tables inherit from the tables in the tiger schema.

For instructions on how to enable the extension in your database and also to load data using it, refer to Section 2.4.1, “Tiger Geocoder Abilitazione del database PostGIS”.

[Note]

If you are using tiger geocoder (tiger_2010), you can upgrade the scripts using the accompanying upgrade_geocoder.bat / .sh scripts in extras/tiger. One major change between tiger_2010 and tiger_2011+ is that the county and state tables are no longer broken out by state. If you have data from tiger_2010 and want to replace with tiger_2015, refer to Section 2.4.4, “Aggiornamento dell'installazione e dei dati di Tiger Geocoder”

[Note]

New in PostGIS 2.2.0 release is support for Tiger 2015 data and inclusion of Address Standardizer as part of PostGIS.

New in PostGIS 2.1.0 release is ability to install tiger geocoder with PostgreSQL extension model if you are running PostgreSQL 9.1+. Refer to Section 2.4.1, “Tiger Geocoder Abilitazione del database PostGIS” for details.

The Pagc_Normalize_Address function as a drop in replacement for in-built Normalize_Address. Refer to Section 2.3, “Installazione e utilizzo dello standardizzatore di indirizzi” for compile and installation instructions.

Design:

The goal of this project is to build a fully functional geocoder that can process an arbitrary United States address string and using normalized TIGER census data, produce a point geometry and rating reflecting the location of the given address and likeliness of the location. The higher the rating number the worse the result.

The reverse_geocode function, introduced in PostGIS 2.0.0 is useful for deriving the street address and cross streets of a GPS location.

The geocoder should be simple for anyone familiar with PostGIS to install and use, and should be easily installable and usable on all platforms supported by PostGIS.

It should be robust enough to function properly despite formatting and spelling errors.

It should be extensible enough to be used with future data updates, or alternate data sources with a minimum of coding changes.

[Note]

The tiger schema must be added to the database search path for the functions to work properly.

  • Drop_Indexes_Generate_Script — Generates a script that drops all non-primary key and non-unique indexes on tiger schema and user specified schema. Defaults schema to tiger_data if no schema is specified.
  • Drop_Nation_Tables_Generate_Script — Generates a script that drops all tables in the specified schema that start with county_all, state_all or state code followed by county or state.
  • Drop_State_Tables_Generate_Script — Generates a script that drops all tables in the specified schema that are prefixed with the state abbreviation. Defaults schema to tiger_data if no schema is specified.
  • Geocode — Takes in an address as a string (or other normalized address) and outputs a set of possible locations which include a point geometry in NAD 83 long lat, a normalized address for each, and the rating. The lower the rating the more likely the match. Results are sorted by lowest rating first. Can optionally pass in maximum results, defaults to 10, and restrict_region (defaults to NULL)
  • Geocode_Intersection — Takes in 2 streets that intersect and a state, city, zip, and outputs a set of possible locations on the first cross street that is at the intersection, also includes a geomout as the point location in NAD 83 long lat, a normalized_address (addy) for each location, and the rating. The lower the rating the more likely the match. Results are sorted by lowest rating first. Can optionally pass in maximum results, defaults to 10. Uses Tiger data (edges, faces, addr), PostgreSQL fuzzy string matching (soundex, levenshtein).
  • Get_Geocode_Setting — Returns value of specific setting stored in tiger.geocode_settings table.
  • Get_Tract — Returns census tract or field from tract table of where the geometry is located. Default to returning short name of tract.
  • Install_Missing_Indexes — Finds all tables with key columns used in geocoder joins and filter conditions that are missing used indexes on those columns and will add them.
  • Loader_Generate_Census_Script — Generates a shell script for the specified platform for the specified states that will download Tiger census state tract, bg, and tabblocks data tables, stage and load into tiger_data schema. Each state script is returned as a separate record.
  • Loader_Generate_Script — Generates a shell script for the specified platform for the specified states that will download Tiger data, stage and load into tiger_data schema. Each state script is returned as a separate record. Latest version supports Tiger 2010 structural changes and also loads census tract, block groups, and blocks tables.
  • Loader_Generate_Nation_Script — Generates a shell script for the specified platform that loads in the county and state lookup tables.
  • Missing_Indexes_Generate_Script — Finds all tables with key columns used in geocoder joins that are missing indexes on those columns and will output the SQL DDL to define the index for those tables.
  • Normalize_Address — Given a textual street address, returns a composite norm_addy type that has road suffix, prefix and type standardized, street, streetname etc. broken into separate fields. This function will work with just the lookup data packaged with the tiger_geocoder (no need for tiger census data).
  • Pagc_Normalize_Address — Given a textual street address, returns a composite norm_addy type that has road suffix, prefix and type standardized, street, streetname etc. broken into separate fields. This function will work with just the lookup data packaged with the tiger_geocoder (no need for tiger census data). Requires address_standardizer extension.
  • Pprint_Addy — Given a norm_addy composite type object, returns a pretty print representation of it. Usually used in conjunction with normalize_address.
  • Reverse_Geocode — Takes a geometry point in a known spatial ref sys and returns a record containing an array of theoretically possible addresses and an array of cross streets. If include_strnum_range = true, includes the street range in the cross streets.
  • Topology_Load_Tiger — Loads a defined region of tiger data into a PostGIS Topology and transforming the tiger data to spatial reference of the topology and snapping to the precision tolerance of the topology.
  • Set_Geocode_Setting — Sets a setting that affects behavior of geocoder functions.

There are a couple other open source geocoders for PostGIS, that unlike tiger geocoder have the advantage of multi-country geocoding support

  • Nominatim uses OpenStreetMap gazeteer formatted data. It requires osm2pgsql for loading the data, PostgreSQL 8.4+ and PostGIS 1.5+ to function. It is packaged as a webservice interface and seems designed to be called as a webservice. Just like the tiger geocoder, it has both a geocoder and a reverse geocoder component. From the documentation, it is unclear if it has a pure SQL interface like the tiger geocoder, or if a good deal of the logic is implemented in the web interface.

  • GIS Graphy also utilizes PostGIS and like Nominatim works with OpenStreetMap (OSM) data. It comes with a loader to load OSM data and similar to Nominatim is capable of geocoding not just US. Much like Nominatim, it runs as a webservice and relies on Java 1.5, Servlet apps, Solr. GisGraphy is cross-platform and also has a reverse geocoder among some other neat features.