Installing the TIGER Geocoder with RHEL 6.6, PostgreSQL 9.4, PostGIS 2.1.7, and the TIGER 2013 Census Data

Yeah, I know we’re stuck with the TIGER 2013 census data, but the 2014 data doesn’t work easily with PostGIS 2.1.7; PostGIS 2.2, which is still in development, will support the 2014 dataset.  So, for now, we’re using the 2013 data.

I know there are a number of guides on this matter out there, but it still took me enough work to get this process right for my particular distribution (RHEL 6.6 in this case) that I hope it’ll be valuable to others as well, so here’s the procedure I performed to get this up and running!

  1. If necessary, add an outbound firewall rule allowing connections to ftp2.census.gov (148.129.75.35).
  2. Ensure the EPEL repository is installed.
  3. Install the PostgreSQL repository for RHEL 6.6:
    $ wget http://yum.postgresql.org/9.4/redhat/rhel-6-x86_64/pgdg-redhat94-9.4-1.noarch.rpm
    $ sudo yum install pgdg-redhat94-9.4-1.noarch.rpm
  4. Install the relevant packages:
    $ sudo yum install postgresql94 postgresql94-server postgresql94-devel postgresql94-contrib postgis2_94
  5. Initialize your PostgreSQL cluster and start the service
    sudo service postgresql-9.4 initdb

    (Be sure to customize the cluster according to your system and needs, of course.)

    sudo service postgresql-9.4 start
    sudo chkconfig postgresql-9.4 on
  6. Ensure your pg_hba.conf file permits access to the loopback device for the postgres user
    1. A quick way to do this is to edit /var/lib/pgsql/9.4/data/pg_hba.conf file and modify the following line:
      host   all             all             127.0.0.1/32           ident

      To

      host   all             all             127.0.0.1/32           trust

      (This causes your PostgreSQL database service to automatically trust all connection attempts to the system from the loopback device. Anyone who can log into your system can now effectively connect to any database through the loopback device as the postgres (or any other) user, so be sure to undo this when you’re done.  If you wanted to be a bit better about it, you could change the line so that the only user trusted is postgres and the only databases to which connections are allowed are postgres and geocoder, but we can probably let that slide unless you have a very large PostgreSQL cluster and you want to protect the other databases from this operation (in case something should somehow go wrong and cause the postgres user to destroy everything indiscriminately).)

    2. After modifying the pg_hba.conf file, reload the postgresql-9.4 service
      sudo service postgresql-9.4 reload
  7. Create a directory to hold the data downloaded from the US Census department:
    $ sudo mkdir /tmp/tiger

    (Ensure this location has plenty of space – I had over 200 GB available and had no problem. The directory ended up taking 33 GB of space at the end of the process, but this may not be indicative of the maximum size it reached during the process.)

    $ sudo chown postgres:postgres /tmp/tiger
  8. Create the geocoder database:
    $ sudo su postgres
    $ createdb geocoder
  9. Create and add necessary extensions:
    $ psql geocoder
    geocoder=# CREATE EXTENSION postgis;
    geocoder=# CREATE EXTENSION postgis_topology;
    geocoder=# CREATE EXTENSION fuzzystrmatch;
    geocoder=# CREATE EXTENSION postgis_tiger_geocoder;
  10. Test the address normalization functionality:
    geocoder=# SELECT na.address, na.streetname, na.streettypeabbrev, na.zip FROM normalize_address('1 South St, Philadelphia, PA 19130') as na;
     address | streetname | streettypeabbrev | zip
    ---------+------------+------------------+-------
           1 | South      | St               | 19130
  11. Update the first set of variables used by the script which constructs the geocoder:
    geocoder=# SELECT * FROM tiger.loader_variables;
     tiger_year |               website_root                | staging_fold | data_schema | staging_schema
    ------------+-------------------------------------------+--------------+-------------+----------------
     2013       | ftp://ftp2.census.gov/geo/tiger/TIGER2013 | /gisdata     | tiger_data  | tiger_staging
    
    geocoder=# UPDATE tiger.loader_variables SET staging_fold='/var/tiger';
    geocoder=# SELECT * FROM tiger.loader_variables;
     tiger_year |               website_root                | staging_fold | data_schema | staging_schema
    ------------+-------------------------------------------+--------------+-------------+----------------
     2013       | ftp://ftp2.census.gov/geo/tiger/TIGER2013 | /data/tiger  | tiger_data  | tiger_staging
  12. Update the second set of variables used by the script which constructs the geocoder so that the profile matches the RHEL 6.6 environment:
    geocoder=# UPDATE tiger.loader_platform SET declare_sect=
    'TMPDIR="${staging_fold}/temp/"
    UNZIPTOOL=unzip
    WGETTOOL="/usr/bin/wget"
    export PGBIN=/usr/bin
    export PGPORT=5432
    export PGHOST=localhost
    export PGUSER=postgres
    export PGPASSWORD=none
    export PGDATABASE=geocoder
    PSQL=${PGBIN}/psql
    SHP2PGSQL=${PGBIN}/shp2pgsql
    cd ${staging_fold}'
    WHERE os='sh';

    The sh profile should now look like this:

    geocoder=# \x
    geocoder=# SELECT * FROM tiger.loader_platform;
    ...
    -[ RECORD 2 ]----------+-----------------------------------------------------------------------------------
    os                     | sh
    declare_sect           | TMPDIR="${staging_fold}/temp/"
                           | UNZIPTOOL=unzip
                           | WGETTOOL="/usr/bin/wget"
                           | export PGBIN=/usr/bin
                           | export PGPORT=5432
                           | export PGHOST=localhost
                           | export PGUSER=postgres
                           | export PGPASSWORD=none
                           | export PGDATABASE=geocoder
                           | PSQL=${PGBIN}/psql
                           | SHP2PGSQL=${PGBIN}/shp2pgsql
                           | cd ${staging_fold}
    pgbin                  |
    wget                   | wget
    unzip_command          | rm -f ${TMPDIR}/*.*
                           | ${PSQL} -c "DROP SCHEMA IF EXISTS ${staging_schema} CASCADE;"
                           | ${PSQL} -c "CREATE SCHEMA ${staging_schema};"
                           | for z in *.zip; do $UNZIPTOOL -o -d $TMPDIR $z; done
                           | for z in */*.zip; do $UNZIPTOOL -o -d $TMPDIR $z; done
                           | cd $TMPDIR;
                           |
    psql                   | ${PSQL}
    path_sep               | /
    loader                 | ${SHP2PGSQL}
    environ_set_command    | export
    county_process_command | for z in *${table_name}.dbf; do
                           | ${loader} -D -s 4269 -g the_geom -W "latin1" $z ${staging_schema}.${state_abbrev}_${table_name} | ${psql}
                           | ${PSQL} -c "SELECT loader_load_staged_data(lower('${state_abbrev}_${table_name}'), lower('${state_abbrev}_${lookup_name}'));"
                           | done
    geocoder=# \q
  13. Generate and execute the script which will be used to load and configure the base data from the US Census department:
    $ psql -At -d geocoder -c "SELECT loader_generate_nation_script('sh');" > tiger_nation.sh
    $ sh tiger_nation.sh

    (This runs for about 3-4 minutes.)

  14. Generate and execute the script which will be used to load and configure state data from the US Census department:
    $ psql -At -d geocoder -c "SELECT tiger.loader_generate_script(ARRAY['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA', 'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME', 'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM', 'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY'], 'sh');" > tiger_states.sh
    $ nohup sh tiger_states.sh &

    (This ran for ~26.75 hours on my fairly beastly rig, limited largely by the 500 KB/s download rate from the Census FTP server. Definitely use nohup.)

  15. Install missing indices
    $ psql geocoder
    geocoder=# SELECT install_missing_indexes();
  16. Test the geocoder!
    geocoder=# SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat, (addy).address As stno, (addy).streetname As street, (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st, (addy).zip FROM geocode('1600 Pennsylvania Avenue Northwest, Washington, DC 20500') As g;
     rating |       lon         |       lat        | stno |    street    | styp |    city    | st | zip
    --------+-------------------+------------------+------+--------------+------+------------+----+-------
          2 | -77.0351147858455 | 38.8986709360362 | 1600 | Pennsylvania |  Ave | Washington | DC | 20502
    
    geocoder=# \x
    Expanded display is on.
    
    geocoder=# SELECT g.*, ST_X(g.geomout) As lon,ST_Y(g.geomout) As lat,(addy).*, bg.tabblock_id AS FIPS FROM geocode('1600 Pennsylvania Avenue Northwest, Washington, DC 20500') As g
    JOIN tabblock As bg
    ON (g.geomout && bg.the_geom)
    WHERE ST_Within (g.geomout, bg.the_geom);
    -[ RECORD 1 ]----+---------------------------------------------------
    addy             | (1600,,Pennsylvania,Ave,NW,,Washington,DC,20502,t)
    geomout          | 0101000020AD100000FF3316523F4253C0101234A607734340
    rating           | 2
    lon              | -77.0351147858455
    lat              | 38.8986709360362
    address          | 1600
    predirabbrev     |
    streetname       | Pennsylvania
    streettypeabbrev | Ave
    postdirabbrev    | NW
    internal         |
    location         | Washington
    stateabbrev      | DC
    zip              | 20502 
    parsed           | t
    fips             | 110010062021031
  17. Edit /var/lib/pgsql/9.4/data/pg_hba.conf file and return it to its original configuration (reverse step 6).
  18. Remove the outbound firewall rule for connecting to ftp2.census.gov (allow traffic to 148.129.75.35).
  19. Configure geocoder access privileges for your users
    $ psql geocoder
    geocoder=# GRANT CONNECT ON DATABASE geocoder TO user_role;
    geocoder=# GRANT USAGE ON SCHEMA tiger TO user_role;
    geocoder=# GRANT SELECT ON ALL TABLES IN SCHEMA tiger TO user_role;
    geocoder=# GRANT SELECT ON ALL SEQUENCES IN SCHEMA tiger TO user_role;
    geocoder=# GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA tiger TO user_role;
  20. Enjoy success!
Advertisements
This entry was posted in Information Technology and tagged , . Bookmark the permalink.

2 Responses to Installing the TIGER Geocoder with RHEL 6.6, PostgreSQL 9.4, PostGIS 2.1.7, and the TIGER 2013 Census Data

  1. Thanks for writing up such a clear set of instructions!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s