Opened 8 years ago
Closed 7 years ago
#3698 closed defect (fixed)
CENSUS Download path for TABBLOCK
Reported by: | EvanCarroll | Owned by: | robe |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 2.3.4 |
Component: | tiger geocoder | Version: | 2.3.x |
Keywords: | Cc: |
Description
Name | postgis_tiger_geocoder Version | 2.3.1 Schema | tiger Description | PostGIS tiger geocoder and reverse geocoder
When I generate a download scripts with
SELECT Loader_Generate_Script(ARRAYTX, 'sh');
I'm getting
unzip: cannot find or open tl_*_48*_edges*.zip, tl_*_48*_edges*.zip.zip or tl_*_48*_edges*.zip.ZIP. No zipfiles found.
This seems to be because earlier in the download, I get
--2017-01-25 15:16:05-- http://www2.census.gov/geo/tiger/TIGER2016/TABBLOCK/tl_2016_48_tabblock.zip Resolving www2.census.gov (www2.census.gov)... 23.11.58.87 Connecting to www2.census.gov (www2.census.gov)|23.11.58.87|:80... connected. HTTP request sent, awaiting response... 404 Not Found 2017-01-25 15:16:10 ERROR 404: Not Found.
tx.sh: 56: cd: can't cd to /gisdata/www2.census.gov/geo/tiger/TIGER2016/TABBLOCK
Checking census it's not
http://www2.census.gov/geo/tiger/TIGER2016/TABBLOCK/tl_2016_48_tabblock.zip
instead, it's
http://www2.census.gov/geo/tiger/TIGER2016/TABBLOCK/tl_2016_48_tabblock10.zip
I can probably fix this kind of stuff. I really wish the build system for this wasn't tied together with PostGIS. You're making it very hard to send a patch this way. I'm pretty capable and handy, but the learning curve is very high in getting a PostGIS development environment working.
Change History (18)
comment:1 by , 8 years ago
Milestone: | PostGIS 2.4.0 → PostGIS 2.3.2 |
---|
comment:2 by , 8 years ago
Summary: | Problem with Geocoder. → CENSUS Download directory for TABBLOCK |
---|
comment:3 by , 8 years ago
Summary: | CENSUS Download directory for TABBLOCK → CENSUS Download path for TABBLOCK |
---|
comment:4 by , 8 years ago
comment:5 by , 8 years ago
Milestone: | PostGIS 2.3.2 → PostGIS 2.4.0 |
---|
This is a bit more involved than I thought since it doesn't follow the convention folder_name/..folder_name.zip. So I'll deal with this when they release 2017 data and just disable the download in the loader tables. They probably put a 10 in it because it hasn't been changed since 2010.
The tabblock tables are not used for anything so probably a useless bit of data for people only interested in geocoding.
The only reason I included it way back is the machinery to download tabblock was already there for folks who needed to do census block research. It should always have been disabled by default and require users to explicitly go into the tiger.lookup_tables to enable it.
Regarding whether tiger geocoder should be part of PostGIS or not. Historically it always has been, and before extension building, it was pretty detached in the extras/tiger_geocoder folder You can still build and install it that way by just making changes to the files in extras/tiger_geocoder and using the shell scripts.
The only part the tiger geocoder that relies on the PostGIS build is wrapping it as an extension. That itself could probably be detached so that if you really wanted to build just the geocoder extension file, you could cd into the extensions folder and just build that. I could at the very least provide instructions on how to hack the scripts in extensions folder to do that.
One great benefit of having it as part of PostGIS is that cloud offerings like Amazon RDS and Heroku build it and so do all the various package maintainers, Ubuntu, Yum etc. If it wasn't part of PostGIS I think packagers would be less inclined to go out of their way to build it and much fewer people would use it.
comment:6 by , 8 years ago
oops put the wrong number on commit again. r15293 disables the load for PostGIS 2.3.2
comment:8 by , 8 years ago
Very cool robe, I think of a guide on how to contribute to the plugin would be very much appreciated. I had a difficulty building it the last time. I still think you'd get more users and more active development if this wasn't part of core. And, we could distribute it separately through pgxn or something rather than having to have users pry it out of a zip file to upgrade the Census data without upgrading PostGIS.
http://dba.stackexchange.com/q/154209/2639 http://dba.stackexchange.com/q/154910/2639 http://dba.stackexchange.com/a/154909/2639
In other news, Census.gov is now blacklisting if you download more than 30 GB. I'm not sure if we should do anything about it. I'm currently in touch with the Census on getting it resolved for myself. If I can get the contacts I'll see if they're down for adding a perma-whitelist for PostGIS loader based on useragent or query-string. Akamai has now blocked two of my IPS, so I'm guessing this is something new with 2017.
comment:9 by , 8 years ago
Also, for those finding this bug on Google you can turn off loading tabblock without upgrading the plugin using this..
UPDATE loader_lookuptables SET load=false WHERE lookup_name = 'tabblock';
comment:10 by , 8 years ago
I think they started to blacklist in 2015 though they seem to have gotten more aggressive.
comment:11 by , 7 years ago
Milestone: | PostGIS 2.4.0 → PostGIS 2.3.4 |
---|
comment:12 by , 7 years ago
So, any recommendations for someone like me who has a use case that does require census block data?
In the generated load-states.sh
script I tried appending the filename references like so:
sed -i "s/_tabblock\./_tabblock10./g" load-states.sh
That seemed to get it to work. However, when I try to load a state (e.g., 'CA'
), there are still no rows in the tabblocks
table:
postgres=# select * from tabblock; gid | statefp | countyfp | tractce | blockce | tabblock_id | name | mtfcc | ur | uace | funcstat | aland | awater | intptlat | intptlon | the_geom -----+---------+----------+---------+---------+-------------+------+-------+----+------+----------+-------+--------+----------+----------+---------- (0 rows)
I noticed that for most other tables, loader_load_staged_data
is used to transfer the data from the tiger_staging
to the tiger_data
database, but that doesn't seem to be happening for the tabblock
table in the script. Notice the difference between the CA_tract
and CA_tabblock
handling from the script excerpt below:
${PSQL} -c "CREATE TABLE tiger_data.CA_tract(CONSTRAINT pk_CA_tract PRIMARY KEY (tract_id) ) INHERITS(tiger.tract); " ${SHP2PGSQL} -D -c -s 4269 -g the_geom -W "latin1" tl_2013_06_tract.dbf tiger_staging.ca_tract | ${PSQL} ${PSQL} -c "ALTER TABLE tiger_staging.CA_tract RENAME geoid TO tract_id; SELECT loader_load_staged_data(lower('CA_tract'), lower('CA_tract')); " ${PSQL} -c "CREATE INDEX tiger_data_CA_tract_the_geom_gist ON tiger_data.CA_tract USING gist(the_geom);" ${PSQL} -c "VACUUM ANALYZE tiger_data.CA_tract;" ${PSQL} -c "ALTER TABLE tiger_data.CA_tract ADD CONSTRAINT chk_statefp CHECK (statefp = '06');" cd /gisdata #wget http://www2.census.gov/geo/tiger/TIGER2013/TABBLOCK/tl_2013_06_tabblock.zip --mirror --reject=html cd /gisdata/www2.census.gov/geo/tiger/TIGER2013/TABBLOCK rm -f ${TMPDIR}/*.* ${PSQL} -c "DROP SCHEMA IF EXISTS tiger_staging CASCADE;" ${PSQL} -c "CREATE SCHEMA tiger_staging;" for z in tl_2013_06*_tabblock.zip ; do $UNZIPTOOL -o -d $TMPDIR $z; done cd $TMPDIR; ${PSQL} -c "CREATE TABLE tiger_data.CA_tabblock(CONSTRAINT pk_CA_tabblock PRIMARY KEY (tabblock_id)) INHERITS(tiger.tabblock);" ${SHP2PGSQL} -D -c -s 4269 -g the_geom -W "latin1" tl_2013_06_tabblock.dbf tiger_staging.ca_tabblock | ${PSQL} ${PSQL} -c "ALTER TABLE tiger_staging.CA_tabblock RENAME geoid TO tabblock_id;" ${PSQL} -c "ALTER TABLE tiger_data.CA_tabblock ADD CONSTRAINT chk_statefp CHECK (statefp = '06');" ${PSQL} -c "CREATE INDEX tiger_data_CA_tabblock_the_geom_gist ON tiger_data.CA_tabblock USING gist(the_geom);" ${PSQL} -c "vacuum analyze tiger_data.CA_tabblock;" cd /gisdata #wget http://www2.census.gov/geo/tiger/TIGER2013/BG/tl_2013_06_bg.zip --mirror --reject=html cd /gisdata/www2.census.gov/geo/tiger/TIGER2013/BG rm -f ${TMPDIR}/*.* ${PSQL} -c "DROP SCHEMA IF EXISTS tiger_staging CASCADE;" ${PSQL} -c "CREATE SCHEMA tiger_staging;" for z in tl_2013_06*_bg.zip ; do $UNZIPTOOL -o -d $TMPDIR $z; done cd $TMPDIR;
Any ideas on how to make sure my tabblock
table gets populated using the TIGER2016 data?
comment:13 by , 7 years ago
Jeffrey,
I haven't looked at this issue in a while. I've put this on my todo to fix as part of the FOSS4G OSGeo Code Sprint this weekend. https://wiki.osgeo.org/wiki/FOSS4G_2017_Code_Sprint#PostGIS
comment:14 by , 7 years ago
I spent a bit more time looking into this and got it to work with the following changes in the loader_lookuptables
:
UPDATE loader_lookuptables SET load = true , post_load_process = '${psql} -c "ALTER TABLE ${staging_schema}.${state_abbrev}_${lookup_name} RENAME geoid10 TO tabblock_id; SELECT loader_load_staged_data(lower(''${state_abbrev}_${table_name}''), lower(''${state_abbrev}_${lookup_name}'')); " ${psql} -c "ALTER TABLE ${data_schema}.${state_abbrev}_${lookup_name} ADD CONSTRAINT chk_statefp CHECK (statefp = ''${state_fips}'');" ${psql} -c "CREATE INDEX ${data_schema}_${state_abbrev}_${lookup_name}_the_geom_gist ON ${data_schema}.${state_abbrev}_${lookup_name} USING gist(the_geom);" ${psql} -c "vacuum analyze ${data_schema}.${state_abbrev}_${lookup_name};"' , columns_exclude = '{gid, uatyp10, uatype, suffix1ce}' WHERE lookup_name = 'tabblock' ;
I had to change the columns_exclude
as well as add the loader_load_staged_data
statement in the post_load_process
.
Also, it was still necessary to work with the 10
suffix on the TABBLOCK filenames:
sed -i "s/_tabblock\./_tabblock10./g" load-states.sh
comment:15 by , 7 years ago
Jeffrey,
Thanks for the experiment. I've incorporated your changes into code, but to fix the tabblock10/tabblock10 issue I had to modify the loader scripts a bit.
I also realized the census script was a bit broken so fixing that as well. And then also discovered census is using SSLV3 as noted in #3816.
I'm doing some testing of states before I commit my changes.
What I will do is keep tabblock, bg off since there is nothing in the code that uses those tables. tract I'm tempted to turn off as well by default except I have get_tract function that uses it.
I'll update the docs with an update statement for people that want those tables loaded and also show how to use the loader_generate_census_script which I have set to download bg, tract, tabblock regardless of if they are turned off in the lookup_tables.
comment:16 by , 7 years ago
Thanks -- that all sounds good.
For what it's worth, I think a refactoring of the generate
scripts is in order. It would be a lot easier for end users like me if all of the downloads happened up front, and the loads happened separately. Also, on the Unix-ish versions, I think a
set -e
at the top of the script is essential, particularly given the fact that Akamai is blacklisting people. You can end up waiting hours only to find out that things haven't really loaded, and you pretty much need to start over in a lot of cases.
Related: has anyone from this team tried to get in touch with the Census folks? I think blacklisting people might be a violation of FOIA (https://www.foia.gov/).
comment:17 by , 7 years ago
I tried once and gave up cause I couldn't reach up to the chain of anyone technical and just got the "Works fine for me talk to your service provider" answer. I'll try again maybe after the 2017 data comes out which should be soon. Feel free to bug them yourself.
As far as refactoring the script, it's a little too late to do for 2.4.1, but will definitely try doing it for 2.4.1 or 2.5.0.
It seems that this was changed a long time ago. In 2013 this worked
However, in 2014, and on the convention changed..