Opened 5 years ago

Closed 4 years ago

#4655 closed defect (wontfix)

TIGER 'addrfeat' table definition vs. current 2019 shapefiles

Reported by: jlrobins Owned by: robe
Priority: medium Milestone: PostGIS 3.0.3
Component: tiger geocoder Version: 3.0.x
Keywords: Cc:

Description

I'm bootstrapping a PostGIS 3.0.1, PostgreSQL 12.2, TIGER 2019 database from scratch against all possible layers for the southeast US. I've met an ounce of resistance when trying to load the 'addrfeat' layer for FL, which happened to be my first state. The following section of the results of

    SELECT loader_generate_script(ARRAY['FL','NC', 'SC', 'GA'],'sh');`

...

${PSQL} -c "CREATE TABLE tiger_data.FL_addrfeat(CONSTRAINT pk_FL_addrfeat PRIMARY KEY (gid)) INHERITS(tiger.addrfeat);ALTER TABLE tiger_data.FL_addrfeat ALTER COLUMN statefp SET DEFAULT '12';"
for z in *addrfeat*.dbf; do
	${SHP2PGSQL} -D  -S -D -s 4269 -g the_geom -W "latin1" $z tiger_staging.FL_addrfeat | ${PSQL}
	${PSQL} -c "SELECT loader_load_staged_data(lower('FL_addrfeat'), lower('FL_addrfeat'));"
done 

ends up croaking within the loader_load_staged_data() call due to a mismatch in columns from the shapefile vs. columns in base table addrfeat:

ALTER TABLE
Shapefile type: Arc
Postgis type: LINESTRING[2]
SET
SET
BEGIN
CREATE TABLE
ALTER TABLE
                          addgeometrycolumn                           
----------------------------------------------------------------------
 tiger_staging.fl_addrfeat.the_geom SRID:4269 TYPE:LINESTRING DIMS:2 
(1 row)

COPY 32505
COMMIT
ANALYZE
NOTICE:  INSERT INTO tiger_data.fl_addrfeat(aridl,aridr,edge_mtfcc,fullname,lfromhn,lfromtyp,linearid,ltohn,ltotyp,offsetl,offsetr,parityl,parityr,plus4l,plus4r,rfromhn,rfromtyp,rtohn,rtotyp,the_geom,tlid,zipl,zipr) SELECT aridl,aridr,edge_mtfcc,fullname,lfromhn,lfromtyp,linearid,ltohn,ltotyp,offsetl,offsetr,parityl,parityr,plus4l,plus4r,rfromhn,rfromtyp,road_mtfcc,rtohn,rtotyp,tfidl,tfidr,the_geom,tlid,zipl,zipr FROM tiger_staging.fl_addrfeat;
ERROR:  INSERT has more expressions than target columns
LINE 1: ...mtyp,road_mtfcc,rtohn,rtotyp,tfidl,tfidr,the_geom,tlid,zipl,...
                                                             ^
QUERY:  INSERT INTO tiger_data.fl_addrfeat(aridl,aridr,edge_mtfcc,fullname,lfromhn,lfromtyp,linearid,ltohn,ltotyp,offsetl,offsetr,parityl,parityr,plus4l,plus4r,rfromhn,rfromtyp,rtohn,rtotyp,the_geom,tlid,zipl,zipr) SELECT aridl,aridr,edge_mtfcc,fullname,lfromhn,lfromtyp,linearid,ltohn,ltotyp,offsetl,offsetr,parityl,parityr,plus4l,plus4r,rfromhn,rfromtyp,road_mtfcc,rtohn,rtotyp,tfidl,tfidr,the_geom,tlid,zipl,zipr FROM tiger_staging.fl_addrfeat;
CONTEXT:  PL/pgSQL function loader_load_staged_data(text,text,text[]) line 26 at EXECUTE
SQL function "loader_load_staged_data" statement 1

shp2pgsql from the shapefile within https://www2.census.gov/geo/tiger/TIGER2019/ADDRFEAT/tl_2019_13001_addrfeat.zip has 3 columns not present in base table addrfeat:

    CREATE TABLE "tl_2019_12001_addrfeat" (
    ...
    "tfidl" int8,
    "tfidr" int8,
    ...
    "road_mtfcc" varchar(5),
    ...

When I adjust tiger_geocoder.sql.in's definition of addrfeat with how I see tfidl, tfidr, and similar concept edge_mtfcc modeled:

    CREATE TABLE addrfeat
    (
      ...
      tfidl numeric(10),
      tfidr numeric(10),
      ...
      road_mtfcc character varying(5),
      ...
    );

Then initial population proceeds past this point.

I've not yet explored what an 'mftcc' is, but there are at least some differences in road_ vs. edge_ values in FL:

tiger=# select (edge_mtfcc = road_mtfcc) as same_mtfcc_value, count(*) from tiger_data.fl_addrfeat group by 1 order by 2 desc;
 same_mtfcc_value |  count  
------------------+---------
 t                | 1268547
 f                |    2615
(2 rows)


tiger=# select gid, edge_mtfcc, road_mtfcc from tiger_data.fl_addrfeat where edge_mtfcc != road_mtfcc order by 1, 2, 3 limit 5;
  gid  | edge_mtfcc | road_mtfcc 
-------+------------+------------
  1180 | P0001      | S1400
 16533 | P0001      | S1200
 16534 | P0001      | S1200
 36457 | P0001      | S1200
 36500 | P0001      | S1400
(5 rows)

Change History (6)

comment:1 by jlrobins, 5 years ago

Sorry, that last informational query should have projected the tlid of some interesting rows, not my own gid serial values:

tiger=# select tlid, fullname, edge_mtfcc, road_mtfcc
    from tiger_data.fl_addrfeat where edge_mtfcc != road_mtfcc
    order by 1, 2, 3, 4 limit 5;


  tlid   | fullname  | edge_mtfcc | road_mtfcc 
---------+-----------+------------+------------
 5160653 | Waldo St  | P0001      | S1400
 6848822 | 1st Ave   | P0001      | S1200
 6848822 | 1st Ave   | P0001      | S1200
 6980365 | Burns Ave | P0001      | S1400
 7062643 | Hwy 389   | P0001      | S1200
(5 rows)

comment:2 by robe, 5 years ago

ijlrobins thanks for the report. Just out of curiosity what do you use addrfeat for?

The geocoder doesn't use it at all and I think I have it turned off by default. I had originally thought about using it but after some experiments decided joining with addr and features as far as geocoding and disk space usage was the best way to go. At a glance looked redundant.

comment:3 by robe, 5 years ago

Component: postgistiger geocoder
Owner: changed from pramsey to robe

in reply to:  3 comment:4 by jlrobins, 5 years ago

Replying to robe:

Absolutely nothing at this point. Having manually imported ~2012-era census boundaries manually in my prior shop, and realizing just what a chore it was, I was excited to start out with someone else's repeatable efforts and then pick through the results to figure out what layers I'd like to keep. So I started from the perspective of 'import every possible layer; sort 'em out later.'

So at this point take it as advisement that some written code isn't working as hoped, regardless of its possible utility.

Thanks!

comment:5 by Algunenano, 4 years ago

Milestone: PostGIS 3.0.3

comment:6 by robe, 4 years ago

Resolution: wontfix
Status: newclosed
Note: See TracTickets for help on using tickets.