Opened 13 years ago
Closed 13 years ago
#1777 closed enhancement (wontfix)
revise geocoder structure and functions to utilize 2011 enhancements
Reported by: | robe | Owned by: | robe |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.1.0 |
Component: | tiger geocoder | Version: | master |
Keywords: | | Cc: | |
Description
The TIGER 2011 data (ftp://ftp2.census.gov/geo/tiger/TIGER2011/) has a new structure, addrfeat, which looks like it would be useful to simplify and speed up some of our queries. That said, if we change the queries to use this new table, the new code cannot be used with 2010 data. In theory, though, the current structure would work fine with 2011 data; it just isn't optimal.
Change History (4)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
Okay, I still haven't figured out the discrepancy in the numbers, except that it is possibly cleanup.
I did a count like this:
SELECT count(DISTINCT tfid), count(*), count(DISTINCT countyfp) FROM tiger_data.ma_faces;
I have the same number of counties in both 2010 and 2011, so it doesn't seem that I missed a county by accident. For both years the tfids are unique (no dupes), yet 2011 has fewer faces. I don't see anything in the docs yet that explains why.
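One way to narrow down where the 2011 faces went is to run the same counts county by county against both loads and list the counties that differ. The sketch below is only an illustration: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the table names faces_2010/faces_2011 and the toy rows are hypothetical; only the columns tfid and countyfp come from the query above.

```python
import sqlite3


def summarize(conn, table):
    """Distinct tfids, total rows, and distinct counties, mirroring the query above."""
    # Table names cannot be bound as SQL parameters, so only pass trusted names here.
    return conn.execute(
        f"SELECT count(DISTINCT tfid), count(*), count(DISTINCT countyfp) FROM {table}"
    ).fetchone()


def per_county_diff(conn, old, new):
    """List (countyfp, old_count, new_count) for counties whose face counts differ.

    Uses an inner join because both loads were observed to cover the same counties.
    """
    sql = f"""
        WITH old_c AS (SELECT countyfp, count(*) AS cnt FROM {old} GROUP BY countyfp),
             new_c AS (SELECT countyfp, count(*) AS cnt FROM {new} GROUP BY countyfp)
        SELECT old_c.countyfp, old_c.cnt, new_c.cnt
        FROM old_c JOIN new_c ON old_c.countyfp = new_c.countyfp
        WHERE old_c.cnt <> new_c.cnt
        ORDER BY old_c.cnt - new_c.cnt DESC
    """
    return conn.execute(sql).fetchall()


# Hypothetical toy data standing in for the 2010 and 2011 Massachusetts faces loads.
conn = sqlite3.connect(":memory:")
for t in ("faces_2010", "faces_2011"):
    conn.execute(f"CREATE TABLE {t} (tfid INTEGER PRIMARY KEY, countyfp TEXT)")
conn.executemany("INSERT INTO faces_2010 VALUES (?, ?)",
                 [(1, "001"), (2, "001"), (3, "003"), (4, "003"), (5, "003")])
conn.executemany("INSERT INTO faces_2011 VALUES (?, ?)",
                 [(1, "001"), (2, "001"), (3, "003"), (4, "003")])

print(summarize(conn, "faces_2010"))  # (5, 5, 2)
print(summarize(conn, "faces_2011"))  # (4, 4, 2)
print(per_county_diff(conn, "faces_2010", "faces_2011"))  # [('003', 3, 2)]
```

Sorting by the size of the drop puts the counties that lost the most faces first, which is where a manual look at the data would start.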
comment:3 by , 13 years ago
Hmm, well, the only clue to what could have happened is this:
In preparation for the 2010 Census, Census Bureau employees walked virtually every street in the United States and Puerto Rico with the primary purpose of verifying and updating Census address lists. A second priority was to provide updates to the Census Bureau’s road network. For the first time census workers used handheld computers that captured GPS information and used this technology to improve both the address lists and the census road network.
Maybe the MA TIGER data had a lot of junk streets removed. Perhaps in California the new developments show up in the 2011 set.
comment:4 by , 13 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
I've decided not to bother with this. I have the table loaded, but I'm thinking of turning the load off by default. My reasons for this decision are:
1) addrfeat (at least for MA) seems to be just a join of featnames, edges, and addr, denormalized to include only the edges that are roadways, so it will require more disk space to hold duplicate edges.
2) On top of that, left and right are in the same record, which means I have to write messy CASE statements to generate separate records where I need them.
3) It's missing the name field (it only has fullname), which is pretty important given the way the current geocoder is set up.
4) PostgreSQL is pretty efficient with joins, so I don't see much loss of time there. If I were using MySQL, which can only use one index at a time, that would be a different story. What I actually need is to fit more data into memory, and addrfeat DOES NOT HELP with that; it makes things worse, since redundant edge geometries would be loaded into memory, taking up valuable space.
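Points 1 to 3 can be illustrated with a toy model. This is a sketch under loud assumptions: it uses Python's built-in sqlite3 instead of PostgreSQL/PostGIS, stores geometry as WKT text, and the table and column names (edges, featnames, addr, side, fromhn, tohn, and the addrfeat-style lfromhn/ltohn/rfromhn/rtohn) are simplified stand-ins loosely modeled on the TIGER layout, not the real schemas. It shows one edge with two names being copied twice into the denormalized table, and the CASE needed to split left/right ranges back into per-side rows that the normalized tables give directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized layout (simplified): geometry stored once per edge.
    CREATE TABLE edges (tlid INTEGER PRIMARY KEY, geom TEXT);
    CREATE TABLE featnames (tlid INTEGER, fullname TEXT, name TEXT);
    CREATE TABLE addr (tlid INTEGER, side TEXT, fromhn TEXT, tohn TEXT);

    INSERT INTO edges VALUES (10, 'LINESTRING(0 0, 1 0)');
    -- One edge carrying two street names.
    INSERT INTO featnames VALUES (10, 'Main St', 'Main'), (10, 'Route 9', '9');
    INSERT INTO addr VALUES (10, 'L', '1', '99'), (10, 'R', '2', '98');

    -- Hypothetical addrfeat-style table: left and right ranges on one row,
    -- no separate name column, geometry copied onto every name variant.
    CREATE TABLE addrfeat AS
    SELECT e.tlid, f.fullname,
           l.fromhn AS lfromhn, l.tohn AS ltohn,
           r.fromhn AS rfromhn, r.tohn AS rtohn,
           e.geom
    FROM edges e
    JOIN featnames f ON f.tlid = e.tlid
    LEFT JOIN addr l ON l.tlid = e.tlid AND l.side = 'L'
    LEFT JOIN addr r ON r.tlid = e.tlid AND r.side = 'R';
""")

# The single edge geometry now appears once per street name.
geom_copies = conn.execute("SELECT count(*) FROM addrfeat WHERE tlid = 10").fetchone()[0]
print(geom_copies)  # 2

# Splitting left/right back into separate records needs a CASE per column.
split = conn.execute("""
    SELECT a.tlid, a.fullname, s.side,
           CASE s.side WHEN 'L' THEN a.lfromhn ELSE a.rfromhn END AS fromhn,
           CASE s.side WHEN 'L' THEN a.ltohn ELSE a.rtohn END AS tohn
    FROM addrfeat a
    CROSS JOIN (SELECT 'L' AS side UNION ALL SELECT 'R') s
    WHERE a.fullname = 'Main St'
    ORDER BY s.side
""").fetchall()

# The normalized tables produce the same per-side rows with a plain join.
direct = conn.execute("""
    SELECT a.tlid, f.fullname, a.side, a.fromhn, a.tohn
    FROM addr a JOIN featnames f ON f.tlid = a.tlid
    WHERE f.fullname = 'Main St'
    ORDER BY a.side
""").fetchall()

print(split)  # [(10, 'Main St', 'L', '1', '99'), (10, 'Main St', 'R', '2', '98')]
print(split == direct)  # True
```

In the toy data the duplication is only 2x, but it grows with every extra name attached to an edge, and each copy carries the full geometry.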
If anyone wants to do a benchmark to prove me wrong, I'll reconsider. But I've dismissed it mostly on the reasoning that it's not a good fit for how PostgreSQL works and is better suited to less sophisticated databases.
See ticket #1643 for details. I have load logic for addrfeat but am not doing anything with that table yet.
One thing I am noticing is that the faces and edges I loaded from 2011 have fewer rows than what I got with the 2010 data for Massachusetts. My guess is that the new 2011 data doesn't have the redundant tlids and faces that represent county boundaries. I haven't confirmed that it's not an issue with my load.
For example, my 2010 ma_edges had 900450 rows and my 2011 ma_edges load has 880344.
My 2010 ma_faces had 294673 rows, while my 2011 ma_faces has 283801.
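To put those counts in proportion, here is a quick calculation of the absolute and relative drop from 2010 to 2011, using only the row counts reported above (the helper name row_drop is just for illustration):

```python
# Row counts reported above for the Massachusetts loads: (2010, 2011).
counts = {
    "ma_edges": (900450, 880344),
    "ma_faces": (294673, 283801),
}


def row_drop(before, after):
    """Absolute and percentage decrease from the 2010 count to the 2011 count."""
    drop = before - after
    return drop, 100.0 * drop / before


results = {table: row_drop(*pair) for table, pair in counts.items()}
for table, (drop, pct) in results.items():
    print(f"{table}: {drop} fewer rows in 2011 ({pct:.2f}% of the 2010 count)")
# ma_edges: 20106 fewer rows in 2011 (2.23% of the 2010 count)
# ma_faces: 10872 fewer rows in 2011 (3.69% of the 2010 count)
```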