Opened 13 years ago
Closed 12 years ago
#1614 closed defect (wontfix)
More normalizer tricks
Reported by: | mikepease | Owned by: | robe |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.1.0 |
Component: | tiger geocoder | Version: | 1.5.X |
Keywords: | Cc: | woodbri |
Description
--Specifying St. Paul but not MN misinterprets state
select * from normalize_address('933 Vandalia Ave, St Paul');
--Output PALAU (PW)??
--933 "" "Vandalia" "" "" "" "St" "PW" "" t
select * from normalize_address('933 Vandalia, St. Paul, 55304') ; --Still goes to Palau
select * from normalize_address('933 Vandalia, St. Paul, MN') --Works properly
--Why is the syntax so sensitive here?
--none of these work right
select * from normalize_address('901 Mainstreet, Fl 2, Hopkins MN 55343');
select * from normalize_address('901 Mainstreet Fl 2, Hopkins, MN 55343');
select * from normalize_address('901 Mainstreet Fl 2 Hopkins, MN 55343');
select * from normalize_address('901 Mainstreet Fl 2 Hopkins, MN 55343');
select * from normalize_address('901 Mainstreet, Fl 2 Hopkins, MN 55343');
select * from normalize_address('901 Mainstreet St, Fl 2 Hopkins, MN 55343');
--this one does
select * from normalize_address('901 Mainstreet St Fl 2, Hopkins, MN 55343');
Change History (8)
comment:1 by , 13 years ago
Milestone: | PostGIS 2.0.0 → PostGIS 2.1.0 |
---|
comment:2 by , 13 years ago
--County Road syntax is sensitive
An address database we have uses a different syntax for listing county roads. Example: 8435 COUNTY 20 RD SE, ROCHESTER, MN 55904
This normalizes differently than: 8435 COUNTY RD 20 SE, ROCHESTER, MN 55904
But this second syntax stumps the normalizer. If you write it this way, then it works: 8435 COUNTY ROAD 20 SE, ROCHESTER, MN 55904
select * from normalize_address('8435 COUNTY 20 RD SE, ROCHESTER, MN 55904')
select * from normalize_address('8435 COUNTY RD 20 SE, ROCHESTER, MN 55904')
select * from normalize_address('8435 COUNTY ROAD 20 SE, ROCHESTER, MN 55904')
I can see why the first syntax may produce a reasonable, if not the desired, result. But the second syntax shouldn't get stumped.
It looks like the lookup table for road types needs more spelling variants, so that it includes COUNTY RD as well as COUNTY ROAD.
Perhaps this is true for other street types too?
comment:3 by , 13 years ago
Google uses "U.S." as its formal syntax for a US Hwy. If I add this to the street_type_lookup, I get some more matches in my address database.
select * from normalize_address('3208 U.S. 52, Rochester, MN 55901')
comment:5 by , 12 years ago
Cc: | added |
---|
comment:6 by , 12 years ago
When I loaded and standardized all of Tiger for the whole US using the PAGC standardizer, I looked at the records that failed to standardize so I could add entries to the lexicon, gazetteer, and parser rules. I found a lot of garbage in these records: things like you mention above, COUNTY 20 RD vs COUNTY RD 20, and things like the street type appearing in BOTH the name and the type fields. I think there were about 9000 records out of 50 million, so I have not waded through them yet, as I had other higher-priority items. I also think some simple regex checking and cleaning of these in the loading process is the best way to deal with them. In other words, spend the time once to deal with these, so the search code is cleaner, simpler, and faster.
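As a rough illustration of the kind of regex cleaning meant here, the sketch below rewrites the two county road spellings from comment:2 into the COUNTY ROAD 20 form that the normalizer is reported to handle. The function name `clean_county_road` and both patterns are hypothetical; they are not taken from PAGC or the tiger geocoder loader, and a real rule set would need checking against the records that failed.

```python
import re

# Hypothetical cleaning rules for illustration only; not part of PAGC
# or the tiger geocoder. Both patterns match case-insensitively.
# "COUNTY 20 RD" or "COUNTY 20 ROAD": route number before the street type.
_COUNTY_NUM_RD = re.compile(r"\bCOUNTY\s+(\d+[A-Z]?)\s+(?:RD|ROAD)\b", re.IGNORECASE)
# "COUNTY RD 20": abbreviated street type before the route number.
_COUNTY_RD_NUM = re.compile(r"\bCOUNTY\s+RD\s+(\d+[A-Z]?)\b", re.IGNORECASE)


def clean_county_road(address: str) -> str:
    """Rewrite 'COUNTY 20 RD' and 'COUNTY RD 20' as 'COUNTY ROAD 20'.

    The matched words are replaced with upper-case 'COUNTY ROAD';
    the rest of the address is left untouched.
    """
    address = _COUNTY_NUM_RD.sub(r"COUNTY ROAD \1", address)
    address = _COUNTY_RD_NUM.sub(r"COUNTY ROAD \1", address)
    return address


if __name__ == "__main__":
    for raw in (
        "8435 COUNTY 20 RD SE, ROCHESTER, MN 55904",
        "8435 COUNTY RD 20 SE, ROCHESTER, MN 55904",
        "8435 COUNTY ROAD 20 SE, ROCHESTER, MN 55904",
    ):
        print(clean_county_road(raw))
```

Running a rule like this once at load time, rather than teaching the normalizer every variant, keeps the search path unchanged; the same substitution could presumably be expressed in SQL against a staging table.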
comment:7 by , 12 years ago
Steve,
That's a great idea. I haven't given much thought to doing that, mostly because I haven't worked out what cleaning rules to institute. In theory it should be easy to inject these preprocessing steps, since the loader loads the shapefile into a staging table before pushing to the final tables. So all this cleaning can be done in staging.
comment:8 by , 12 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Going to focus my effort on integrating PAGC
I'm going to push this but might get to it before then. Just don't want people yelling at me with the 2.0.0 space cluttered.