Opened 13 years ago

Last modified 7 years ago

#1599 assigned defect

normalize_address() confused by country

Reported by: mikepease Owned by: robe
Priority: medium Milestone: PostGIS Fund Me
Component: pagc_address_parser Version: master
Keywords: Cc:

Description

If you include a country at the end of an address, normalize_address() incorrectly parses the state and zip.

212 n 3rd ave, Minneapolis, mn 55401, USA (This is the format of the "formatted address" returned by Google's XML webservice)

select * from normalize_address('212 n 3rd ave, Minneapolis, mn 55401, USA')

Result:

stateabbrev

212;"N";"3rd";"Ave";"";"";"mn 55401";"";"";t

Change History (12)

comment:1 by robe, 13 years ago

Milestone: PostGIS 2.0.0PostGIS 2.1.0
Status: newassigned

comment:2 by woodbri, 12 years ago

Cc: woodbri added

comment:3 by woodbri, 12 years ago

Cc: woodbri removed

PAGC tools handle this, but the single line parsers has a trivial bug to fix related to this.

comment:4 by robe, 12 years ago

Component: tiger geocoderpagc_address_parser

comment:5 by robe, 12 years ago

Version: 1.5.Xtrunk

Steve,

This one crashes my backend, (mingw32 installed on windows 7 32 VC++), though all the others I have tried work fine:

select * from pagc_normalize_address('212 n 3rd ave, Minneapolis, mn 55401, USA') 

Gets to start load rules and then hits the bucket. Works fine it I take off the , USA.

I tried the original set to rule out my changes to the tables.

-- crashes --
SELECT (SELECT standardize_address( 'select seq, word::text, stdword::text, token from tiger.pagc_gaz union all select seq, word::text, stdword::text, token from tiger.pagc_lex '
       , 'select seq, word::text, stdword::text, token from tiger.pagc_gaz order by id'
       , 'select * from tiger.pagc_rules order by id'
, 'select 0::int4 as id, ' || quote_literal(address1) || '::text As micro, 
   ' || quote_literal(city || ', ' || state || ' ' || zip) || '::text As macro') As pagc_addr
 FROM  (SELECT * FROM parse_address('212 n 3rd ave, Minneapolis, mn 55401, USA') ) As a );
-- also crashes --
SELECT (SELECT standardize_address( 'select seq, word::text, stdword::text, token from gaz union all select seq, word::text, stdword::text, token from lex '
       , 'select seq, word::text, stdword::text, token from gaz order by id'
       , 'select * from rules order by id'
, 'select 0::int4 as id, ' || quote_literal(address1) || '::text As micro, 
   ' || quote_literal(city || ', ' || state || ' ' || zip) || '::text As macro') As pagc_addr
 FROM  (SELECT * FROM parse_address('212 n 3rd ave, Minneapolis, mn 55401, USA') ) As a );

The log:

NOTICE:  Start standardize_address
NOTICE:  start load_lex
NOTICE:  Time to read 3764 lexicon records: 10.0 ms.
NOTICE:  start load_lex
NOTICE:  Time to read 834 lexicon records: 0.0 ms.
NOTICE:  start load_rules

comment:6 by robe, 12 years ago

Crash went away after r11247 , but not parsing right first parse seems wrong. Is this the issue you were talking about with country?

-- using hstore here for easier output --
SELECT (each(hstore(a))).* FROM parse_address('212 n 3rd ave, Minneapolis, mn 55401, USA') As a

yields:
   key    |                value
----------+--------------------------------------
 num      | 212
 zip      |
 city     | USA
 state    |
 street   | n 3rd ave, Minneapolis, mn 55401
 country  | US
 street2  |
 zipplus  |
 address1 | 212 n 3rd ave, Minneapolis, mn 55401

This may have to do with how I'm concatenating.

comment:7 by woodbri, 12 years ago

I'll have to look into this more, but I'm out of town the rest of this week, with limited access.

comment:8 by robe, 11 years ago

Milestone: PostGIS 2.1.0PostGIS 2.2.0

comment:9 by robe, 9 years ago

Milestone: PostGIS 2.2.0PostGIS 2.3.0

comment:10 by robe, 9 years ago

Summary: normalize_address() confused by coutrynormalize_address() confused by country

comment:11 by robe, 9 years ago

Milestone: PostGIS 2.3.0PostGIS Future

comment:12 by robe, 7 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

Note: See TracTickets for help on using tickets.