Opened 10 years ago

Closed 9 years ago

#2980 closed defect (fixed)

directional as a street name should not be abbreviated

Reported by: robe Owned by: robe
Priority: high Milestone: PostGIS 2.2.0
Component: pagc_address_parser Version: master
Keywords: history Cc:

Description

Currently the tiger geocoder expects prefix directions to be abbreviated, but a directional used as the street name itself should not be.

pagc_normalize_address unfortunately does this:

SELECT predirabbrev, streetname, streettypeabbrev
FROM pagc_normalize_address('41 north st, boston, ma 02109');
-- not okay: would prefer streetname = 'NORTH'
 predirabbrev | streetname | streettypeabbrev
--------------+------------+------------------
              | N          | ST

This, though, works as hoped:

SELECT predirabbrev, streetname, streettypeabbrev
FROM pagc_normalize_address('41 north washington st, boston, ma 02109');

 predirabbrev | streetname | streettypeabbrev
--------------+------------+------------------
 N            | WASHINGTON | ST

Change History (6)

comment:1 by robe, 10 years ago

Owner: changed from woobri to robe

comment:2 by robe, 10 years ago

Doing this seems to do the trick, though I need to test some more before I commit.

-- define NORTH and SOUTH as regular words 
INSERT INTO pagc_lex(seq, word, stdword, token, is_custom)
SELECT seq, word, stdword, token, false
FROM (VALUES ( 2, 'NORTH', 'NORTH', 1),
 (2, 'SOUTH', 'SOUTH', 1) ) As f(seq,word,stdword,token);

-- Now that NORTH and SOUTH are regular words, we still want them treated as a
-- directional token when they are seen with an additional word. Unfortunately the
-- word token ranks higher, causing this rule to always be ignored, so raise its rank.

UPDATE pagc_rules SET RULE = '0 22 1 2 -1 1 2 5 6 -1 1 17' WHERE rule = '0 22 1 2 -1 1 2 5 6 -1 1 16';

-- may want to consider lowering the 0 1 2 -1 1 5 6 -1 1 rule

So after the above change I have:

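SELECT predirabbrev, streetname, streettypeabbrev
FROM pagc_normalize_address('41 north st, boston, ma 02109');

 predirabbrev | streetname | streettypeabbrev
--------------+------------+------------------
              | NORTH      | ST

SELECT predirabbrev, streetname, streettypeabbrev
FROM pagc_normalize_address('41 north washington st, boston, ma 02109');

 predirabbrev | streetname | streettypeabbrev
--------------+------------+------------------
 N            | WASHINGTON | ST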

comment:3 by robe, 10 years ago

Let me try that again:

-- define NORTH and SOUTH as regular words 
INSERT INTO pagc_lex(seq, word, stdword, token, is_custom)
SELECT seq, word, stdword, token, false
FROM (VALUES ( 2, 'NORTH', 'NORTH', 1),
 (2, 'SOUTH', 'SOUTH', 1) ) As f(seq,word,stdword,token);

-- Now that NORTH and SOUTH are regular words, we still want them treated as a
-- directional token when they are seen with an additional word. Unfortunately the
-- word token ranks higher, causing this rule to always be ignored, so raise its rank.

UPDATE pagc_rules SET RULE = '0 22 1 2 -1 1 2 5 6 -1 1 17' WHERE rule = '0 22 1 2 -1 1 2 5 6 -1 1 16';

-- may want to consider lowering the 0 1 2 -1 1 5 6 -1 1 rule

SELECT predirabbrev, streetname, streettypeabbrev
FROM pagc_normalize_address('41 north st, boston, ma 02109');

 predirabbrev | streetname | streettypeabbrev
--------------+------------+------------------
              | NORTH      | ST

SELECT predirabbrev, streetname, streettypeabbrev
FROM pagc_normalize_address('41 north washington st, boston, ma 02109');

 predirabbrev | streetname | streettypeabbrev
--------------+------------+------------------
 N            | WASHINGTON | ST
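
Both cases from this ticket can also be checked side by side in one query. The following is only a minimal sketch: it assumes PostgreSQL 9.3 or later (for LATERAL), and the want_predir, want_street and ok column names are illustrative, not part of the geocoder. The expected values are the ones this ticket asks for.

```sql
-- Compare the normalizer output with the values this ticket expects.
-- COALESCE guards against the pre-direction coming back as NULL rather
-- than as an empty string (which of the two it is, is an assumption here).
SELECT t.address,
       t.want_predir,
       COALESCE(n.predirabbrev, '') AS got_predir,
       t.want_street,
       COALESCE(n.streetname, '')   AS got_street,
       (COALESCE(n.predirabbrev, '') = t.want_predir
        AND COALESCE(n.streetname, '') = t.want_street) AS ok
FROM (VALUES
        ('41 north st, boston, ma 02109',            '',  'NORTH'),
        ('41 north washington st, boston, ma 02109', 'N', 'WASHINGTON')
     ) AS t(address, want_predir, want_street)
CROSS JOIN LATERAL pagc_normalize_address(t.address) AS n;
```

A row with ok = false points at the address whose parse has drifted from what is wanted.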

comment:4 by robe, 10 years ago

Priority: changed from medium to high

comment:5 by robe, 9 years ago

This still needs work. It now turns North Street into N Street, which is wrong:

SELECT predirabbrev, streetname, streettypeabbrev FROM pagc_normalize_address('41 north st, boston, ma 02109'); 

 predirabbrev | streetname | streettypeabbrev
--------------+------------+------------------
              | N          | ST
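
One way to investigate a result like this is to look at which lexicon entries exist for the word and which rule variants are present. This is a hedged sketch that uses only the table and column names appearing in the statements earlier in this ticket. Including the abbreviations 'N' and 'S' in the filter is an assumption about how the lexicon stores them.

```sql
-- List every lexicon entry for the directional words. A row with token 1 is
-- the "regular word" definition added in comment 2; other token values are
-- the pre-existing definitions.
SELECT seq, word, stdword, token, is_custom
FROM pagc_lex
WHERE word IN ('NORTH', 'SOUTH', 'N', 'S')
ORDER BY word, token;

-- Show the rule touched in comment 2, whichever trailing value it now has
-- (16 before the UPDATE, 17 after it).
SELECT rule
FROM pagc_rules
WHERE rule LIKE '0 22 1 2 -1 1 2 5 6 -1 1 %';
```

If the token 1 rows for NORTH and SOUTH are missing, or the rule still ends in 16, the change from comment 2 is not in effect in that database.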

comment:6 by robe, 9 years ago

Keywords: history added
Resolution: fixed
Status: changed from new to closed

fixed at r13963
