Opened 13 years ago
Closed 13 years ago
#1077 closed task (fixed)
Regression tests for tiger geocoder
Reported by: | robe | Owned by: | robe |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 2.0.0 |
Component: | tiger geocoder | Version: | master |
Keywords: | Cc: |
Description (last modified by )
Now that we are making so many changes to the tiger geocoder, I fear breaking things. Thus we need to start writing regression tests.
Normalize_address is easy since most of what is needed for normalize can run with just the pre-packaged lookup tables loaded.
Geocode is harder since it needs real data, so we might have to set up samples.
Speed is important since any minor change we make can significantly impact speed, and that is even trickier to regress since it's so dependent on database setup.
At any rate, we'll just start setting up tests and hand test them as we make changes until we've got a better idea.
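Since timings depend so heavily on database setup, one way to make hand-run speed checks a little less noisy is to repeat each query and keep the median. The following is only a minimal sketch in Python, not something that exists in the tiger geocoder code; `median_seconds` and its `run_query` callable are hypothetical names, and the caller is assumed to supply a function that executes one geocode or normalize call against their own database.

```python
import statistics
import time
from typing import Callable


def median_seconds(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Run the callable `repeats` times and return the median wall-clock time."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: executes one query against the test database
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

The median is used rather than the mean so that a single cold-cache run does not dominate the number being compared between revisions.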
Change History (8)
comment:1 by , 13 years ago
Description: | modified (diff) |
---|
comment:2 by , 13 years ago
comment:3 by , 13 years ago
Version: | 1.5.X → trunk |
---|
Is there some known set of addresses that is supposed to be geocode-able?
comment:4 by , 13 years ago
I am looking at a large dataset that is applicable to testing the geocoder. Among the fields are:
"Address" "City" "State" "Zip" "UnitNumber" "HouseNumber" "StreetPrefix" "StreetName" "StreetType" "StreetSuffix"
(The division there is my own.) I am extracting a sample now with ORDER BY random(). What combinations are fair game for the geocoder?
comment:5 by , 13 years ago
There are a couple of issues. I think anything is fair game. If you look at Mike Pease's tickets, that gives you a general sense of the pitfall areas:
We are working on fixing these, particularly Highway and misspellings. But while doing so we need to make sure that
1) We don't slow down the geocoding of things that used to be fast by adding in more checks
2) We don't break things that used to work, such as addresses that returned right answers now returning wrong answers.
So as long as you have a baseline of some sort, which you would from the above, that would be good enough.
I'm not even so concerned about random access, because generally I think people will sort the data in some sort of meaningful order like zip, street, etc. to get faster speeds.
So the first stab is just to make sure the geocoding is still right. The second stab is to make sure it hasn't lost speed (which is trickier to test because of differences in caching behavior depending on sort order, speed of server, and other things happening on the server).
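The two checks above (answers still right, speed not lost) can be sketched as a comparison of a current run against a saved baseline. This is an illustrative Python sketch only and is not part of the regress folder; `find_regressions`, the dictionary layout, and the 1.5x slowdown tolerance are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: input address -> (geocoder result as text, seconds taken).
Run = Dict[str, Tuple[str, float]]


def find_regressions(baseline: Run, current: Run,
                     slowdown_tolerance: float = 1.5) -> Tuple[List[str], List[str]]:
    """Return (addresses whose answer changed, addresses that got slower)."""
    changed: List[str] = []
    slower: List[str] = []
    for address, (old_result, old_secs) in baseline.items():
        if address not in current:
            changed.append(address)  # an answer that used to exist is now missing
            continue
        new_result, new_secs = current[address]
        if new_result != old_result:
            changed.append(address)  # first stab: the answer must still match
        elif new_secs > old_secs * slowdown_tolerance:
            slower.append(address)   # second stab: same answer, but too slow
    return changed, slower
```

A changed answer is reported before speed is considered, since a timing comparison for an address that now returns a different result says little. The tolerance is deliberately loose because of the caching and server-load effects mentioned above.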
comment:6 by , 13 years ago
Okay, we have a good chunk of failures, fixes, and regress examples, due in great part to Mike Pease and others. These have already been added to the regress folder, with tests in normalize_regress.sql and geocode_regress.sql and the expected outputs after the fixes in normalize_regress and geocode_regress.
Brian Hamlin has provided more normalize failures based on the USPS CASS test suite. We should add these and tackle first the ones that prevent accurate geocoding. Below is a list excerpted from postgis-devel. Some of these failures are not surprising, and some are even relatively harmless as far as geocoding is concerned, but their behavior should be fixed and/or noted in the regress tests.
---------------------------------------------------------------
400 AVENUE I, WEST POINT, GA 31833
400 AVENUE I W, POINT, GA 31833
---------------
19596 COUNTY ROAD 480, COLCORD, OK 74338
19596 480 Co Rd, COLCORD, OK 74338
29779 STATE HIGHWAY C BOX 974, POTOSI, MO 63664
29779 C State Hwy, POTOSI, MO 63664
10559 NE STATE HIGHWAY 90, PINEVILLE, MO 64856
10559 90 State Hwy NE, PINEVILLE, MO 64856
18208 N COUNTY ROAD 241, ALACHUA, FL 32615
18208 241 Co Rd N, ALACHUA, FL 32615
4345 ROUTE 353, SALAMANCA, NY 14779
4345 353 Rte, SALAMANCA, NY 14779
19799 STATE ROUTE O, COSBY, MO 64436
19799 O State Rte, COSBY, MO 64436
------------------------------------------------------
1292 NE AVENUE B, SWEETWATER, TX 79556
1292 NE Ave, SWEETWATER, TX 79556
399 WEST AVE F, JEROME, ID 83338
399 WEST Ave, JEROME, ID 83338
------------------------------------------------------
19126-20 9TH AVE, PARKER, AZ 85344
1912620 9TH Ave, PARKER, AZ 85344
1818-307 N 40TH ST, PHOENIX, AZ 85008
1818307 N 40TH St, PHOENIX, AZ 85008
------------------------------------------------------
4D 664TH ST, NEW CASTLE, AL 35119
4 664TH St, NEW CASTLE, AL 35119
------------------------------------------------------
110 CENTER COVE I, SPICEWOOD, TX 78669
110 CENTER Cv, SPICEWOOD, TX 78669
------------------------------------------------------
492 STUYVESANT AVE # 4223 # 1330, IRVINGTON, NJ 07111
492 STUYVESANT Ave, IRVINGTON, NJ 07111
114 HAYES ML RD APT B122, ATCO, NJ 08004
114 HAYES ML Rd, APT, ATCO, NJ 08004
4906 LA BR APT A, HOUSTON, TX 77004
4906 LA Br, APT, HOUSTON, TX 77004
------------------------------------------------------
900 CITY FEDERAL BUILDING # 407, BIRMINGHAM, AL 35203
900, BUILDING # 407, BIRMINGHAM, AL 35203
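The listing appears to pair each CASS input address with the normalizer output that follows it, with rows of dashes separating groups of similar failures. To turn such a listing into regress cases, a small parser could look like the Python sketch below. It assumes one address per line, input first and output second; `parse_pairs` is a hypothetical helper and not part of the tiger geocoder.

```python
from typing import List, Optional, Tuple


def parse_pairs(listing: str) -> List[Tuple[str, str]]:
    """Pair each input address with the normalizer output on the next line."""
    pairs: List[Tuple[str, str]] = []
    pending: Optional[str] = None
    for raw in listing.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            pending = None  # blank lines and dash rows start a new group
            continue
        if pending is None:
            pending = line  # first line of a pair: the input address
        else:
            pairs.append((pending, line))  # second line: the observed output
            pending = None
    return pairs
```

Each resulting tuple holds the input and the currently observed (failing) output, so the expected value for a regress test still has to be written by hand once the desired behavior is decided.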
comment:7 by , 13 years ago
See the CASS compare output for the most recent change set: download.osgeo.org:/osgeo/download/postgis/geo_cmp_rev7646.txt
comment:8 by , 13 years ago
Milestone: | PostGIS Future → PostGIS 2.0.0 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
Going to mark this done since we have got a decent regress test suite. It just needs to be more integrated.
Preliminary work at r7516