Opened 10 years ago
Last modified 7 years ago
#3103 new task
geometry_columns, find_srid performance regress tests
Reported by: | robe | Owned by: | robe |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS Fund Me |
Component: | postgis | Version: | 2.1.x |
Keywords: | Cc: |
Description
We need to add a test for both checking correctness and performance of querying geometry_columns and using find_srid. Per issues uncovered in #3092 and #2365.
The way the planner behaves has changed over time too so simply assuming because performance is good in pg version x for a particular view it will be good in pg version whatever is fraught with problems.
Change History (11)
comment:1 by , 10 years ago
Owner: | changed from | to
---|
comment:2 by , 10 years ago
comment:3 by , 10 years ago
On closer inspection the find_srid that whole splitting exercise to account for if people throw the schema name as part of the table name is a real performance killer.
If I change the function to this:
CREATE OR REPLACE FUNCTION find_srid( character varying, character varying, character varying) RETURNS integer AS $$ BEGIN RETURN srid FROM geometry_columns WHERE (f_table_schema = $1 OR $1 = '') AND f_table_name = $2 AND f_geometry_column = $3 LIMIT 1; END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;
And have a table of 30,000 tests I created with this:
SELECT f_table_schema::Text, f_table_name::text, f_geometry_column::text INTO tmp_geoms from geometry_columns ; -- finishes in 1861 ms SELECT find_srid(f_table_schema, f_table_name, f_geometry_column) FROM tmp_geoms; -- With the old find_srid that has all this error handling gook and split table name -- into schema table, I lost patience waiting
Why it's such a massive overhead on my 9.3 instance I'm not sure, but I also don't think all that splitting is worth the overhead of something people shouldn't be doing anyway. It's pretty clear to me that table name should not include
comment:5 by , 10 years ago
Nope (removing STRICT Of course so it can inline) and when I tried making it an SQL function, I ran into another problem. Since PostgreSQL can see into an SQL function, it complained that the view didn't exist when I tried to upgrade. I guess because I dropped the view at beginning. Not sure. I have to revisit that. I was thinking this function should really be SQL stable rather than IMMUTABLE.
I'm still really puzzled how that seemingly simple condition could have such major speed consequences. Makes me concerned something else is up.
comment:6 by , 10 years ago
I'd avoid requiring a DROP VIEW / CREATE VIEW if it's not for fixing a really really important issue. Upgrades are already an hell as they stand...
comment:7 by , 10 years ago
We do have to make our upgrade logic smart about when it needs to drop and when not.
For this case, would never do it in 2.1, but for 2.2 I think its worth it, just because I hate that geography_columns and geometry_columns are different and it does provide quite a bit of performance boost if you have a lot of records in geometry_columns.
comment:8 by , 10 years ago
I'd first make a deeper research about implications of varchar vs. text. Internally they should be the same (size header, variable length).
comment:9 by , 10 years ago
Possibly interesting numbers:
strk=# explain analyze select sum(x) from ( select i::varchar::float8 x from generate_series(1,5000000) i ) foo; Aggregate (cost=22.50..22.51 rows=1 width=4) (actual time=5743.902..5743.902 rows=1 loops=1) -> Function Scan on generate_series i (cost=0.00..10.00 rows=1000 width=4) (actual time=1101.722..2218.290 rows=5000000 loops=1) Total runtime: 5823.127 ms strk=# explain analyze select sum(x) from ( select i::text::float8 x from generate_series(1,5000000) i ) foo; Aggregate (cost=22.50..22.51 rows=1 width=4) (actual time=5724.225..5724.225 rows=1 loops=1) -> Function Scan on generate_series i (cost=0.00..10.00 rows=1000 width=4) (actual time=1106.496..2226.769 rows=5000000 loops=1) Total runtime: 5803.895 ms strk=# explain analyze select sum(x) from ( select i::varchar::text::float8 x from generate_series(1,5000000) i ) foo; Aggregate (cost=22.50..22.51 rows=1 width=4) (actual time=5757.578..5757.578 rows=1 loops=1) -> Function Scan on generate_series i (cost=0.00..10.00 rows=1000 width=4) (actual time=1111.779..2228.768 rows=5000000 loops=1) Total runtime: 5836.428 ms strk=# explain analyze select sum(x) from ( select i::text::varchar::float8 x from generate_series(1,5000000) i ) foo; Aggregate (cost=22.50..22.51 rows=1 width=4) (actual time=5756.883..5756.883 rows=1 loops=1) -> Function Scan on generate_series i (cost=0.00..10.00 rows=1000 width=4) (actual time=1132.671..2249.771 rows=5000000 loops=1) Total runtime: 5835.886 ms
comment:10 by , 9 years ago
Milestone: | PostGIS 2.2.0 → PostGIS Future |
---|
Committed at r13453 change to no longer cast to varchar. I didn't see a difference with find_srid (totally illogical). find_srid even in my old view took 20 ms (even though direct query on view was taking 200 ms with old view). so more tests required.