Opened 10 years ago
Last modified 7 years ago
#2998 new enhancement
Define a strategy to ensure stats are collected during topology building
Reported by: | strk | Owned by: | strk |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS Fund Me |
Component: | topology | Version: | master |
Keywords: | performance | Cc: |
Description
It is very likely that the queries used during topology building are inefficient due to lack of statistics enabling use of indexes.
Statistics are not updated until transaction commit, so this is a problem for queries like:
UPDATE mycoverage SET topogeom = toTopoGeom(geom);
This ticket is to define a strategy to run ANALYZE during these kind of operations, in a somewhat automatic but not invasive way.
Change History (4)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
Note that it is not necessarely true that indexes won't be used due to lack of stats, as the estimator makes some guesses on itself, and the invoked type-specific selectivity estimators may also behave in unexpected ways.
In this example, passing 99 geometries to toTopoGeom in a single statement resulted in these operations on the "edge_data" table:
=# select * from pg_stat_all_tables where relname = 'edge_data' and schemaname = 'topo_ulfareale'; relid | 16228229 schemaname | topo_ulfareale relname | edge_data seq_scan | 417 seq_tup_read | 28976 idx_scan | 3438 idx_tup_fetch | 3170 n_tup_ins | 175 n_tup_upd | 365 n_tup_del | 0 n_tup_hot_upd | 65 n_live_tup | 175 n_dead_tup | 365 last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | 0 autovacuum_count | 0 analyze_count | 0 autoanalyze_count | 0
As you can see there was no analyze ever run, still there have been ~3.5k index scans and "only" 417 sequencial scans. For just 99 inputs it's much over too many scans in general, but shows that index scans still happen.
comment:3 by , 10 years ago
Keywords: | performance added |
---|
So one idea is for functions like ST_AddEdgeNewFaces, ST_AddEdgeModFaces, ST_AddEdge and the like to all use a common "edge adder" function which would keep track of how many primitives are added and run ANALYZE <table> after every <X> additions in <table>. Since every primitive table has an id, driven by a sequence, it could be relatively inexpensive to do, using a modulo operation.