Opened 12 years ago

Closed 3 years ago

Last modified 2 years ago

#2083 closed enhancement (fixed)

ST_RemEdgeModFace performance review

Reported by: strk Owned by: strk
Priority: medium Milestone: PostGIS 3.3.0
Component: topology Version: 2.0.x
Keywords: performance Cc:

Description

ST_RemEdgeModFace queries the relation table to see if a face can be destroyed without affecting existing TopoGeometry objects. The query does not find any usable index because the only index we have is on all the columns and that query isn't using a match on all the column.

We could do two things:

  1. See if it's possible to rewrite the query to make use of the index
  2. Add an additional index on the table

Change History (27)

comment:1 by strk, 12 years ago

Another thing to do could be moving the check for TopoGeometry sanity before the update calls, as it should be much faster and it's useful to be able to just try to drop an edge and see if it succeeded or not based on presence of a TopoGeometry..

comment:2 by aperi2007, 12 years ago

I guess is better to add another index. However if the choos is to rewrite the query this is the opportunity to remove the constraint to remove other face only if this not destroyed other objects but support instead the inherit the change to other objects.

comment:3 by strk, 12 years ago

Supporting changes to higher TopoGeometry objects would require a more complex set of parameters because there are different things you may want to do. In particular, assuming there are only two non-overlapping TopoGeometry objects separated by the edge you're about to remove you may want to either destroy one of the two TopoGeom and make the other take the whole space (which one, in that case?) or you may want both TopoGeoms to take the whole space , or you may want to destroy both.

This ticket is for improving the performance, if you have ideas on semantic changes it is better to discuss them elsewhere.

I do feel the limit imposed by TopoGeometries, but I consider it a safeguard. What's really needed now, IMHO, are easy TopoGeometry editing function (addElement/dropElement).

comment:4 by robe, 12 years ago

Milestone: PostGIS 2.1.0PostGIS 2.2.0

comment:5 by strk, 10 years ago

Another function that would benefit from an additional index to the relation table is the ST_ModEdgeSplit one, as reported on the mailing list by a user: http://lists.osgeo.org/pipermail/postgis-users/2015-March/040326.html

comment:6 by Lars Aksel Opsahl, 10 years ago

Hi

After adding the index topo_ar5_sysdata.relation(abs(element_id)) the performance is much better for a single select the time goes from more than 3000 ms to around 20-40 ms. In the layer we are testing the topo_ar5_sysdata.relation which contains 27794698 rows. For the end user the update time goes down from 30 seconds to about 1 second for a user that that is drawing simple polygon to update this layer.

Her is the explain with the functional index added

explain analyze SELECT r.* FROM topo_ar5_sysdata.relation r, topology.layer l WHERE l.topology_id = 21 AND l.level = 0 AND l.layer_id = r.layer_id AND abs(r.element_id) = 30145 AND r.element_type = 2 ;

QUERY PLAN


Hash Join (cost=1450.51..64998.02 rows=100884 width=16) (actual time=0.061..0.066 rows=1 loops=1)

Hash Cond: (r.layer_id = l.layer_id) -> Bitmap Heap Scan on relation r (cost=1449.96..63610.31 rows=100884 width=16) (actual time=0.027..0.032 rows=1 loops=1)

Recheck Cond: (abs(element_id) = 30145) Filter: (element_type = 2) Rows Removed by Filter: 1 -> Bitmap Index Scan on opo_ar5_sysdata_relation_abs_element_id_idx (cost=0.00..1424.74 rows=138973 width=0) (actual time=0.019..0.019 rows=2 loops=1)

Index Cond: (abs(element_id) = 30145)

-> Hash (cost=0.53..0.53 rows=2 width=4) (actual time=0.023..0.023 rows=2 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Seq Scan on layer l (cost=0.00..0.53 rows=2 width=4) (actual time=0.016..0.019 rows=2 loops=1)

Filter: ((topology_id = 21) AND (level = 0))

Total runtime: 0.126 ms

(13 rows)

To avoid extra index I may also use a or, but that only works a long as most of the element_id > 0. I have not seen any rows with a value below zero yet.

<
' AND abs(r.element_id) = e.edge_id';

---

' AND ((r.element_id >-1 AND r.element_id = e.edge_id) OR ( r.element_id < 0 AND abs(r.element_id) = e.edge_id))';

I go for the functional index now and that may also cover other cases that I have not seen in the logs yet.

comment:7 by strk, 10 years ago

If by "extra index" you mean the only one added to the current one, you cannot save that. Queries exist that just check for the existance of any TopoGeometry using a specific element_id, so using OR would not help there.

comment:8 by strk, 10 years ago

Milestone: PostGIS 2.2.0PostGIS Future

comment:9 by robe, 7 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

comment:10 by strk, 3 years ago

Keywords: performance added

comment:11 by strk, 3 years ago

In a case I'm looking at I confirm that adding an index on abs(relation.elem_id) improves the speed. In my case from 64 seconds to 13 seconds to remove 7207 unused edges.

Related ticket: #4887

comment:12 by strk, 3 years ago

I'm thinking having a function to remove multiple edges at once could be useful, in general, to reduce the cost of such operation. Such function could use a single query to verify removing the edges would not represent a problem with TopoGeometries and eventually could update all edge linking and side faces at once.

comment:13 by strk, 3 years ago

I'm digging further in this, executing each query executed internally by ST_RemEdgeModFace. The first query looks like this:

select
 r.topogeo_id, r.layer_id, l.schema_name, l.table_name, l.feature_column
FROM  topology.layer l
INNER JOIN <TOPONAME>.relation r
ON (l.layer_id = r.layer_id)
WHERE l.level = 0 AND l.feature_type IN ( 2, 4 )
AND l.topology_id = :topology_id
AND r.element_type = 2 AND abs(r.element_id) = :edge_id;

The conditions used on the relation table are: layer_id, element_type, element_id.

The index is created with fields: layer_id, topogeo_id, element_type, element_id -- this makes literal (query) values only usable in the order given, so since we do NOT have a 'topogeo_id' literal, the only usable literal is 'layer_id', which in the case I'm looking at matches *all* records, thus the planner decides to use a sequential scan.

If the unique index was created with reordered fields (layer_id, element_type, element_id, topogeo_id) it would then be possibly usable for filtering on element_type, and if we changed the abs(r.element_id) to an r.element_id in (10, -10) we'd have the possibility to also make juse of the element_id in the filter. But reordering might slow down other uses (fetching elements of a known topogeometry) so this is not the best way to proceed.

Rewriting this query is thus NOT a viable solution.

A new index could serve queries searching for TopoGeometry objects using a given element of the topology. Elements of a topology are always identified by the tuple ( layer_id, element_type, element_id ) because the *same* element_id may have different meanings based on element_type and layer_id. Among the three fields, element_id is probably always the most selective.

The next query, only performed if the edge to be removed has left_face != right_face, looks like this:

SELECT t.* FROM (
  SELECT
    r.topogeo_id,
    r.layer_id,
    l.schema_name,
    l.table_name,
    l.feature_column, 
    array_agg(r.element_id) as elems
  FROM topology.layer l 
  INNER JOIN "${TOPO}".relation r ON (l.layer_id = r.layer_id)
  WHERE l.level = 0 and l.feature_type IN (3, 4)
  AND l.topology_id = id(findTopology('${TOPO}'))
  AND r.element_type = 3
  AND r.element_id = ANY (ARRAY[${LEFT_FACE}, ${RIGHT_FACE}]::int4[])
  GROUP by
    r.topogeo_id, r.layer_id, l.schema_name,
    l.table_name, l.feature_column
) t
WHERE NOT t.elems @> ARRAY[${LEFT_FACE}, ${RIGHT_FACE}]::int4[];

The conditions used on the table are: element_type and element_id, no layer_id in this case as we're looking for faces in any non-hierarchical layer.

comment:15 by strk, 3 years ago

The next slow part is updating nodes containing in face, which is suffering from lack of index on node.containing face (see #2861)

comment:16 by Sandro Santilli <strk@…>, 3 years ago

Resolution: fixed
Status: newclosed

In 854103f/git:

Add index on relation(element_id) and use it from RemEdge functions

Closes #2083

comment:17 by Lars Aksel Opsahl, 3 years ago

Here is the list of all index I added when running resolve overlap (I will remove code when as upgrade to new Postgis versions)

  EXECUTE Format('CREATE INDEX ON %s.relation(layer_id)', (_topology_info).topology_name);
  EXECUTE Format('CREATE INDEX ON %s.relation(abs(element_id))', (_topology_info).topology_name);
  EXECUTE Format('CREATE INDEX ON %s.relation(element_id)', (_topology_info).topology_name);
  EXECUTE Format('CREATE INDEX ON %s.relation(topogeo_id)', (_topology_info).topology_name);
  EXECUTE Format('CREATE INDEX ON %s.edge_data USING GIST (geom)', (_topology_info).topology_name);
  EXECUTE Format('CREATE INDEX ON %s.edge_data(abs_next_left_edge)', (_topology_info).topology_name);
  EXECUTE Format('CREATE INDEX ON %s.edge_data(abs_next_right_edge)', (_topology_info).topology_name);
  EXECUTE Format('CREATE INDEX ON %s.node(containing_face)', (_topology_info).topology_name);


comment:19 by strk, 3 years ago

Resolution: fixed
Status: closedreopened

relation(layer_id) should not be needed because you usually don't have those many layers and anyway the btree (layer_id, topogeo_id, element_id, element_type) index would be already used when a query using layer_id is issued.

The abs(element_id) I removed the need for in [854103fa2e087b53485c8cf403a1fcb952146be2/git]

relation(topogeo_id) would also be covered by the existing btree on all members, as topogeo_id is second member after layer_id, and you always search for topogeo_id togheter with layer_id.

The edge_data(geom) is already created.

The node(containing_face) I pushed already with [e4495fb792fe31a87a84e92297d276baf254007a/git]

The only left over are the indices on abs_next_{left,right}_edge which I guess would be useful upon deleting edges, to re-link the edges pointing to the one being removed. I'm reopening the ticket to research upon this one.

comment:20 by strk, 3 years ago

Confirmed: those triggers are deferred so you don't see the cost of firing them until you COMMIT your transaction or force firing them (set constraint immediate). The cost of those triggers is very very high, especially during mass removal of edges.

Given we have a good ValidateTopology I wonder if we should completely drop those foreign keys (and other constraints as well).

Here are some times (seconds elapsed since start of process). The operation is conducted against a topology having

Without the index on the abs_* attributes:

0 psql:clearonly.sql:50: NOTICE: Removing unused
0.590315 psql:clearonly.sql:50: NOTICE: Removed 178
0.590379 psql:clearonly.sql:50: NOTICE: Removing isolated
0.622091 psql:clearonly.sql:50: NOTICE: Removed 127
0.622126 psql:clearonly.sql:50: NOTICE: Removing unneeded
0.624795 psql:clearonly.sql:50: NOTICE: Removed 0
0.624816 305   
0.624918 Firing next_left_edge_exists  
249.487 SET CONSTRAINTS  
249.487 Firing next_right_edge_exists  
508.951 SET CONSTRAINTS  

We're basically talking about under one second to remove 178 edges and 127 nodes (which are basically merging of 127 pairs of edges, mostly. And over 8 MINUTES to check the foreign keys.

Adding those indexes give instead these numbers:

0 psql:clearonly.sql:53: NOTICE: Removing unused
0.795837 psql:clearonly.sql:53: NOTICE: Removed 178
0.795877 psql:clearonly.sql:53: NOTICE: Removing isolated
0.827427 psql:clearonly.sql:53: NOTICE: Removed 127
0.827465 psql:clearonly.sql:53: NOTICE: Removing unneeded
0.829889 psql:clearonly.sql:53: NOTICE: Removed 0
0.829908 305   
0.830004 Firing next_left_edge_exists  
0.843845 SET CONSTRAINTS  
0.843973 Firing next_right_edge_exists  
0.859349 SET CONSTRAINTS 

Which means foreign keys check goes down from 8 minutes to 0.3 seconds. Worth adding those indices I'd say.

comment:21 by Lars Aksel Opsahl, 3 years ago

Very good, but could that indicate a missing index ?

That has been problem in some other cases I have had this problem I think.

comment:22 by strk, 3 years ago

Yes, as you said, the index on the "abs" version of next left/right edge was missing. Will be added with https://gitlab.com/postgis/postgis/-/merge_requests/78

comment:23 by strk, 3 years ago

That missing index ONLY affects edge DELETION. Updates are not affected by lack of it (but I guess can be slightly slower as they need to also update the additional index)

comment:24 by Lars Aksel Opsahl, 3 years ago

Not sure but I trust constraints more than function that maybe not be called .

Can it be the way the constraints are defined ?

comment:25 by Sandro Santilli <strk@…>, 3 years ago

Resolution: fixed
Status: reopenedclosed

In d45a982/git:

Add indices on edge_data.abs_next_{left,right}_edge

This speeds up foreign key constraints checking upon
removal or editing of edges.

See https://trac.osgeo.org/postgis/ticket/2083#comment:20

Closes #2083 again

comment:26 by strk, 3 years ago

Not sure but I trust constraints more than function that maybe not be called .

Well, we know the current constraints are NOT covering all cases of invalid topologies, so they may be just giving a false sense of security. Making them cover all cases would probably make them prohibitively slow, but it could be an idea to experiment with (better discuss on the mailing list, this ticket is not the right place).

comment:27 by robe, 2 years ago

Milestone: PostGIS Fund MePostGIS 3.3.0
Note: See TracTickets for help on using tickets.