Opened 10 years ago
Last modified 7 years ago
#2898 new enhancement
Define a guideline for function cost
Reported by: | strk | Owned by: | strk |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS Fund Me |
Component: | postgis | Version: | master |
Keywords: | Cc: |
Change History (10)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
The guideline example above is completely off, though. Here are some real numbers:
noop : Total runtime: 19523.492 ms (full deserialize/serialize) st_envelope : Total runtime: 18411.870 ms (header slice) st_convexhull : Total runtime: 18104.542 ms plain : Total runtime: 51.341 ms (???)
The times were taken running queries against a ~180k rows raster coverage. The "plain" is a "SELECT rast FROM ...". The "noop" is a "SELECT postgis_noop(rast) FROM ...".
The difference between "plain" and "noop" is impressive!!
comment:3 by , 10 years ago
If I change the postgis_noop implementation to simply return the input datum it takes 66ms (vs. ~50ms of the plain timing). Having it simply do the DETOAST_DATUM makes it jump again to 18149.952 ms
It turns out EXPLAIN ANALYZE does not run canonical output functions at all, so there's no detoasting performed in a "SELECT rast FROM ..."
comment:4 by , 10 years ago
Alright so in the light of the new findings, sliced detoasting (header only) and full detoasting do not make such a big difference. It is detoasting itself that hurts. Guidelines should then not take that part much in consideration, other than a COST offset common to each and every function.
Besides, I suspect that the planner should be considering cost of detoasting on itself, as the same function could be invoked with the input already detoasted...
comment:5 by , 10 years ago
17:37 < RhodiumToad> the planner doesn't take it into account. 17:38 < RhodiumToad> currently, functions that may be highly expensive due to detoasting (to_tsvector being a notorious example) often need to be assigned large costs to avoid bad plans
comment:6 by , 10 years ago
The numbers you're seeing generally match what I've seen with in-db rasters on production servers. For basic ops, the detoast is the costliest.
comment:7 by , 10 years ago
The question is should we consider detoasting costs in the function definition or shouldn't we ? I'd think the planner should consider them itself as we cannot know its cost at definition time for two reasons:
- It depends on input (toast) size
- It depends on whether the input gets to us already detoasted
comment:8 by , 10 years ago
Owner: | changed from | to
---|
From the commentary, it sounds like this one is not as "straightforward" as one might hope, if input data size is the primary determinant of cost. Maybe close.
comment:9 by , 10 years ago
Milestone: | PostGIS 2.2.0 → PostGIS Future |
---|
Example guideline for rasters: