#3382 closed enhancement (fixed)
Avoid deserializing small geometries during index operations
Reported by: | dbaston | Owned by: | pramsey |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 2.3.0 |
Component: | postgis | Version: | master |
Keywords: | Cc: |
Description
In doing some work on #75, I decided to profile some simple point-in-polygon queries using Callgrind. I was surprised to see that much of the time was being taken by gserialized_gist_2d.c, around line 565. Since points aren't carrying a box, this code path was doing a lot of work deserializing points to compute a using generic routines.
However, a more efficient approach for these corner cases has already been implemented in the gserialized_peek_gbox_p function. Replacing the current deserialization code with a call to gserialized_get_gbox_p allows gserialized_gist_2d to take advantage of the more efficient routines without adding any complexity.
In my testing this is bringing about a 30-40% performance increase in point-in-polygon joins.
Attachments (1)
Change History (16)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
pramsey do you have issue with dbaston just committing this work to 2.3? I'm itching to test out a 30-40% improvement but too lazy to pull another repo to do it. :).
comment:3 by , 9 years ago
Updated the PR with an some cunit tests and an additional bugfix for gserialized_peek_gbox_p. I can pull that stuff out into a separate ticket/patch if that's preferable.
History in the PR is cluttered by a merge (there has to be a better way), but you can see the actual changes proposed at https://github.com/postgis/postgis/compare/svn-trunk...dbaston:trac-3382
comment:4 by , 9 years ago
You can avoid a merge by rebasing your changes on top of trunk:
git rebase github/svn-trunk
Then you'll have to force-push to the branch associated with your PR (not sure what it is)
git push -f # assuming it'll push to the appropriate branch
comment:5 by , 9 years ago
I had to rebase my repo for another project working on and found this article.
https://github.com/edx/edx-platform/wiki/How-to-Rebase-a-Pull-Request
In my case it was fairly simple since i just wanted to synch after my pull request was accepted. Seems you are missing a git remote line
So my command looked like
git clone https:/github...your_repo somefolder cd somefolder git remote add somename_base https://github/the_master_url git fetch somename_base git rebase somename_base/svn-trunk git push
comment:6 by , 9 years ago
That's a fantastic link, thank you!
Hopefully @pramsey can weigh in on this one before another sync becomes necessary :)
comment:7 by , 9 years ago
I really cannot see the relevant changeset in the pull request, unfortunately, just the new tests and the fix for the multipoint(empty) case (good catch). Just go for it, you can always pick up the pieces if something breaks (yee haw).
comment:8 by , 9 years ago
Committed to trunk at r14478.
robe, if you see a decent performance benefit from this, let me know. It seems like it would be pretty simple to extend the "gserialized_peek" function that's behind this to handle geography points as well. (The gains I was seeing were in counting intersections between 20,000 polygons -- TIGER faces for FIPS 25001 -- and 100,000 random points selected from the extent of that dataset.)
comment:9 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
follow-up: 12 comment:10 by , 9 years ago
Don't try to extend this to geography. The bounding box of a two-point line in geography may not be just the line's extrema.
comment:12 by , 9 years ago
Replying to pramsey:
Don't try to extend this to geography. The bounding box of a two-point line in geography may not be just the line's extrema.
Understood, but this should be safe and simple to do for geography points, no?
comment:13 by , 9 years ago
The point geographic coordinates ahve to be converted to geocentric before they are a true "geography bounding box" but yes(ish). I'm not actually sure if we store boxes on geography points or not.
comment:14 by , 9 years ago
I think my commit at #3361 made it so that we never store boxes for geography points. I'd have to verify. Before it wasn't consistent.
by , 9 years ago
Attachment: | callgrind.out.22131.gz added |
---|
callgrind output for geography point-in-poly join
comment:15 by , 9 years ago
I ran a geography version of the same test scenario. It looks like same issue is present (deserializing to compute a box), but it's much less significant, around 10% of query time. It looks like the relevant place to change would be in libpgcommon/gserialized_gist.c:289. Of course, you still have to compute the box, so maybe you could bring it down to 6-8%...doesn't quite seem worth it.
I attached a copy of my callgrind output if anyone's interested. I recently discovered kcachegrind, which is a really nice viewer for these files.
Pull request at https://github.com/postgis/postgis/pull/76
This contains an unrelated fix for a glitch in gserialized_peek_gbox_p, which caused a garbage box to be computed for MULTIPOINT(EMPTY), making the WKT regression test fail.