#4543 closed enhancement (fixed)
Use Ryū to output floating point numbers
Reported by: | Algunenano | Owned by: | Algunenano |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 3.1.0 |
Component: | liblwgeom | Version: | master |
Keywords: | Cc: |
Description
PG12 introduced an implementation of Ryū (https://dl.acm.org/citation.cfm?id=3192369) to speed up the conversion of floating point numbers to strings, as it can be 10x faster than a straight sprintf(str, "%f", double).
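For readers unfamiliar with what "shortest" output means here, the following is a minimal sketch in plain C. It is not Ryū and not the PG12 code: it finds the shortest round-tripping decimal by brute force with snprintf and strtod, which is the result Ryū is designed to compute directly and much faster. The name shortest_roundtrip is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of PostGIS or PostgreSQL): write the shortest
 * "%g"-style decimal that parses back to exactly the same double.
 * Returns the number of characters written, or -1 if the buffer is too small.
 */
int
shortest_roundtrip(double d, char *buf, size_t bufsize)
{
	/* 17 significant digits always suffice for an IEEE 754 double */
	for (int precision = 1; precision <= 17; precision++)
	{
		int len = snprintf(buf, bufsize, "%.*g", precision, d);
		if (len < 0 || (size_t)len >= bufsize)
			return -1;
		/* Stop at the first precision that survives a round trip */
		if (strtod(buf, NULL) == d)
			return len;
	}
	return -1; /* only reached for NaN, which never compares equal */
}
```

Each loop iteration pays for a full snprintf and a strtod call, which is the kind of overhead a dedicated algorithm avoids.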
It seemed to me that multiple PostGIS functions could use a similar improvement, so I've tested a nasty hack that uses Postgres' Ryu implementation for lwprint_double (this is a hack, so it doesn't take into account the desired precision or the space left in the buffer):
Before:
explain analyze Select ST_AsText(the_geom) from benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020 (cost=0.00..33.63 rows=13 width=32) (actual time=547.194..5313.276 rows=13 loops=1)
Planning Time: 0.144 ms
Execution Time: 5313.322 ms
(3 rows)
After:
Seq Scan on benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020 (cost=0.00..33.63 rows=13 width=32) (actual time=64.062..478.339 rows=13 loops=1)
Planning Time: 0.628 ms
Execution Time: 478.373 ms
(3 rows)
The hack involves using a PG function in liblwgeom, so that's a no-go from the start, but it's useful to compare performance and see what kind of improvement we could expect if we decided to go that way:
diff --git a/liblwgeom/lwprint.c b/liblwgeom/lwprint.c
index af56c4c27..1a0dc886d 100644
--- a/liblwgeom/lwprint.c
+++ b/liblwgeom/lwprint.c
@@ -486,27 +486,13 @@ trim_trailing_zeros(char* str)
  * truncated and misses a terminating NULL.
  *
  */
+/* This is also provided by snprintf.c */
+extern int double_to_shortest_decimal_bufn(double f, char *result);
+
 int
 lwprint_double(double d, int maxdd, char* buf, size_t bufsize)
 {
-	double ad = fabs(d);
-	int ndd;
-	int length = 0;
-	if (ad <= FP_TOLERANCE)
-	{
-		d = 0;
-		ad = 0;
-	}
-	if (ad < OUT_MAX_DOUBLE)
-	{
-		ndd = ad < 1 ? 0 : floor(log10(ad)) + 1; /* non-decimal digits */
-		if (maxdd > (OUT_MAX_DOUBLE_PRECISION - ndd)) maxdd -= ndd;
-		length = snprintf(buf, bufsize, "%.*f", maxdd, d);
-	}
-	else
-	{
-		length = snprintf(buf, bufsize, "%g", d);
-	}
-	trim_trailing_zeros(buf);
-	return length;
+	int b = double_to_shortest_decimal_bufn(d, buf);
+	buf[b] = 0;
+	return b;
 }
\ No newline at end of file
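As noted, the hack ignores bufsize. One possible way to close that gap, sketched here under assumptions, is to format into a local scratch buffer and copy to the caller only when the result fits. Everything named below (SCRATCH_LEN, format_shortest_stub, print_double_bounded) is hypothetical, and the stub merely stands in for the Ryu entry point so the example compiles on its own; it is not what the ticket ended up implementing.

```c
#include <stdio.h>
#include <string.h>

/* Assumed upper bound on the formatter's output (hypothetical value) */
#define SCRATCH_LEN 32

/*
 * Stand-in for the Ryu entry point so the sketch compiles on its own.
 * Like the function used in the hack, it returns the length written and
 * does not NUL-terminate the result. It is NOT the real implementation.
 */
static int
format_shortest_stub(double d, char *result)
{
	char tmp[SCRATCH_LEN];
	/* "%.17g" needs at most 24 characters for a double, so tmp is large enough */
	int len = snprintf(tmp, sizeof(tmp), "%.17g", d);
	memcpy(result, tmp, (size_t)len);
	return len;
}

/*
 * Hypothetical bounded wrapper: format into a scratch buffer first, then copy
 * to the caller's buffer only if it fits together with the terminating NUL.
 * Returns the length written, or -1 when bufsize is too small.
 */
int
print_double_bounded(double d, char *buf, size_t bufsize)
{
	char scratch[SCRATCH_LEN];
	int len = format_shortest_stub(d, scratch);

	if (len < 0 || (size_t)len + 1 > bufsize)
		return -1;
	memcpy(buf, scratch, (size_t)len);
	buf[len] = '\0';
	return len;
}
```

The extra copy costs a little, but it keeps the caller's buffer safe whatever the formatter writes. Failing with -1 instead of truncating is a design choice of this sketch, not the existing lwprint_double behaviour.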
Change History (8)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
Some updates:
- I have ryu now integrated inside postgis (under deps) so it builds and links without the need of anything external.
- I've found some inconsistencies between what lwprint_double says it does with maxdd and what it actually does. Changing this breaks some tests but I think it's ok.
- Ryu's scientific notation output doesn't trim extra zeros, so it might output 1.00000000e+100 instead of 1e+100. I'm not sure whether I want to try to fix it or just use snprintf (as it is), since numbers that big are rare in GIS.
- I've started working on improving other parts of the print stack to keep improving the performance. I'm not sure if I'll continue down this path or move on to fixing the broken tests (either by accepting the output or by changing the code).
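The trailing-zero issue from the list above could be handled with a small post-processing step. The following is only a sketch of one way to do it; trim_mantissa_zeros is a hypothetical name, not an existing liblwgeom function, and it assumes a lowercase 'e' as the exponent marker.

```c
#include <string.h>

/*
 * Hypothetical helper: remove trailing zeros from the mantissa of a string in
 * scientific notation, in place ("1.00000000e+100" -> "1e+100").
 * Strings without an exponent or without a decimal point are left untouched.
 * Returns the new length.
 */
size_t
trim_mantissa_zeros(char *str)
{
	char *epos = strchr(str, 'e');
	char *dot = strchr(str, '.');
	char *end;

	if (!epos || !dot || dot > epos)
		return strlen(str);

	/* Walk back from the exponent over the zeros of the fractional part */
	end = epos;
	while (end > dot + 1 && end[-1] == '0')
		end--;
	/* If the whole fractional part was zeros, drop the '.' as well */
	if (end == dot + 1)
		end = dot;

	/* Move the exponent (with its NUL) over the removed characters */
	memmove(end, epos, strlen(epos) + 1);
	return strlen(str);
}
```

With this, "1.25000e-07" becomes "1.25e-07", while plain decimals such as "123.4500" are left for the existing trim_trailing_zeros to handle.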
Comparison of the current status (execution times in ms as reported by explain analyze, three runs each):
- ST_AsText with big geometries:
explain analyze Select ST_AsText(the_geom) from benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020;
- Before: 5166.606 / 5220.705 / 5218.330
- After: 715.381 / 713.122 / 713.993
- ST_AsGeoJson with big geometries:
explain analyze Select ST_AsGeoJson(the_geom) from benchmark_4c7214d90a79aa6760367a084a4d4a2f61fbe1c6cc4f7f9e76020;
- Before: 4738.816 / 4729.487 / 4810.050
- After: 1057.564 / 1048.442 / 1062.282
- ST_AsText with points (3 + 1 workers):
explain analyze Select ST_AsText(the_geom) from yellow_tripdata_2015_07_1m;
- Before: 610.948 / 606.455 / 602.095
- After: 274.195 / 273.759 / 279.207
- ST_AsGeoJson with points (3 + 1 workers):
explain analyze Select ST_AsGeoJson(the_geom) from yellow_tripdata_2015_07_1m;
- Before: 581.969 / 580.237 / 582.805
- After: 320.685 / 316.013 / 320.374
comment:3 by , 5 years ago
Working PR with minimal output changes: https://github.com/postgis/postgis/pull/523
comment:4 by , 5 years ago
Thanks for introducing Ryu, as I hadn't seen it before. It appears to be a possible successor to Google's double-conversion.
Another good resource on the same topic is from Ryan Juckett, with an implementation called Dragon4, which is now used by numpy to format floats to positional or scientific notation strings. It's a shame there is no direct comparison between Dragon4 and Ryu, although I will say the publication for Ryu presents itself well.
I've made another integration, doing a simpler hack that uses upstream Ryu's printf implementation, and I get far fewer changes (some differences in the exponent version that I still need to look at) and good performance:
Before:
After:
This new version is slower than the original hack but I've yet to investigate why. One possibility is that the original hack never used the exponential output, but that isn't an option for us AFAIK, right?
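To make the fixed versus exponential question concrete, here is a minimal sketch of the dispatch that the original lwprint_double performs, using snprintf only. SKETCH_MAX_FIXED and print_fixed_or_exp are hypothetical names, and the threshold value is an assumption for illustration; the real OUT_MAX_DOUBLE may differ.

```c
#include <stdio.h>

/* Assumed threshold for illustration; the real OUT_MAX_DOUBLE may differ */
#define SKETCH_MAX_FIXED 1e15

/*
 * Hypothetical dispatcher: fixed notation with maxdd decimals below the
 * threshold, exponential notation ("%g") at or above it.
 * Returns what snprintf returns, i.e. the length the full output needs.
 */
int
print_fixed_or_exp(double d, int maxdd, char *buf, size_t bufsize)
{
	double ad = d < 0 ? -d : d;

	if (ad < SKETCH_MAX_FIXED)
		return snprintf(buf, bufsize, "%.*f", maxdd, d);
	return snprintf(buf, bufsize, "%g", d);
}
```

A version that always takes the first branch skips the comparison and the exponent handling, but it would print very large values as long digit strings.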