Opened 4 years ago

Closed 4 years ago

#4737 closed enhancement (fixed)

Reduce memory usage during MVT encoding

Reported by: Algunenano Owned by: Algunenano
Priority: medium Milestone: PostGIS 3.1.0
Component: postgis Version: master
Keywords: Cc:

Description

During the last months I've received multiple reports of excesive memory usage of Postgres during MVT generation.

Although these situations are extreme, as MVTs are not meant for retrieval of dozens or even hundreds of MBs at one, they are hard to deal with since it's normal for the OOM killer to do its job and kill the leader backend id, which triggers a database restart.

I'm still investigating how the memory contexts work during the aggregation (since some memory seems to not go away), but one thing I've discovered is that the current (spec'd) proto is not very memory efficient as it's declaring message Value as being able to hold values of different types at once (so it can hold a string, a float, a double, an int, ..., all at the same time). This is prohibited by the spec, but it's just a comment in the .proto so it's not enforced. Since protobuf-c 1.1+, you can use the oneof clause to avoid this, which trasnsforms the resulting C code into a union, which is way more efficient in terms of memory. I've tested this and I've seen a 20-25% decrease in memory usage when generating them (encode_values).

What I still not sure about is the best way to apply this. I can either change the .proto and raise the minimum requirement for protobuf-c, or I can generate the resulting .c|.h and push them in the repo (so they are stable and not generated every time).

Attachments (3)

master.png (33.6 KB ) - added by Algunenano 4 years ago.
Current behaviour
protoc8.png (31.1 KB ) - added by Algunenano 4 years ago.
Behaviour with the changes
mvtprop_master_vs_mvtprop_pbfvalues13491.pdf (71.7 KB ) - added by Algunenano 4 years ago.
Performance comparison. Note than the MVTMAX requests include ~40 columns, while MVTMIN only one (apart from the geometry)

Download all attachments as: .zip

Change History (5)

by Algunenano, 4 years ago

Attachment: master.png added

Current behaviour

by Algunenano, 4 years ago

Attachment: protoc8.png added

Behaviour with the changes

comment:1 by Algunenano, 4 years ago

Some extra improvements on top of this:

If we use protobuf classes instead of our own in the hash tables. Instead of doing an intermediate class and later create a VectorTileTileValue, use VectorTileTileValue directly. No noticeable impact in memory usage (expected) but I do see an impact on performance, an extra ~7% improvement on top of the previous improvement (5-10%) from using less memory.

Also, for parallel queries there are a couple of interesting changes:

  • Use a temporal memory context (child of the aggregation context) to process rows (in pgis_asmvt_transfn). This context is then deleted when pgis_asmvt_serialfn is called as we no longer need all the extra buffer. This ensures a cleanup is done and helps reduce the memory in the main/leader backend during the intermediate steps.
  • When combining tiles, do shallow copies and not full copies. Since both sources (mvt_agg_context) and destination (also a mvt_agg_context) live in the same memory context (aggcontext) we can keep pointers to them so long we don't free the sources before we generate the final buffer.

I have everything more or less done and I need to do some more testing (mainly because we don't have parallelism test for mvts so I've done them manually for now) but with all the changes together for a really big MVT:

  • Before: Query time: ~5.3s. Peak memory usage¹: ~6GB
  • Now: Query time: ~3.7s. Peak memory usage¹: ~3.6GB

Attaching the current measurements (master == current, protoc8 == with the changes described in the description and this comment).

1 - The memory usage is sum of the memory ussed by all active PG backends done via psrecord.

by Algunenano, 4 years ago

Performance comparison. Note than the MVTMAX requests include ~40 columns, while MVTMIN only one (apart from the geometry)

comment:2 by Raúl Marín <git@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 99c50d46/git:

MVT: Improve performance and memory usage

  • Reduce memory usage (raises protobuf-c requirement to 1.1).
  • Improves performance when merging tiles (from parallel queries).
  • Uses protobuf values to reduce the number of transformations and memory usage.
  • Uses a temporal context when transforming data (reading rows) to have a fast way of cleaning up memory before the parallel process starts.
  • Processes texts and cstrings with each known reading functions instead of the OidOutputFunctionCall fallback.
  • Avoids multiple hash function calls when a value is not found in the hash table (once for the lookup and once for the insertion).

Closes https://github.com/postgis/postgis/pull/576
Closes #4737

Note: See TracTickets for help on using tickets.