Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#4330 closed patch (fixed)

postgis_restore fails when output piped to an intermediate process

Reported by: hranalli Owned by: strk
Priority: medium Milestone: PostGIS 3.0.0
Component: build Version: 2.5.x -- EOL
Keywords: Cc:

Description

We are upgrading a database from Postgres 8.2.23 and PostGIS 1.4 to PostgreSQL 11.2 and PostGIS 2.5. As part of our process, we pipe the output of postgis_restore.pl to sed, whic makes some changes to the SQL before it is piped to psql.

When we do this, postgis_restore buffers the output in memory, and then is killed by the OS when it runs out of memory. I've created a patch that eliminates this buffering in the places where it is unnecessary (the culprit is in the TABLE DATA commands, as these are the most memory intensive).

Once postgis_restore is run with the patch in place, it does a fine job of upgrading the database.

Attachments (1)

postgis_restore-print-lines-as-processed.patch (1.6 KB ) - added by hranalli 6 years ago.
Patch to output statements as they are processed, rather than buffering and running out of memory

Download all attachments as: .zip

Change History (11)

by hranalli, 6 years ago

Patch to output statements as they are processed, rather than buffering and running out of memory

comment:1 by robe, 6 years ago

strk can you look over this patch and see if you are okay with it.

comment:2 by strk, 6 years ago

I'm ok with the autoflush setting but don't see why there was a need to change other lines. Hranalli ? Is autoflush only enough ?

comment:3 by hranalli, 6 years ago

Unfortunately, the autoflush setting is not enough. That is what I tried first, and figured it didn't hurt to leave it in, but it did not solve the problem. It was necessary to output lines on each loop iteration, rather than letting them build up, in order to resolve the issue.

comment:4 by strk, 6 years ago

I see the change is in the block parsing CREATE TABLE and in the one parsing COMMENT ON How do they affect the TABLE DATA block that you reference in the original submission ?

I suspect the problem is elsewhere. Can it be the lack of anchor in CREATE TABLE pattern ? May it be it is catching TABLE DATA values ?

comment:5 by hranalli, 6 years ago

It's been a while since I made this patch, as I didn't want to submit it until after several test runs, but I am pretty sure the CREATE TABLE statement also processes the table data. To figure out the problem, I added logging to see where it crashed. And it was in the middle of the CREATE TABLE loop that it ran out of memory. The change I made there resolves the problem. I've run the script several times on our database, and it always fails without the patch, and succeeds with it.

I adjusted the COMMENT area because, as there was no intermediate processing required, it seemed reasonable to remove a similar, potential failure point (though I agree an unlikely one, given the data sizes involved). Other sections which buffer data need the buffer to make decisions, and so would be harder to adjust.

You can certainly leave out the patch on the COMMENTS section. I do know the patch on CREATE TABLE solves the problem. There may be a more fundamental issue, but I'm not a Perl programmer or PostGIS expert.

Version 0, edited 6 years ago by hranalli (next)

comment:6 by strk, 6 years ago

Resolution: fixed
Status: newclosed

In 17320:

Force early flushing of postgis_restore.pl output

Reduces memory usage when piping output to further processing filter.

Patch by Hugh Ranalli

Closes #4330

comment:7 by strk, 6 years ago

Ok, I applied the patch with r17318, in 3.0 branch. I'm betting on having enough time for others to scream if it breaks anything :)

comment:8 by robe, 6 years ago

strk so you want to apply to 2.5.2 or just stick with 3.0. You closed but didn't flip the milestone to 3.0 so not sure.

comment:9 by robe, 6 years ago

Milestone: PostGIS 2.5.2PostGIS 3.0.0

strk I'm flipping this to 3.0. If you do back commit to 2.5, then flip to 2.5.2.

I'll release 2.5.2 later tonight or tomorrow.

comment:10 by strk, 6 years ago

I feel the postgis_restore.pl path is not very well tested so any change should really be tested deeply manually before backporting. Can't remember if we have a ticket open about adding such testing to our many bots, would be great to have.

Note: See TracTickets for help on using tickets.