4 | | https://github.com/postgres/postgres/commit/f6f61d937bfddbe2a5f6a37bc26a0587117d7837 |
| 4 | https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f6f61d937bfddbe2a5f6a37bc26a0587117d7837 |
| 5 | |
| 6 | |
| 7 | {{{ |
| 8 | author Andrew Gierth <rhodiumtoad@postgresql.org> |
| 9 | Tue, 28 Aug 2018 04:52:25 -0400 (09:52 +0100) |
| 10 | committer Andrew Gierth <rhodiumtoad@postgresql.org> |
| 11 | Tue, 28 Aug 2018 06:55:18 -0400 (11:55 +0100) |
| 12 | commit f6f61d937bfddbe2a5f6a37bc26a0587117d7837 |
| 13 | tree cec65e55e1c5f85a32c2421b8c17dc72327e9a0d tree | snapshot |
| 14 | parent 0f3dd76f527deb81ee5ba60048df04c598c93960 commit | diff |
| 15 | Avoid quadratic slowdown in regexp match/split functions. |
| 16 | |
| 17 | regexp_matches, regexp_split_to_table and regexp_split_to_array all |
| 18 | work by compiling a list of match positions as character offsets (NOT |
| 19 | byte positions) in the source string. |
| 20 | |
| 21 | Formerly, they then used text_substr to extract the matched text; but |
| 22 | in a multi-byte encoding, that counts the characters in the string, |
| 23 | and the characters needed to reach the starting byte position, on |
| 24 | every call. Accordingly, the performance degraded as the product of |
| 25 | the input string length and the number of match positions, such that |
| 26 | splitting a string of a few hundred kbytes could take many minutes. |
| 27 | |
| 28 | Repair by keeping the wide-character copy of the input string |
| 29 | available (only in the case where encoding_max_length is not 1) after |
| 30 | performing the match operation, and extracting substrings from that |
| 31 | instead. This reduces the complexity to being linear in the number of |
| 32 | result bytes, discounting the actual regexp match itself (which is not |
| 33 | affected by this patch). |
| 34 | |
| 35 | In passing, remove cleanup using retail pfree() which was obsoleted by |
| 36 | commit ff428cded (Feb 2008) which made cleanup of SRF multi-call |
| 37 | contexts automatic. Also increase (to ~134 million) the maximum number |
| 38 | of matches and provide an error message when it is reached. |
| 39 | |
| 40 | Backpatch all the way because this has been wrong forever. |
| 41 | |
| 42 | Analysis and patch by me; review by Kaiting Chen. |
| 43 | |
| 44 | Discussion: https://postgr.es/m/87pnyn55qh.fsf@news-spur.riddles.org.uk |
| 45 | |
| 46 | see also https://postgr.es/m/87lg996g4r.fsf@news-spur.riddles.org.uk |
| 47 | |
| 48 | }}} |
| 49 | |
| 50 | This was committed to 9.3-12 stable branches. |