Changes between Version 17 and Version 18 of ImplementSortingMethodsBeforeGistIndexBuilding
- Timestamp:
- 08/23/21 06:49:35 (3 years ago)
ImplementSortingMethodsBeforeGistIndexBuilding
GiST (Generalized Search Tree) is a generalized data structure covering a variety of disk-based, height-balanced search trees. Under the high-level GiST API, structures such as B-trees and R-trees can be implemented for data management. PostgreSQL defines a set of support function APIs for the elements of a GiST index; a data type can be indexed and managed by a GiST structure only if these functions are implemented for it. In large-data scenarios, pre-sorting each batch of data fetched into memory can serve as a local approximation to a global sort. A recent PostgreSQL patch shows that pre-sorting the data to be indexed should speed up the build of a GiST index. In one fork, the author replaces GIST_OPTIONS_PROC with GIST_ORDER_PROC to define an order in which the data fetched into memory is sorted, with the aim of speeding up the subsequent index build. I implemented pre-sorting methods based on Z-order and Hilbert order, and also tested and compared them on various data.

**The state of the art BEFORE your GSoC**

The index building process does not change the order of tuples in the page and runs slowly.

**The added value**

With pre-sorting, the index build time is reduced to between one-third and one-fifth of the original.

**Links**

… …

* Implement a fast Morton/Hilbert hash function for n-dimension geometry objects

[[Image(https://user-images.githubusercontent.com/25524928/130458502-313360a1-01dd-46f0-8ca7-e9cf0147ee6c.png)]]

== Student's Biography ==
My name is Han WANG. I am a first-year graduate student majoring in GIS at Peking University, and I will receive my Master's degree in 2023. My GitHub is https://github.com/HanwGeek and my LinkedIn is https://www.linkedin.com/in/hanwgeek/. I am interested in all cool things, and it is very exciting to join the open source community!
My research interests include massive spatio-temporal data management and analysis. Currently, I am working on a machine learning project based on big trajectory data, which is stored in a PostgreSQL database and managed by PostGIS.