Version 3 (modified by 13 years ago) ( diff ) | ,
---|
MapGuide RFC 123 - Feature Join optimization shortcut using FDO Join APIs
This page contains a change request (RFC) for the MapGuide Open Source project. More MapGuide RFCs can be found on the RFCs page.
Status
RFC Template Version | (1.0) |
Submission Date | 24 October 2011 |
Last Modified | 24 October 2011 |
Author | Jackie Ng |
RFC Status | draft |
Implementation Status | pending |
Proposed Milestone | 2.4 |
Assigned PSC guide(s) | (when determined) |
Voting History | (vote date) |
+1 | |
+0 | |
-0 | |
-1 | |
no vote |
Overview
This RFC proposes to take advantage of recently introduced FDO join APIs to provide an optimization path for Feature Joins in MapGuide under certain Feature Source configurations.
Motivation
Feature Joins has been a traditionally underperforming and buggy aspect of MapGuide Open Source / MGE / AIMS. A look at the number of tickets in trac related to Feature Joins can attest to this fact.
Performance workarounds exist for SQL-based feature sources involving datastore-level joins encapulsated as a view (and possibly some metadata hacks in the datastore to be recognised as a feature class). This approach, while addressing the performance problem, presents its own set of issues, namely constraints imposed by the data store as a result of using a view.
As of FDO 3.6, new APIs were introduced allowing support for native joins at the data store level, which is much more efficent and better performing than the costly in-memory joins performed by the GWS Query Engine component. With some minor modifications in the feature query logic, MapGuide can take advantage of these new FDO Join APIs if the extended feature class being queried satisfies some conditions (outlined in the Proposed Solution).
By using FDO join APIs, we can tackle the performance and bugginess of Feature Joins simultaneously because both aspects are now delegated to the underlying FDO data store.
It is hoped with the implementation of this RFC that we end up with a more positive user story with regards to Feature Joins.
Proposed Solution
The solution consists of 3 parts:
- Testing for the FDO join optimization path when performing a SelectFeatures() call against an extended feature class
- Setting up the FDO join query.
- Handling the iteration logic for the results of a FDO join query.
Each part is explained in detail below.
Testing for the optimization
When performing a feature query against an extended feature class that consists of joins, use the FDO join APIs if the extended feature class satisfies the following criteria:
- The Feature Source this extended feature class belongs to supports joins (ie. The SupportJoins capability returns true)
- The Feature Source this extended feature class belongs to joins with another feature class from the same feature source (this is because FDO Joins work within the context of the same connection). The feature class being joined on cannot itself be an extended feature class.
- The extended feature class only contains one join. Supporting chained or multiple joins is beyond the scope of this RFC.
- The type of join being performed is supported by the underlying FDO provider.
If any of the above criteria is not met, the extended feature class in question is considered not to meet the FDO join requirements and is delegated off to the GWS Query Engine as it currently is.
As of writing this RFC, the following FDO providers support datastore-level joins:
- SQLite (3.7/Trunk)
- SQL Server Spatial (3.7/Trunk)
Feature Sources using the above providers that contain extended feature classes configured in the above fashion will stand to take advantage of this optimization path. Any other FDO providers that do implement the required FDO Join APIs in the future, feature sources using these providers will be automatically eligible for this optimization path if they meet the same criteria outlined above.
Setting up the FDO join query
We use the FdoIExtendedSelect interface to perform the FDO join query. The extended feature class already uses an optional prefix on the secondary class as a means of disambiguating identically named properties on both sides of the join. We will use this same prefix to alias the secondary property names to prevent ambiguous names during query preparation.
Properties from the primary class will be specified as the following FDO computed property:
primary.[PropertyName] AS [PropertyName]
primary will be specified as the alias for the FdoIExtendedSelect::SetAlias() method
Properties from the secondary class will be specified as the following FDO computed property:
secondary.[PropertyName] AS [Prefix][PropertyName]
secondary will be used as the alias for the join criteria that is added to the FdoIExtendedSelect's join criteria collection.
Because the Extended Class Definition in a feature source does not explicity specify the list of properties from the secondary class to include, we include all properties from any secondary class that we are joining on by default.
Through this setup, the returned feature reader will present the same property list as a reader returned by the GWS Query Engine.
The FDO join feature reader
The existing MgServerFeatureReader will be modified to incorporate iteration logic to handle forcing 1:1 in the resulting FDO feature reader.
When 1:1 cardinality is not being forced, this reader behaves like a normal feature reader.
In the case of the forcing of 1:1 cardinality, we take a different path and employ the following logic for ReadNext():
- Read the current identity property values. Hash these values into a string and check if this hashed string exists in an internal std::set.
- If this value exists, keep reading until we either get a hashed string that does not yet exist in the internal set, or until we reached the end of the feature reader, in which case we return false
- Store this hashed string into the internal set for future comparisons.
Iteration logic for the normal case (not 1:1), is simply to pass over to the underlying reader's ReadNext() method. Tracking of identity property values is not required for this case.
Performance-wise, support for FDO joins has already been implemented in mg-desktop and the benefits there are clear. The performance times for a 17000 feature by 16000 feature join (of this dataset) are outlined below.
Test Case | SDF | SQLite |
Inner Join | 744.1s | 5.4s |
Left Outer Join | 479.4s | 4.2s |
Inner Join (1:1) | 595.2s | 4.4s |
Left Outer Join (1:1) | 320.9s | 4.6s |
As can be seen from this table, FDO joins performed on average, 100 times faster than the traditional feature join counterpart! The SQLite provider implements the FDO join APIs. The SDF provider does not and is delegated to the GwsQueryEngine.
In the current implementation for the MapGuide Server, the SQLite numbers are even faster due to connection pooling and other caching mechanisms that are absent from the mg-desktop implementation.
Aggregates
Support for aggregate operations is beyond the scope of this RFC.
Implications
No public APIs are affected. This is a server-side modification to take advantage of new APIs introduced by an external component (FDO). At the WebTier level, processing FDO join results is still done through the existing MgProxyFeatureReader. No changes are required on the Web Tier.
No schema modifications are required for the Feature Source Schema. The extended class definition already provides enough information to construct an equivalent FDO join query.
Though not necessary, it would be nice for the join editors in Maestro and Infrastructure Studio to notify the user of such optimization availability if the edited feature source in question is configured correctly.
Test Plan
Add unit tests against some sample SQLite feature sources configured to take advantage of these optimizations. Verify that the FDO join optimization path is taken when selecting from these feature sources.
Funding / Resources
Community