wiki:MapGuideRfc123

Version 6 (modified by jng, 13 years ago) ( diff )

--

MapGuide RFC 123 - Feature Join optimization shortcut using FDO Join APIs

This page contains a change request (RFC) for the MapGuide Open Source project. More MapGuide RFCs can be found on the RFCs page.

Status

RFC Template Version(1.0)
Submission Date24 October 2011
Last Modified24 October 2011
AuthorJackie Ng
RFC Statusdraft
Implementation Statuspending
Proposed Milestone2.4
Assigned PSC guide(s)(when determined)
Voting History(vote date)
+1
+0
-0
-1
no vote

Overview

This RFC proposes to take advantage of recently introduced FDO join APIs to provide an optimization path for Feature Joins in MapGuide under certain Feature Source configurations.

Motivation

Feature Joins has been a traditionally underperforming and buggy aspect of MapGuide Open Source / MGE / AIMS. A look at the number of tickets in trac related to Feature Joins can attest to this fact.

Performance workarounds exist for SQL-based feature sources involving datastore-level joins encapulsated as a view (and possibly some metadata hacks in the datastore to be recognised as a feature class). This approach, while addressing the performance problem, presents its own set of issues, namely constraints imposed by the data store as a result of using a view.

As of FDO 3.6, new APIs were introduced allowing support for native joins at the data store level, which is much more efficent and better performing than the costly in-memory joins performed by the GWS Query Engine component. With some minor modifications in the feature query logic, MapGuide can take advantage of these new FDO Join APIs if the extended feature class being queried satisfies some conditions (outlined in the Proposed Solution).

By using FDO join APIs, we can tackle the performance and bugginess of Feature Joins simultaneously because both aspects are now delegated to the underlying FDO data store.

It is hoped with the implementation of this RFC that we end up with a more positive user story with regards to Feature Joins.

Proposed Solution

The solution consists of 4 parts:

  • Testing for the FDO join optimization path when performing a SelectFeatures() call against an extended feature class
  • Setting up the FDO join query.
  • Handling the iteration logic for the results of a FDO join query.
  • Handling aggregates

Each part is explained in detail below.

Testing for the optimization

When performing a feature query against an extended feature class that consists of joins, use the FDO join APIs if the extended feature class satisfies the following criteria:

Regular selects

  • The Feature Source this extended feature class belongs to supports joins (ie. The SupportJoins capability returns true)
  • The Feature Source this extended feature class belongs to joins with another feature class from the same feature source (this is because FDO Joins work within the context of the same connection). The feature class being joined on cannot itself be an extended feature class.
  • The extended feature class only contains one join. Supporting chained or multiple joins is beyond the scope of this RFC.
  • If this extended feature class is being queried with a filter, the filter does not contain any references to properties on the feature class to be joined on. This is because there is no way I am aware of in the FDO Join API to apply filters to the non-primary sides of the join.
  • The type of join being performed is supported by the underlying FDO provider.

Select Aggregate

  • All the above conditions
  • The extended feature class is being queried cannot have 1:1 forced on it unless the aggregate function is SpatialExtents(), Min() or Max(). This is due to my current findings where an FDO select aggregate operation that contains an aggregate function can produce incorrect results because there is nothing in the FDO select aggregate API equivalent to the force 1:1 behavior. Including the identity properties of the primary class in the query with the distinct flag activated does not produce the intended results. The reason SpatialExtents(), Min() and Max() is exempt is because duplicated primary class features won't affect the final aggregate result with these functions, thus these aggregate functions are safe to use, whereas other aggregate functions like Sum() or Avg() would produce distorted results from the duplicated primary class features.

If any of the above criteria is not met, the extended feature class in question is considered not to meet the FDO join requirements and is delegated off to the GWS Query Engine as it currently is.

As of writing this RFC, the following FDO providers support datastore-level joins:

  • SQLite (3.7/Trunk)
  • SQL Server Spatial (3.7/Trunk)

Feature Sources using the above providers that contain extended feature classes configured in the above fashion will stand to take advantage of this optimization path. Any other FDO providers that do implement the required FDO Join APIs in the future, feature sources using these providers will be automatically eligible for this optimization path if they meet the same criteria outlined above.

Setting up the FDO join query

We use the FdoIExtendedSelect interface to perform the FDO join query. The extended feature class already uses an optional prefix on the secondary class as a means of disambiguating identically named properties on both sides of the join. We will use this same prefix to alias the secondary property names to prevent ambiguous names during query preparation.

Properties from the primary class will be specified as the following FDO computed property:

primary.[PropertyName] AS [PropertyName]

primary will be specified as the alias for the FdoIExtendedSelect::SetAlias() method

Properties from the secondary class will be specified as the following FDO computed property:

secondary.[PropertyName] AS [Prefix][PropertyName]

secondary will be used as the alias for the join criteria that is added to the FdoIExtendedSelect's join criteria collection.

Because the Extended Class Definition in a feature source does not explicity specify the list of properties from the secondary class to include, we include all properties from any secondary class that we are joining on by default.

Through this setup, the returned feature reader will present the same property list as a reader returned by the GWS Query Engine.

The FDO join feature reader

The existing MgServerFeatureReader will be modified to incorporate iteration logic to handle forcing 1:1 in the resulting FDO feature reader.

When 1:1 cardinality is not being forced, this reader behaves like a normal feature reader.

In the case of the forcing of 1:1 cardinality, we take a different path and employ the following logic for ReadNext():

  1. Read the current identity property values. Hash these values into a string and check if this hashed string exists in an internal std::set.
  2. If this value exists, keep reading until we either get a hashed string that does not yet exist in the internal set, or until we reached the end of the feature reader, in which case we return false
  3. Store this hashed string into the internal set for future comparisons.

Iteration logic for the normal case (not 1:1), is simply to pass over to the underlying reader's ReadNext() method. Tracking of identity property values is not required for this case.

Performance-wise, support for FDO joins has already been implemented in mg-desktop and the benefits there are clear. The performance times for a 17000 feature by 16000 feature join (of this dataset) are outlined below.

Test Case SDF SQLite
Inner Join 744.1s 5.4s
Left Outer Join 479.4s 4.2s
Inner Join (1:1) 595.2s 4.4s
Left Outer Join (1:1) 320.9s 4.6s

As can be seen from this table, FDO joins performed on average, 100 times faster than the traditional feature join counterpart! The SQLite provider implements the FDO join APIs. The SDF provider does not and is delegated to the GwsQueryEngine.

In the current implementation for the MapGuide Server, the SQLite numbers are even faster (in this particular case, up to a second faster) due to connection pooling and other caching mechanisms that are absent from the mg-desktop implementation.

Aggregates

If a SelectAggregates operation is invoked on an extended feature class, the same FDO join optimization condition checks are done against the extended feature class and its feature source.

In addition to normal select aggregate command processing, the same join criteria and join filter processing will be applied to the select aggregate command.

Implications

No public APIs are affected. This is a server-side modification to take advantage of new APIs introduced by an external component (FDO). At the WebTier level, processing FDO join results is still done through the existing MgProxyFeatureReader. No changes are required on the Web Tier.

No schema modifications are required for the Feature Source Schema. The extended class definition already provides enough information to construct an equivalent FDO join query.

Though not necessary, it would be nice for the join editors in Maestro and Infrastructure Studio to notify the user of such optimization availability if the edited feature source in question is configured correctly.

Test Plan

Add unit tests against some sample SQLite feature sources configured to take advantage of these optimizations. Verify that the FDO join optimization path is taken when selecting from these feature sources.

Funding / Resources

Community

Note: See TracWiki for help on using the wiki.