Opened 17 years ago

Last modified 5 years ago

#462 reopened defect

GDAL FDO provider stability issues (patch)

Reported by: zspitzer
Owned by: brucedechant
Priority: medium
Milestone: 4.0
Component: Rendering Service
Version:
Severity: blocker
Keywords:
Cc: stevedang, brucedechant, haris
External ID:

Description

I have just had a 2.0 RC4 server crash and was forced to reboot the entire server, which runs a lot of other applications.

I was playing with a TIF via GDAL and the server crashed with the following stack trace. The server remained up afterwards, but the service would not respond to a stop request and was left hung. Nothing is written to the error log afterwards.

I have seen similar behaviour with 1.2, but I wasn't able to reproduce the error reliably.

<2008-02-26T17:59:07> Administrator

Error: A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\a00059a0-ffff-ffff-8000-00188b8b4aed_en_7F0000010B060B050B04_MapDefinition3/S3/Base Layer Group/R0/C0/3_0.png StackTrace:

  • MgTileServiceHandler.ProcessOperation line 83 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\tile\TileServiceHandler.cpp
  • MgOpGetTile.Execute line 150 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\tile\OpGetTile.cpp
  • MgServerTileService.GetTile line 263 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\tile\ServerTileService.cpp
  • MgByteSink::ToFile line 245 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\common\foundation\Data/ByteSink.cpp
  • MgByteSink.ToFile line 220 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\common\foundation\Data/ByteSink.cpp A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\a00059a0-ffff-ffff-8000-00188b8b4aed_en_7F0000010B060B050B04_MapDefinition3/S3/Base Layer Group/R0/C0/3_0.png

<2008-02-26T17:59:09> Administrator

Error: Failed to stylize layer: True Marble

An unclassified exception occurred.

StackTrace:

  • MgMappingUtil.StylizeLayers line 781 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\mapping\MappingUtil.cpp Failed to stylize layer: True Marble

An unclassified exception occurred.

<2008-02-26T17:59:10> Administrator

Error: Failed to stylize layer: True Marble

An unclassified exception occurred.

StackTrace:

  • MgMappingUtil.StylizeLayers line 781 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\mapping\MappingUtil.cpp Failed to stylize layer: True Marble

An unclassified exception occurred.

Attachments (4)

mapguide_raster_unalloc.5.patch (1.6 KB) - added by jbirch 16 years ago.
mapguide_raster_stability.patch (976 bytes) - added by jbirch 16 years ago.
grfp_addref.patch (6.1 KB) - added by traianstanev 16 years ago.
    GDAL raster provider patch to addref the FDO connection from the FeatureReader
gdal_nomutex_cacheschema.patch (12.1 KB) - added by traianstanev 16 years ago.
    Fix refcounting problem + cache schema


Change History (34)

comment:1 by tomfukushima, 17 years ago

Please try editing your serverconfig.ini file and change the line that reads DataConnectionPoolSizeCustom = to DataConnectionPoolSizeCustom = OSGeo.Gdal:1

and report whether this resolves your problem. Thanks, Tom

comment:2 by tomfukushima, 17 years ago

Wow, what ugly formatting, let's try again. Change the line that reads

DataConnectionPoolSizeCustom =

to

DataConnectionPoolSizeCustom = OSGeo.Gdal:1
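To make the shape of that setting concrete, here is a minimal sketch of reading such a value into a provider-to-size map. It is not the MapGuide server's own parser: `ParsePoolSizes` and `Trim` are hypothetical helpers, and the sketch assumes the value is a comma-separated list of `Provider:Size` pairs, of which `OSGeo.Gdal:1` is a single entry.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: strips leading and trailing spaces and tabs.
static std::string Trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: parses a value such as "OSGeo.Gdal:1" into
// provider name -> pool size.
// Assumption: entries are comma-separated "Provider:Size" pairs.
std::map<std::string, int> ParsePoolSizes(const std::string& value) {
    std::map<std::string, int> sizes;
    std::istringstream in(value);
    std::string entry;
    while (std::getline(in, entry, ',')) {
        const std::size_t colon = entry.rfind(':');
        if (colon == std::string::npos) continue;   // no size given: skip entry
        const std::string name = Trim(entry.substr(0, colon));
        if (name.empty()) continue;                 // no provider name: skip entry
        try {
            const int size = std::stoi(entry.substr(colon + 1));
            sizes[name] = size;
        } catch (const std::logic_error&) {
            // Non-numeric or out-of-range size: skip the entry.
        }
    }
    return sizes;
}
```

With the value suggested above, `ParsePoolSizes("OSGeo.Gdal:1")` yields a single entry that limits the `OSGeo.Gdal` provider to one pooled connection.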

comment:3 by zspitzer, 17 years ago

Just tried that. It seems better, but I'm still getting this error, after which most layers (SDF, not raster) don't render properly until I restart the service.

The service hasn't locked up again since the change

<2008-02-27T13:07:09> Administrator

Error: Failed to stylize layer: LayerDefinition35

An unclassified exception occurred.

StackTrace:

  • MgMappingUtil.StylizeLayers line 781 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_22.4\mgdev\server\src\services\mapping\MappingUtil.cpp Failed to stylize layer: LayerDefinition35

An unclassified exception occurred.

comment:4 by zspitzer, 17 years ago

And then I am seeing:

Cannot create any more connections to the OSGeo.Gdal FDO provider.

comment:5 by tomfukushima, 17 years ago

The connection problem has been fixed by submission r2978 (submitted after RC4 came out)

comment:6 by tomfukushima, 17 years ago

Resolution: fixed
Status: new → closed

comment:7 by zspitzer, 17 years ago

Resolution: fixed
Status: closed → reopened

Still occurring with the final 2.0.0 release version

<2008-03-04T02:09:12> Anonymous

Error: A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\RASTER_True Marble_True Marble Tiled EPSG 4283/S3/True Martble/R120/C150/26_12.png StackTrace:

  • MgTileServiceHandler.ProcessOperation line 83 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\server\src\services\tile\TileServiceHandler.cpp
  • MgOpGetTile.Execute line 150 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\server\src\services\tile\OpGetTile.cpp
  • MgServerTileService.GetTile line 263 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\server\src\services\tile\ServerTileService.cpp
  • MgByteSink::ToFile line 245 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\common\foundation\Data/ByteSink.cpp
  • MgByteSink.ToFile line 220 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\common\foundation\Data/ByteSink.cpp A file IO exception occurred: C:\Program Files\MapGuideOpenSource2.0\Server\Repositories\TileCache\RASTER_True Marble_True Marble Tiled EPSG 4283/S3/True Martble/R120/C150/26_12.png

<2008-03-04T09:16:13> Anonymous

Error: An unclassified exception occurred. StackTrace:

<2008-03-04T09:16:14> Anonymous

Error: Failed to stylize layer: True Marble Aust 250m epsg 4283 B

Cannot create any more connections to the OSGeo.Gdal FDO provider.

StackTrace:

  • MgMappingUtil.StylizeLayers line 776 file d:\buildforgeprojects\mapguide_open_source_v2.0\build_23.8\mgdev\server\src\services\mapping\MappingUtil.cpp Failed to stylize layer: True Marble Aust 250m epsg 4283 B

Cannot create any more connections to the OSGeo.Gdal FDO provider.

comment:8 by zspitzer, 17 years ago

Summary: Server Crash and server unresponsive → Server Crash and Unresponsive using OSGeo.Gdal FDO provider

comment:9 by zspitzer, 16 years ago

Version: 2.0.0 → 2.0.2

Problem still occurs with 2.0.2 release & tiled maps

comment:10 by tomfukushima, 16 years ago

Cc: stevedang added

We did stability tests with the GDAL provider and GeoTIFF and found that it was stable. Can you try your test again with GeoTIFF (you can use the GDAL utilities to convert to this format) and see if the problem occurs again? We know of a memory leak, but it's only a few KB at a time so it should take a while before that causes memory problems.

comment:11 by zspitzer, 16 years ago

Here is a test case which will crash 2.0.2.

You will need to tweak the path for the raster file, which is unmanaged and is defined as being under c:\data\raster\

http://ennoble.dreamhosters.com/tests//raster_ticket_462.mgp.zip

comment:12 by tomfukushima, 16 years ago

Hi Zac, please add the steps to recreate the problem once the package is loaded.

comment:13 by zspitzer, 16 years ago

Just open up the basic layout and start to zoom in and pan around.

It doesn't take long for MapGuide to stop rendering tiles.

http://localhost:8008/mapguide/mapviewerajax/?WEBLAYOUT=Library%3a%2f%2fraster%2fTrueMarble.8km.5400x2700+basic.WebLayout&LOCALE=en&USERNAME=Anonymous&PASSWORD=&

comment:14 by tomfukushima, 16 years ago

Thanks Zac, I couldn't reproduce this using IE, but once I moved to using Google Chrome, a problem showed up right away. I only needed to do a single zoom and then pan. Steve, this is pretty easy to reproduce. Come see me if you want to see it. The server continues to work though, as I can continue to get in through Studio; it seems that the GDAL provider is hooped because I can't do anything with raster.

This is the error that I get:

<2008-11-25T17:34:24> 	Ajax Viewer	144.111.170.90	Anonymous
 Error: Failed to stylize layer: TrueMarble.8km.5400x2700 layer
        An unclassified exception occurred.
 StackTrace:
  - MgMappingUtil.StylizeLayers line 786 file d:\build\mapguide_open_source_v2.0\build_30.11\mgdev\server\src\services\mapping\MappingUtil.cpp	Failed to stylize layer: TrueMarble.8km.5400x2700 layer
An unclassified exception occurred.

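by jbirch, 16 years ago

Attachment: mapguide_raster_unalloc.5.patch added

by jbirch, 16 years ago

Attachment: mapguide_raster_stability.patch added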

comment:15 by jbirch, 16 years ago

There are (at least) three problems with raster in 2.0.2.

The first is a memory leak that has since been fixed in 2.1.

The second is (my non-technical description) writing to unallocated memory. (see mapguide_raster_unalloc.5.patch)

The third is a defect in the way that MapGuide deals with single-threaded providers. The attachment mapguide_raster_stability.patch provides a workaround for this defect in conjunction with the GDAL provider.

However, this could conceptually happen with other providers. Haris is looking into the problem in more depth, but in the meantime explained it to me as follows, referencing the code around line 660 of MappingUtil.cpp:

Assume two raster layers are accessed at the same time.

  1. Raster connection to Layer 1 is created
  2. ExecuteRasterQuery is executed; a Georaster class is created which keeps a pointer to the connection (without adding to the ref count)
  3. That thread goes into StylizeLayers
  4. A second thread goes into ExecuteRasterQuery, but is accessing another raster layer so can't use the same connection
  5. The second thread creates a new connection to the raster provider, but because the pool size for single-threaded providers is limited to 1 (and also because the GDAL provider didn't increment the ref count) the connection manager deletes the first thread's connection
  6. The first thread, which is now in StylizeGridLayer, finds that its connection was deleted and the pointer is gone

Result: Exception and corrupted memory
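The sequence above can be sketched in a few lines. This is an illustration only, not MapGuide or FDO code: `Connection`, `Pool`, `Reader`, and `ReplayWithAddRef` are hypothetical stand-ins, the reference count is not atomic, and the six steps are replayed on a single thread. It shows the reference-counted variant, in which the reader takes its own reference, so the pool evicting the connection no longer frees it under the first user.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a provider connection; not the real FDO API.
class Connection {
public:
    explicit Connection(std::string layer) : layer_(std::move(layer)) { ++live_; }
    void AddRef() { ++refs_; }
    void Release() { if (--refs_ == 0) delete this; }
    const std::string& layer() const { return layer_; }
    static int live() { return live_; }   // connections not yet destroyed
private:
    ~Connection() { --live_; }            // only Release() may destroy
    std::string layer_;
    int refs_ = 1;                        // the creator owns the first reference
    static int live_;
};
int Connection::live_ = 0;

// Pool limited to one cached connection, as for a single-threaded provider.
class Pool {
public:
    ~Pool() { if (cached_) cached_->Release(); }
    // Returns a borrowed pointer; the pool keeps its own reference.
    Connection* Open(const std::string& layer) {
        if (cached_ && cached_->layer() == layer) return cached_;
        if (cached_) cached_->Release();  // evict: the pool drops its reference
        cached_ = new Connection(layer);
        return cached_;
    }
private:
    Connection* cached_ = nullptr;
};

// Reader that takes its own reference instead of keeping a bare pointer.
class Reader {
public:
    explicit Reader(Connection* c) : conn_(c) { conn_->AddRef(); }
    ~Reader() { conn_->Release(); }
    Reader(const Reader&) = delete;
    Reader& operator=(const Reader&) = delete;
    const std::string& layer() const { return conn_->layer(); }
private:
    Connection* conn_;
};

// Replays steps 1-6 sequentially and returns the layer the first reader sees.
std::string ReplayWithAddRef() {
    Pool pool;
    Reader first(pool.Open("Layer 1"));   // steps 1-2, but with AddRef
    pool.Open("Layer 2");                 // steps 4-5: pool evicts Layer 1
    assert(Connection::live() == 2);      // Layer 1's connection survived
    return first.layer();                 // step 6: the pointer is still valid
}
```

If `Reader` skipped the `AddRef` call, the eviction inside `Pool::Open` would drop the last reference and destroy the first connection, which is the dangling pointer described in step 6.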

comment:16 by brucedechant, 16 years ago

Cc: brucedechant added

comment:17 by brucedechant, 16 years ago

If the FDO connection manager is deleting a connection that is in use, that is a bug. Let me debug the server to see what is happening, because we only see this issue with GDAL. I would like to know where the bug is: either in the handling of single-threaded providers, or in GDAL not reference counting properly.

comment:18 by jbirch, 16 years ago

Cc: haris added

by traianstanev, 16 years ago

Attachment: grfp_addref.patch added

GDAL raster provider patch to addref the FDO connection from the FeatureReader

by traianstanev, 16 years ago

Attachment: gdal_nomutex_cacheschema.patch added

Fix refcounting problem + cache schema

comment:19 by jbirch, 16 years ago

Milestone: 2.0 → 2.1

comment:20 by brucedechant, 16 years ago

Owner: set to brucedechant
Status: reopened → new

comment:21 by brucedechant, 16 years ago

Status: new → assigned

comment:22 by brucedechant, 16 years ago

Resolution: fixed
Status: assigned → closed

Fixed.

See changeset r3829.

comment:23 by zspitzer, 13 years ago

Component: General → Rendering Service
Milestone: 2.1 → 2.4
Resolution: fixed
Status: closed → reopened
Summary: Server Crash and Unresponsive using OSGeo.Gdal FDO provider → GDAL FDO provider stability issues (patch)
Version: 2.0.2

Re-opening as there are useful patches which haven't been applied yet.

comment:25 by brucedechant, 13 years ago

Please list and attach the new patches that need to be applied.

comment:26 by brucedechant, 13 years ago

Specifically the MapGuide source patches. FDO source patches should be added to the linked FDO trac ticket directly.

comment:27 by jng, 12 years ago

Milestone: 2.4 → 2.5

comment:28 by jng, 12 years ago

Milestone: 2.5 → 2.6

comment:29 by jng, 7 years ago

Milestone: 3.0 → 3.3

Ticket retargeted after milestone closed

comment:30 by jng, 5 years ago

Milestone: 3.3 → 4.0

Milestone renamed
