Const-correctness schizophrenia in GDAL

March 4th, 2010

Const-correctness rants are quite a common topic of chats on the #gdal IRC channel. Here are some of the pearls that got imprinted on my mind:

A: The lesson is I ought to get things right the first time.
B: The issue with const method is that if you want to add lazy loading later, it can cause problems
C: GDAL is rather painful to use with const correct code, unfortunately :(
B: The solution is obvious: don’t write const correct code

Who’s right then, A or B?

I recall another motto from the #gdal channel that goes like "when unsure, do nothing", with the following rationale:

especially when I realize afterwards that I’ve f**cked things because I neglected to follow the motto

Remembering these recommendations, it's pretty clear how the const mess in GDAL happened. I'd conclude by paraphrasing the motto this way:

I’ve f**cked things because I neglected to make a decision.

Now, poor GDAL beginner deadpickle, trying to find out why the compiler complains about his not-that-badly-written code (yes, I'm the evil one here), wandered off to ask the C gurus at comp.lang.c and got the problem explained by Malcolm, who wrote:

The problem is that, when C was first developed, there was no const keyword. So string literals, which are constant, had to be non-const for backwards compatibility. This means that lots of programmers get lazy and omit the const, even from functions which don't modify their string arguments. (There are also some subtle problems with const which means that this isn't always a case of pure laziness). So a sort of solution is to discard the const qualifiers. However this is perpetuating the problem in your own code.
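To make Malcolm's point concrete, here is a minimal C++ sketch; `legacy_length`, `string_length` and `length_through_legacy` are hypothetical functions, not GDAL API. In C a string literal has type `char[N]`, so a read-only function declared with `char*` still accepts literals, which is how such signatures survived. In C++11 and later, passing a literal to `char*` is ill-formed, so const-correct client code is pushed into `const_cast`, "perpetuating the problem" exactly as described.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical legacy-style function: it only reads s, but its signature
// omits const, as in many pre-const C APIs.
std::size_t legacy_length(char* s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}

// Const-correct counterpart: the signature promises not to modify s, so
// string literals and const buffers can be passed without any cast.
std::size_t string_length(const char* s)
{
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}

// Const-correct client code has to cast const away to reach the legacy
// function. This is well-defined only because legacy_length never writes
// through the pointer; the compiler can no longer check that promise.
std::size_t length_through_legacy(const std::string& text)
{
    return legacy_length(const_cast<char*>(text.c_str()));
}
```

A call like `legacy_length("GDAL")` is rejected by a conforming C++11 compiler, while `string_length("GDAL")` compiles as is.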

The motto contradicts the well-known best practice: be const-correct sooner rather than later. It's way easier and cheaper to remove const qualifiers once it turns out they do not properly express the actual design and contract than to retrofit const-correctness onto an existing codebase. Sometimes the latter is not even possible, leaving things f**cked up twice: in the existing codebase and in the client's code.
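B's worry about lazy loading has a standard C++ answer: the `mutable` keyword lets a const member function fill an internal cache, because the cache is an implementation detail rather than observable state. The sketch below uses a hypothetical `LazyBand` class (not GDAL's API), with a string concatenation standing in for real I/O. As written it is not thread-safe: concurrent const calls would race on the cached members.

```cpp
#include <string>
#include <utility>

// Hypothetical raster-band-like class used only for illustration.
// The description is loaded lazily on first access, yet the accessor
// stays const, so adding laziness later does not break const callers.
class LazyBand
{
public:
    explicit LazyBand(std::string source) : source_(std::move(source)) {}

    // Logically const: callers cannot tell "already loaded" from
    // "not yet loaded".
    const std::string& description() const
    {
        if (!loaded_)
        {
            description_ = "band from " + source_; // stands in for real I/O
            loaded_ = true;
            ++load_count_;
        }
        return description_;
    }

    // How many times the expensive load actually ran.
    int load_count() const { return load_count_; }

private:
    std::string source_;
    // mutable members may be modified inside const member functions.
    mutable bool loaded_ = false;
    mutable std::string description_;
    mutable int load_count_ = 0;
};
```

A `const LazyBand&` can be handed to const-correct client code, and the load still happens only once, on first use.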

CMake interview for FLOSS Weekly at 4:30 EST

March 3rd, 2010

Bill Hoffman just announced on the CMake mailing list:

At 4:30, I am going to be interviewed for FLOSS Weekly.
The chat is here:
http://irc.twit.tv/
The video is here:
http://live.twit.tv/
Should be going on some time around 4:30 EST.

It’s on now.

UPDATE: FLOSS Weekly 111: CMake is now available as an archived audio podcast.

GIS-Lab joins Planet OSGeo

February 27th, 2010

Maxim Dubinin has syndicated the GIS-Lab blog with the Planet OSGeo aggregator.

A few words about GIS-Lab from their website:

GIS-Lab – informal non-commercial community of GIS/RS specialists, we grow ourselves and help grow others.

GIS-Lab exists since April 2002 as an independent online resource specializing in geographic information systems (GIS) and remote sensing (RS). At present, the site is primarily oriented towards Russian-speaking GIS community, however, we do our best to translate as many materials as possible into English.

GIS-Lab is the very first Russian-language blog syndicated with Planet OSGeo, which makes the planet an even more international geo-café.

SqlGeometry and POINT EMPTY in WKB

February 26th, 2010

Inspired by a question Paul Ramsey asked this morning on IRC, I inspected what Well-Known Binary (WKB) output SqlGeometry gives for EMPTY geometries of all seven geometry types specified in the OGC Simple Features Specification (SFS). The SqlGeometry class is available from the SQL Server System CLR Types for the .NET Framework. Here we go.

I checked the WKB output returned by the SqlGeometry method STAsBinary(). Here is a small test program written in C#:

using System;
using System.Linq;
using Microsoft.SqlServer.Types;
namespace SqlGeometryEmpty
{
  class Test
  {
    static void Main(string[] args)
    {
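      // OpenGisGeometryType names the OGC types: Point, LineString, ...
      foreach (string type in
         Enum.GetNames(typeof(OpenGisGeometryType)))
      {
        // Build WKT such as "POINT EMPTY" and parse it
        string wkt = type.ToUpper() + " EMPTY";
        SqlGeometry geom = SqlGeometry.Parse(wkt);
        // STAsBinary() returns SqlBytes; Buffer exposes the raw WKB bytes
        byte[] wkb = geom.STAsBinary().Buffer;
        // Hex-encode each byte as two uppercase digits
        string wkbhex = string.Join("",
          wkb.Select(
            b => b.ToString("X2")).ToArray());

        Console.WriteLine("{0}\n{1} ({2} bytes)\n",
          wkt, wkbhex, wkb.Length);
      }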
    }
  }
}

The first observation is that the WKB of an EMPTY geometry comes back as a slightly different binary for each type. All the binary forms are nine bytes long. The first byte indicates endianness (byte order), as expected. The next four bytes indicate the geometry type, exactly as defined in the OGC specifications. The remaining four bytes are set to zero and seem to play the role of a size specifier: the number of points in a LINESTRING, the number of rings in a POLYGON, the number of points in a MULTIPOINT, and so on. This leads to another observation: WKB for EMPTY is reported as a collection of primitive components with zero elements.

The difference I mentioned is that the actual type of the input geometry is preserved, so there seems to be no implicit translation to a geometry of some other type.
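The 9-byte layout described above can be decoded with a few lines of C++. This is a minimal sketch assuming 2D OGC type codes only (no Z/M or PostGIS EWKB flags); `WkbHeader`, `from_hex`, `read_uint32` and `read_header` are hypothetical helpers written for illustration. Keep in mind that for a non-empty POINT the bytes after the type are coordinates, not a count, so the `count` field only makes sense for the other six types or for this EMPTY layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// The three fields observed in the 9-byte WKB of EMPTY geometries.
struct WkbHeader
{
    bool little_endian;   // byte 0: 1 = NDR (little endian), 0 = XDR (big endian)
    std::uint32_t type;   // bytes 1-4: OGC type code, 1 = Point ... 7 = GeometryCollection
    std::uint32_t count;  // bytes 5-8: points, rings or member geometries
};

// Converts a hex dump such as "010400000000000000" into raw bytes.
std::vector<unsigned char> from_hex(const std::string& hex)
{
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("odd number of hex digits");
    std::vector<unsigned char> bytes;
    for (std::size_t i = 0; i < hex.size(); i += 2)
        bytes.push_back(static_cast<unsigned char>(
            std::stoul(hex.substr(i, 2), nullptr, 16)));
    return bytes;
}

// Reads a 32-bit unsigned integer at pos, honouring the byte order flag.
std::uint32_t read_uint32(const std::vector<unsigned char>& b,
                          std::size_t pos, bool little_endian)
{
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        // Visit bytes from most to least significant.
        const std::size_t idx = little_endian ? pos + 3 - i : pos + i;
        v = (v << 8) | b[idx];
    }
    return v;
}

// Decodes byte order, type and count from a buffer of at least 9 bytes.
WkbHeader read_header(const std::vector<unsigned char>& b)
{
    if (b.size() < 9 || b[0] > 1)
        throw std::runtime_error("not a 9-byte WKB header");
    WkbHeader h;
    h.little_endian = (b[0] == 1);
    h.type = read_uint32(b, 1, h.little_endian);
    h.count = read_uint32(b, 5, h.little_endian);
    return h;
}
```

Feeding it `010400000000000000` yields little endian, type 4 (MultiPoint) and count 0.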

So far so good, but not for long. In fact, SqlGeometry implicitly casts POINT EMPTY to a MULTIPOINT EMPTY geometry, with WKB of the following form (in hex):

010400000000000000

Here is the complete output of the test program above:

POINT EMPTY
010400000000000000 (9 bytes)

LINESTRING EMPTY
010200000000000000 (9 bytes)

POLYGON EMPTY
010300000000000000 (9 bytes)

MULTIPOINT EMPTY
010400000000000000 (9 bytes)

MULTILINESTRING EMPTY
010500000000000000 (9 bytes)

MULTIPOLYGON EMPTY
010600000000000000 (9 bytes)

GEOMETRYCOLLECTION EMPTY
010700000000000000 (9 bytes)

A word about how PostGIS behaves: it reports GEOMETRYCOLLECTION EMPTY regardless of the actual type of the input EMPTY geometry. In hex form:

010700000000000000

Generally, there are not many choices for reporting an EMPTY geometry in a clear and usable way, and the form of a collection with size equal to zero seems to be the most appropriate one. POINT EMPTY reported with the type set to POINT (010100000000000000) would be ambiguous, as it looks like a truncated or invalid form of POINT(0 0): WKB for a point carries no element count, so its type code is followed directly by the coordinates. This is especially troublesome in programming languages like C, where natively allocated dynamic arrays do not carry information about their size. IOW, the geometry type alone is not enough information to process a binary form of POINT EMPTY properly.
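The ambiguity comes down to arithmetic: a 2D WKB Point is 1 (byte order) + 4 (type) + 2 × 8 (two IEEE 754 doubles) = 21 bytes, so a 9-byte buffer typed POINT can only be classified as truncated, and its four trailing zero bytes are exactly the first half of a 0.0 coordinate. The sketch below (hypothetical names, 2D only) shows the check; note it needs the buffer size passed alongside the data, which a bare C pointer does not provide.

```cpp
#include <cstddef>

// Byte sizes of 2D WKB pieces as laid out in OGC SFS.
constexpr std::size_t kByteOrderSize = 1;
constexpr std::size_t kTypeSize = 4;
constexpr std::size_t kCoordSize = 8; // one IEEE 754 double

// A 2D WKB Point has no element count: the coordinates follow the type
// directly, so a complete Point is always 1 + 4 + 2 * 8 = 21 bytes.
constexpr std::size_t kWkbPoint2DSize =
    kByteOrderSize + kTypeSize + 2 * kCoordSize;

enum class PointCheck { Complete, Truncated };

// Classifies a buffer already known to carry type 1 (Point).
// A 9-byte "POINT EMPTY" lands in Truncated: nothing in its bytes
// distinguishes it from a damaged POINT(0 0).
PointCheck check_point_buffer(std::size_t size)
{
    return size >= kWkbPoint2DSize ? PointCheck::Complete
                                   : PointCheck::Truncated;
}
```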

Reporting EMPTY geometries as a collection is a useful convention that seems to work well. PostGIS behaves very consistently here, reporting one type for all empties. SqlGeometry, and so SQL Server, forces programmers to write a few more lines of code to handle all the possible cases. Yet another original, exotic solution from Microsoft.
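Those "few more lines" might look like the sketch below. With PostGIS alone, checking for type 7 with a zero count would be enough; to also accept SqlGeometry's output, the check has to cover every collection-like type 2 to 7. This assumes plain 2D OGC type codes (no Z/M or EWKB flags), and `is_empty_wkb` is a hypothetical helper, not part of either product.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the buffer holds a 9-byte "type + zero count" EMPTY form:
// PostGIS emits only type 7 (GEOMETRYCOLLECTION EMPTY), while SqlGeometry
// keeps types 2..7 and maps POINT EMPTY to MULTIPOINT EMPTY (type 4).
bool is_empty_wkb(const std::vector<unsigned char>& b)
{
    if (b.size() < 9 || b[0] > 1)
        return false;
    const bool little_endian = (b[0] == 1);
    // Reads a 32-bit unsigned integer at pos in the buffer's byte order.
    auto u32 = [&](std::size_t pos) {
        std::uint32_t v = 0;
        for (std::size_t i = 0; i < 4; ++i)
            v = (v << 8) | b[little_endian ? pos + 3 - i : pos + i];
        return v;
    };
    const std::uint32_t type = u32(1);
    // A Point has no count field, so a zero "count" only means EMPTY
    // for the collection-like types 2..7.
    return type >= 2 && type <= 7 && u32(5) == 0;
}
```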

A consistent API is a blessing!

Update: a consistent interface specification is even better.

OSGeo Thai Chapter and OTB Team join the planet

February 23rd, 2010

I'm pleased to announce two new blogs I have just syndicated with the Planet OSGeo aggregator.

They are:

  • Chaipat Nengcomma, who posts content as part of the Thai OSGeo local chapter about FOSS4G distribution in Thailand
  • OTB Team, developing the ORFEO Toolbox – a library of image processing algorithms developed by CNES in the frame of the ORFEO Accompaniment Program.

Welcome to the Planet OSGeo!

parallel_sort problem fixed

February 21st, 2010

My problem with crashing programs using TBB has been solved. Alexey Kukanov replied to my question, explaining that because I use TBB 2.1, I have to initialize the task scheduler explicitly. Without this initialization, no context (root) for tasks is created, so no tasks can run.

Simply put, I was reading the latest manual, which was generated for TBB 2.2 (available in Ubuntu 10.04), so I missed this legacy requirement. In TBB 2.2 and later, the initialization is optional:

Using task_scheduler_init is optional in Intel® TBB 2.2. By default, Intel® TBB 2.2 automatically creates a task scheduler the first time that a thread uses task scheduling services and destroys it when the last such thread exits.

The correct version of the example program looks as follows:

#include <tbb/task_scheduler_init.h>
#include <tbb/parallel_sort.h>
#include <cmath>
#include <vector>
using namespace tbb;
int main()
{
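    // Required with TBB 2.1: creates the task scheduler for this thread;
    // the default constructor picks the number of threads automatically.
    task_scheduler_init tbb_init;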

    const int n = 100000;
    std::vector<double> a(n);
    for (int i = 0; i< n; i++)
    {
        a[i] = std::sin(double(i));
    }
    parallel_sort(a.begin(), a.end());
}

parallel_sort crashes on Ubuntu 9.10

February 20th, 2010

I’ve started to experiment with the Intel Threading Building Blocks and hit a wall trying to run a very simple example:


#include <tbb/parallel_sort.h>
#include <cmath>
#include <vector>
using namespace tbb;
int main()
{
    const int n = 100000;
    std::vector<double> a(n);
    for (int i = 0; i< n; i++)
    {
        a[i] = std::sin(double(i));
    }
    parallel_sort(a.begin(), a.end());
}

$ g++ -O0 -g -DTBB_USE_DEBUG -o sort_vector sort_vector.cpp -ltbb
$ gdb ./sort_vector

(gdb) run
Starting program: /home/mloskot/workshop/tbb/parallel_sort/sort_vector
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
tbb::task_group_context::init (this=0x7ffffff9c4e0) at ../../src/tbb/task.cpp:3124
3124    ../../src/tbb/task.cpp: No such file or directory.
in ../../src/tbb/task.cpp
(gdb) bt
#0  tbb::task_group_context::init (this=0x7ffffff9c4e0) at ../../src/tbb/task.cpp:3124
#1  0x00000000004013ff in task_group_context (this=0x7ffffff9c4e0, relation_with_parent=tbb::task_group_context::bound)
at /usr/include/tbb/task.h:284
#2  0x0000000000401be4 in tbb::internal::parallel_quick_sort > (begin=0x7ffffff9c6a0,
end=0x7fffffffe120, comp=...) at /usr/include/tbb/parallel_sort.h:155
#3  0x0000000000401b23 in tbb::parallel_sort > (begin=0x7ffffff9c6a0, end=0x7fffffffe120,
comp=...) at /usr/include/tbb/parallel_sort.h:203
#4  0x0000000000401ab3 in tbb::parallel_sort (begin=0x7ffffff9c6a0, end=0x7fffffffe120)
at /usr/include/tbb/parallel_sort.h:219
#5  0x0000000000401363 in main () at sort_vector.cpp:12

It looks like a failure during, or close to, the initialization of the worker thread pool.

I'm using a fairly recent version, TBB 2.1, installed from the Ubuntu 9.10 packages, and I suspect the problem may be specific to this particular binary build. Let's see what the Intel folks make of my forum question, "parallel_sort example throws segmentation fault". It's a pity Microsoft PPL does not provide a parallel_sort algorithm.

Update: see parallel_sort problem fixed