SqlGeometry and POINT EMPTY in WKB

Inspired by a question Paul Ramsey asked this morning on IRC, I inspected the Well-Known Binary (WKB) output that SqlGeometry produces for EMPTY geometries of all seven geometry types specified in the OGC SFS. The SqlGeometry class is available from the SQL Server System CLR Types for .NET Framework. Here we go.

I checked the Well-Known Binary output returned by the SqlGeometry method STAsBinary(). Here is a small test program written in C#:

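using System;
using System.Linq;
using Microsoft.SqlServer.Types;

namespace SqlGeometryEmpty
{
  class Test
  {
    static void Main(string[] args)
    {
      // Iterate over every geometry type name defined by OpenGisGeometryType.
      foreach (string type in
         Enum.GetNames(typeof(OpenGisGeometryType)))
      {
        // Build the WKT of the empty geometry, e.g. "POINT EMPTY".
        string wkt = type.ToUpper() + " EMPTY";
        SqlGeometry geom = SqlGeometry.Parse(wkt);

        // Export to WKB and format every byte as two uppercase hex digits.
        byte[] wkb = geom.STAsBinary().Buffer;
        string wkbhex = string.Join("",
          wkb.Select(
            b => b.ToString("X2")).ToArray());

        // Print the WKT, the WKB in hex and the WKB size in bytes.
        Console.WriteLine("{0}\n{1} ({2} bytes)\n",
          wkt, wkbhex, wkb.Length);
      }
    }
  }
}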

The first observation is that the WKB of an EMPTY geometry differs slightly from type to type. All the binary forms are truncated to nine bytes. The first byte indicates endianness, as expected. The next four bytes indicate the geometry type, exactly as defined in the OGC specifications. The remaining four bytes are set to zero and seem to act as a size specifier: the number of points in a LINESTRING, the number of rings in a POLYGON, the number of points in a MULTIPOINT, and so on. This leads to a second observation: the WKB for EMPTY is reported as a collection of zero primitive components.
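The nine-byte layout described above can be decoded with a few lines of Python. This is only a sketch: the helper name split_empty_wkb and the lookup table are my own illustration, not part of any library, and the code assumes exactly the nine-byte form shown in this post.

```python
import struct

# Geometry type codes as defined in the OGC Simple Features specification.
WKB_TYPE_NAMES = {
    1: "POINT",
    2: "LINESTRING",
    3: "POLYGON",
    4: "MULTIPOINT",
    5: "MULTILINESTRING",
    6: "MULTIPOLYGON",
    7: "GEOMETRYCOLLECTION",
}


def split_empty_wkb(wkb_hex):
    """Split a 9-byte WKB blob into (byte order, type name, element count)."""
    wkb = bytes.fromhex(wkb_hex)
    if len(wkb) != 9:
        raise ValueError("expected 9 bytes, got %d" % len(wkb))
    if wkb[0] not in (0, 1):
        raise ValueError("invalid byte order marker %d" % wkb[0])
    # Byte 0: 0 = big endian (XDR), 1 = little endian (NDR).
    byte_order = "<" if wkb[0] == 1 else ">"
    # Bytes 1-4: geometry type; bytes 5-8: number of points, rings or members.
    type_code, count = struct.unpack(byte_order + "II", wkb[1:9])
    return byte_order, WKB_TYPE_NAMES[type_code], count


print(split_empty_wkb("010200000000000000"))  # ('<', 'LINESTRING', 0)
```

The tuple shows little-endian byte order, the LINESTRING type code and a count of zero.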

The difference between the binary forms that I mentioned is that the actual type of the input geometry is preserved, so there seems to be no implicit translation to a geometry of some other type.

So far so good, but not for long. There is one exception: SqlGeometry implicitly casts POINT EMPTY to MULTIPOINT EMPTY, with WKB of the following form (in hex):

010400000000000000

Here is the complete output of the test program above:

POINT EMPTY
010400000000000000 (9 bytes)

LINESTRING EMPTY
010200000000000000 (9 bytes)

POLYGON EMPTY
010300000000000000 (9 bytes)

MULTIPOINT EMPTY
010400000000000000 (9 bytes)

MULTILINESTRING EMPTY
010500000000000000 (9 bytes)

MULTIPOLYGON EMPTY
010600000000000000 (9 bytes)

GEOMETRYCOLLECTION EMPTY
010700000000000000 (9 bytes)

A word about how PostGIS behaves. PostGIS reports GEOMETRYCOLLECTION EMPTY regardless of the actual type of the input EMPTY geometry. In hex form it is:

010700000000000000

Generally, there are not many ways to report an EMPTY geometry in a clear and usable way, and a collection with size equal to zero seems to be the most appropriate choice. POINT EMPTY reported with the type set to POINT (010100000000000000) would be ambiguous, because it looks like a truncated or invalid form of POINT(0 0), especially in programming languages like C, where native dynamically allocated arrays do not carry information about their size. In other words, the geometry type alone is not enough information to process the binary form of POINT EMPTY properly.
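To make the ambiguity concrete, here is a minimal sketch of a POINT reader in Python. The function read_wkb_point is hypothetical and written only for this illustration. It can reject the nine-byte form solely because Python byte strings know their own length; a C parser handed a bare pointer would read 16 bytes of coordinates past the end of the buffer.

```python
import struct


def read_wkb_point(wkb):
    """Decode a 2D WKB POINT, checking the buffer length before each read."""
    if len(wkb) < 5:
        raise ValueError("buffer too short for a WKB header")
    if wkb[0] not in (0, 1):
        raise ValueError("invalid byte order marker %d" % wkb[0])
    byte_order = "<" if wkb[0] == 1 else ">"
    (type_code,) = struct.unpack(byte_order + "I", wkb[1:5])
    if type_code != 1:
        raise ValueError("not a POINT, type code %d" % type_code)
    # A POINT carries no count field: two 8-byte doubles must follow the header.
    if len(wkb) < 21:
        raise ValueError("truncated POINT: %d bytes, need 21" % len(wkb))
    return struct.unpack(byte_order + "dd", wkb[5:21])


# A genuine POINT(0 0) is 21 bytes long and decodes without trouble.
print(read_wkb_point(bytes.fromhex("01" + "01000000" + "00" * 16)))

# The 9-byte form is rejected, but only because the buffer length is known.
try:
    read_wkb_point(bytes.fromhex("010100000000000000"))
except ValueError as err:
    print(err)
```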

Reporting EMPTY geometries as a collection is a useful convention that seems to work well. PostGIS is very consistent about it, reporting one type for all empties. SqlGeometry, and therefore SQL Server, forces programmers to write a few more lines of code to handle all the possible cases. Yet another original, exotic solution from Microsoft.
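Those few more lines of code might look like the Python sketch below. The helper is_empty_wkb and the set COUNTED_TYPES are names I made up for the example, and the sample values are just the hex strings reported in this post. The check accepts any counted geometry type with a zero count, so it covers both the typed empties from SqlGeometry and the single GEOMETRYCOLLECTION EMPTY from PostGIS.

```python
import struct

# OGC SFS type codes whose header is followed by a 4-byte element count.
# POINT (code 1) is deliberately absent: it has no count field.
COUNTED_TYPES = {2, 3, 4, 5, 6, 7}


def is_empty_wkb(wkb):
    """Return True when the WKB holds a counted geometry with zero elements."""
    if len(wkb) < 9 or wkb[0] not in (0, 1):
        return False
    byte_order = "<" if wkb[0] == 1 else ">"
    type_code, count = struct.unpack(byte_order + "II", wkb[1:9])
    return type_code in COUNTED_TYPES and count == 0


# Hex strings taken from the outputs listed in this post.
samples = {
    "SqlGeometry POINT EMPTY": "010400000000000000",
    "SqlGeometry POLYGON EMPTY": "010300000000000000",
    "PostGIS, any EMPTY": "010700000000000000",
}
for label, wkb_hex in samples.items():
    print(label, is_empty_wkb(bytes.fromhex(wkb_hex)))
```

Note that the check says nothing about the original type: once POINT EMPTY has been written as MULTIPOINT EMPTY, or any empty as GEOMETRYCOLLECTION EMPTY, the reader cannot recover what the input was.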

A consistent API is a blessing!

Update: a consistent specification of the interface is even better.

6 thoughts on “SqlGeometry and POINT EMPTY in WKB”

  1. As far as OGR is concerned, here’s the output of the following script:

    from osgeo import ogr
    import binascii
    
    wktlist = [ 'POINT EMPTY',
                'LINESTRING EMPTY',
                'POLYGON EMPTY',
                'MULTIPOINT EMPTY',
                'MULTILINESTRING EMPTY',
                'MULTIPOLYGON EMPTY',
                'GEOMETRYCOLLECTION EMPTY' ]
    
    for wkt in wktlist:
        geom = ogr.CreateGeometryFromWkt(wkt)
        wkb = geom.ExportToWkb(ogr.wkbNDR)
        geom2 = ogr.CreateGeometryFromWkb(wkb)
        print(wkt)
        print('%s (%d bytes)' % (binascii.hexlify(wkb).decode('ascii'), len(wkb)))
        print('After importing wkb : %s' % geom2.ExportToWkt())
        print()
    
    POINT EMPTY
    010100000000000000000000000000000000000000 (21 bytes)
    After importing wkb : POINT (0 0)
    
    LINESTRING EMPTY
    010200000000000000 (9 bytes)
    After importing wkb : LINESTRING EMPTY
    
    POLYGON EMPTY
    010300000000000000 (9 bytes)
    After importing wkb : POLYGON EMPTY
    
    MULTIPOINT EMPTY
    010400000000000000 (9 bytes)
    After importing wkb : MULTIPOINT EMPTY
    
    MULTILINESTRING EMPTY
    010500000000000000 (9 bytes)
    After importing wkb : MULTILINESTRING EMPTY
    
    MULTIPOLYGON EMPTY
    010600000000000000 (9 bytes)
    After importing wkb : MULTIPOLYGON EMPTY
    
    GEOMETRYCOLLECTION EMPTY
    010700000000000000 (9 bytes)
    After importing wkb : GEOMETRYCOLLECTION EMPTY

    So OGR behaves like SqlGeometry, except for the POINT EMPTY case, where it falls back to exporting it as POINT (0 0) in WKB. Truncating the WKB of POINT EMPTY to 9 bytes might make sense for the sake of consistency, but it would require an update of the OGC SF spec.

  2. Actually, as you suggested, truncating POINT EMPTY to 9 bytes would not work for code that imports WKB without knowing the length of the buffer (which is bad practice from a security point of view, as the code cannot check for malformed WKB).

    For example, OGRGeometry::importFromWkb( unsigned char *, int = -1 ) takes the length of the binary content as an optional argument. If the size is specified (second argument >= 0), it can check that no read outside the buffer is attempted. Otherwise it assumes that the passed buffer is large enough.

    One could imagine representing the WKB form of POINT EMPTY by setting one of the most significant bits of the 4 bytes used for the geometry type, as is done to specify the Z or M form. Or adding a new geometry type that just means “POINT EMPTY”. Hum, another proof that emptiness is a non-trivial concept ;-)

  3. Even, of course it would be a security issue if POINT EMPTY were truncated to 9 bytes, especially in languages such as C or C++, and I have pointed out this problem. However, a consistent approach is needed, and in my personal opinion the PostGIS approach is the most reasonable. Users learn that for every EMPTY geometry they can expect an empty collection, so their code can become more generic: no checking of the type, just checking the size component and performing zero iterations over the byte stream.

    In short, IMO, all EMPTY geometries should evaluate to

    010700000000000000

    Back to OGR. It’s a bad idea to report POINT(0 0) for POINT EMPTY, and it is actually invalid, as these two geometries are not equivalent. The former is a valid non-empty point with both coordinates equal to zero; the latter is a valid geometry with no coordinates at all. I’m sure you get my point.

  5. I don’t really agree that pushing all empties down into a geometry collection is so wise. Practically, it means we are making our data less expressive as we transit WKB. Typed empties exist, they are in the spec, and our functions can emit them, so we should express them. I just wish there was some way to express POINT EMPTY. Oh, and I wish the ISO SQL/MM spec didn’t contradict the implementation facts on the ground.

  6. Paul,

    I agree with you. The geometry collection trick is far from ideal. However, I rather have in mind the problems that may occur when trying to parse the truncated WKB of POINT EMPTY with no flag indicating whether it is empty or not:

    010100000000000000

    I mean, this would be a serious issue for C/C++ and similar languages.
