Update on Nov 6th, 2007 at 3:23 pm: GeoJSON Driver Errata
Over the last couple of days I was trying to run away in fear of Halloween monsters, so I locked myself in my wardrobe with a laptop and decided to spend the time programming a new driver for OGR. I ended up with a usable implementation of a GeoJSON driver for OGR, and I’d like to introduce it here and now.
The motivation was simple: to avoid zombies, but not only that. In the MOSS4G project, we need a way to talk to remote geospatial data services from mobile devices. We’ve started developing a WFS client, but it’s not a trivial task and we don’t want to rush it. Also, WFS and GML parsing seem to be heavy tasks for mobile solutions, but we will check that later :-). In the meantime, we have something light and fast: GeoJSON.
GeoJSON is a new dialect based on the JSON format. JavaScript Object Notation (JSON) is a lightweight plain-text format for data interchange, and GeoJSON is nothing other than its specialization for geographic content. The GeoJSON format fits many of the same niches as GML, such as geospatial data interchange over a network. Currently, GeoJSON is supported as an output format by services implemented with FeatureServer, GeoServer and CartoWeb.
It’s correct to assume that all valid GeoJSON data is also valid JSON, so it’s possible to process GeoJSON data using any JSON parser you like. Internally, the OGR GeoJSON driver uses the JSON-C library as its JSON parser. JSON-C is a small and fast validating JSON parser implemented in portable C by Michael Clark from Metaparadigm Pte. Ltd. The library is open source software, available under the terms of the MIT License.
The OGR GeoJSON driver is available in the SVN repository of the MOSS4G project, in the module trunk/libs/gdal/extensions/ogr. In the near future, I’d like to add this driver to the official GDAL/OGR repository. For now, I’ve included patched GDAL makefiles, so it should be feasible to build the driver using the GDAL sources. The OGR GeoJSON driver is also available under the MIT License.
The OGR GeoJSON driver provides functions that transform GeoJSON-encoded data into objects of the OGR Simple Features model: Datasource, Layer, Feature, Geometry. The implementation is based on the GeoJSON Specification draft, v5.0.
At the moment, the GeoJSON driver provides read-only access to all types of supported datasources (see below):
$ ogrinfo --formats
Supported Formats:
...
-> "GeoJSON" (readonly)
In the near future, I’m going to add the ability to insert and delete features, but it will only be available for FeatureServer connections.
Datasource
The OGR GeoJSON driver accepts three types of sources of data:
- Uniform Resource Locator (URL) – a Web address to perform HTTP request
- Plain text file with GeoJSON data
- Text passed directly and encoded in GeoJSON
I should indicate that the FeatureServer is the only Web Service provider of GeoJSON data I’ve used during tests.
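To make the three kinds of sources more concrete, here is a minimal sketch in Python of how a datasource name could be sorted into one of them. This is not the driver's actual detection logic (that lives in the C++ sources); the function name classify_source and the heuristics used (an http:// or https:// prefix means URL, a leading "{" means inline text) are assumptions made purely for illustration.

```python
def classify_source(name: str) -> str:
    """Guess which of the three source kinds a datasource name refers to.

    Hypothetical helper: the heuristics below are illustrative assumptions,
    not a description of how the OGR GeoJSON driver really decides.
    """
    stripped = name.lstrip()
    # A Web address that would be fetched with an HTTP request.
    if stripped.lower().startswith(("http://", "https://")):
        return "url"
    # Inline GeoJSON text is a JSON object, so it starts with "{".
    if stripped.startswith("{"):
        return "text"
    # Anything else is treated as a path to a plain text file.
    return "file"


print(classify_source("http://featureserver/cities/.geojson"))
print(classify_source('{"type": "Point", "coordinates": [100.0, 0.0]}'))
print(classify_source("point.geojson"))
```

The order of the checks matters in this sketch: the URL test comes first so that a Web address is never mistaken for a file name.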
Layer
Currently, there is no support for multi-layer datasources. This means that when we open a datasource, the driver will always produce a single OGR layer. It isn’t clear how to categorize GeoJSON objects into separate layers, and there is no concept like the GetCapabilities request from OGC Web Services. Another assumption is that GeoJSON is a medium- and server-independent format, which means GeoJSON data can be stored in a file, stored in a database table or read from a Web Service. FeatureServer provides an HTTP request to list the available layers, but it sends the response encoded in JSON, not GeoJSON. I tried to stick to the GeoJSON Specification and avoid features specific to FeatureServer (or any other server). Anyway, if this subject is resolved in the GeoJSON Specification somehow, I will implement multi-layer support.
As there are no layer objects in the GeoJSON Specification, there are also no layer names. Because the driver generates a single OGRLayer object per datasource, I make the driver use the predefined name OGRGeoJSON, like this:
ogrinfo -ro http://featureserver/data/.geojson OGRGeoJSON
When a Web Service (e.g. FeatureServer) is used as a datasource, each request will produce a new layer. This behavior conforms to the stateless nature of HTTP transactions. Just to give a simple analogy:
- Using a URL in a Web browser, you can request and load a single Web page. If you want another Web page, you need to use another URL.
- Using a URL to a GeoJSON datasource, you can generate a single layer. If you want another layer, you need to use a new URL.
I hope it makes sense and clarifies behavior of the OGR GeoJSON driver a little.
The GeoJSON Specification lists the following types of objects that may occur in data: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection, Feature, or FeatureCollection. The three Multi* geometry types are not supported at the moment, but it’s not a big deal, I just need to find a 2-3 hour gap in my schedule :-).
If the top-level member of GeoJSON data is of any type other than FeatureCollection, the driver will produce a layer with only one feature. Otherwise, the layer will consist of a set of features.
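The rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the driver's code; the function name features_of is hypothetical, and wrapping a bare geometry in a Feature with empty properties is an assumption about how a "layer with one feature" might be represented.

```python
import json


def features_of(text: str) -> list:
    """Return the list of features a single layer would hold for this GeoJSON text."""
    obj = json.loads(text)
    kind = obj.get("type")
    if kind == "FeatureCollection":
        # A collection contributes all of its features to the layer.
        return list(obj.get("features", []))
    if kind == "Feature":
        # A lone Feature becomes a one-feature layer.
        return [obj]
    # Assumption: a bare geometry is treated as one feature without properties.
    return [{"type": "Feature", "geometry": obj, "properties": {}}]


point = '{"type": "Point", "coordinates": [100.0, 0.0]}'
collection = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None, "properties": {"name": "Pierwszy"}},
        {"type": "Feature", "geometry": None, "properties": {"name": "Drugi"}},
    ],
})

print(len(features_of(point)))       # one feature for a raw Point
print(len(features_of(collection)))  # one feature per collection member
```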
Feature
The OGR GeoJSON driver maps each object of the following types to a new OGRFeature object: Point, LineString, Polygon, GeometryCollection, Feature.
According to the GeoJSON Specification, only a Feature must have a member named properties. Each member of properties is translated to an OGR object of type OGRField and added to the corresponding OGRFeature object.
The GeoJSON Specification does not require all Feature objects in a collection to have the same schema of properties. If Feature objects in a set defined by a FeatureCollection object have different property schemas, the resulting schema of fields in OGRFeatureDefn is generated as the union of all Feature properties.
For example, for given two features with different set of properties:
F1 = { a b c }
F2 = { b c d }
schema of resulting OGRLayer will include following attributes:
L1 = { a b c d }
and attribute d for F1 is null and attribute a for feature F2 is also null.
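The same union rule, written as a short Python sketch. It works on plain dictionaries rather than OGR objects, and the helper names union_schema and to_rows are made up for this example; the driver itself builds an OGRFeatureDefn in C++.

```python
def union_schema(features):
    """Collect property names from all features, keeping first-seen order."""
    fields = []
    for feature in features:
        for name in feature.get("properties") or {}:
            if name not in fields:
                fields.append(name)
    return fields


def to_rows(features):
    """Give every feature the full schema; missing attributes become None (null)."""
    fields = union_schema(features)
    return [
        {name: (feature.get("properties") or {}).get(name) for name in fields}
        for feature in features
    ]


f1 = {"type": "Feature", "properties": {"a": 1, "b": 2, "c": 3}}
f2 = {"type": "Feature", "properties": {"b": 4, "c": 5, "d": 6}}

print(union_schema([f1, f2]))
for row in to_rows([f1, f2]):
    print(row)
```

An intersection variant would only need to replace the accumulation step with a set intersection over the property names of each feature.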
I was also thinking about implementing support for requesting only the attributes common to all features (an intersection), but I’ve put this concept aside for a while. Intersecting the attributes of a big set of features might be a very slow process. Anyway, I’d like to ask for comments and ideas. Feel free to drop yours in the comments below.
It is possible to tell the driver not to process attributes. Just set the environment variable ATTRIBUTES_SKIP=YES. The default behavior is to preserve all attributes (a union), which is equivalent to setting ATTRIBUTES_SKIP=NO.
At the moment, it’s not possible to pass additional options to OGRSFDriver::Open() through the OGR API, so setting an environment variable is just a simple workaround. This will probably change in the future, because there is a plan to extend the OGR API to handle extra options on open. For details, see RFC 10: OGR Open Parameters.
Geometry
As with mixed-property features, the GeoJSON Specification draft does not require all Feature objects in a collection to have geometry of the same type. Unfortunately, the OGR object model does not allow geometries of different types in a single layer (see GeoJSON Driver Errata). To handle this limitation, I decided to use OGRGeometryCollection as a meta-geometry. By default, the OGR GeoJSON driver preserves the type of each geometry. To avoid problems with mixed geometries in GeoJSON data, you can set another environment variable, GEOMETRY_AS_COLLECTION=YES (default is NO), to tell the driver to wrap all geometries in the OGRGeometryCollection type. Again, if you see a better solution, please don’t hesitate to share it with me.
Here is sample output from ogrinfo presenting this feature in action, with the default setting GEOMETRY_AS_COLLECTION=NO:
OGRFeature(OGRGeoJSON):0
name (String) = Pierwszy
POINT (0 0)
OGRFeature(OGRGeoJSON):1
name (String) = Drugi
POINT (10 10)
and with GEOMETRY_AS_COLLECTION=YES:
OGRFeature(OGRGeoJSON):0
name (String) = Pierwszy
GEOMETRYCOLLECTION (POINT (0 0))
OGRFeature(OGRGeoJSON):1
name (String) = Drugi
GEOMETRYCOLLECTION (POINT (10 10))
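The wrapping controlled by GEOMETRY_AS_COLLECTION can be pictured with the following Python sketch, which operates on GeoJSON-like dictionaries instead of OGR geometries. The function names are hypothetical, and leaving an existing GeometryCollection unwrapped is my assumption for the sketch rather than documented driver behavior.

```python
import os


def geometry_as_collection(env=os.environ) -> bool:
    """Read the switch from the environment; anything other than YES means NO."""
    return env.get("GEOMETRY_AS_COLLECTION", "NO").upper() == "YES"


def prepare_geometry(geometry, env=os.environ):
    """Optionally wrap a geometry in a GeometryCollection so all features share one type."""
    if geometry is None or not geometry_as_collection(env):
        # Default behavior: preserve the original geometry type.
        return geometry
    if geometry.get("type") == "GeometryCollection":
        # Assumption: an existing collection is not wrapped a second time.
        return geometry
    return {"type": "GeometryCollection", "geometries": [geometry]}


point = {"type": "Point", "coordinates": [0, 0]}

print(prepare_geometry(point, {}))
print(prepare_geometry(point, {"GEOMETRY_AS_COLLECTION": "YES"}))
```

Passing the environment as a parameter keeps the sketch easy to try out without changing the real process environment.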
Here are a few examples of GeoJSON driver in action:
- Reading GeoJSON from a file with raw Point object:
$ cat point.geojson
{
"type": "Point",
"coordinates": [100.0, 0.0]
}
$ ogrinfo -ro point.geojson
Layer name: OGRGeoJSON
Geometry: Point
Feature Count: 1
Extent: (100.000000, 0.000000) - (100.000000, 0.000000)
Layer SRS WKT:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.01745329251994328,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4326"]]
OGRFeature(OGRGeoJSON):0
POINT (100 0)
- Querying features from FeatureServer using attributes filter:
$ ogrinfo -ro http://featureserver/cities/.geojson OGRGeoJSON -where "name='Warsaw'"
Layer name: OGRGeoJSON
Geometry: Point
Feature Count: 5
Extent: (10.000000, 10.000000) - (10.000000, 10.000000)
Layer SRS WKT:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.01745329251994328,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4326"]]
name: String (0.0)
OGRFeature(OGRGeoJSON):1
name (String) = Warsaw
POINT (10 10)
- Translating FeatureServer datasource to ESRI Shapefile:
$ ogr2ogr -f "ESRI Shapefile" cities.shp http://featureserver/cities/.geojson OGRGeoJSON
The plan
- Add the OGR GeoJSON Driver to the official set of OGR drivers
- Implement Multi* geometries support
- Implement insert and delete operations in terms of FeatureServer
- Perhaps add reading of the FeatureServer layers manifest and then implement support for multiple layers per datasource, but I’m still not convinced by this concept. I have to talk to the GeoJSON gurus and ask them if they have any plans for layering
- Develop small GeoJSON client for mobile devices
- Collect comments, new ideas and bug reports from you :-)
I’d like to thank Christopher Schmidt and Howard Butler for their help (yes, fortunately there was some WiFi signal in the closet, so I could chat with these great friends on the IRC ;-)) in understanding elements of GeoJSON and FeatureServer.
Hi Mateusz,
this is good news. Have you tried your geojson parser with the GeoServer output? It’s not always online, but when it is you can try hitting sigma.openplans.org
Here is a sample request (always include version=1.0.0, in 1.1 lon/lat
axis are swapped due to ogc requirements):
http://sigma.openplans.org:8080/geoserver/wfs?request=GetFeature&typeName=topp:states&version=1.0.0&outputformat=json
You can find a list of layers here:
http://sigma.openplans.org:8080/geoserver/mapPreview.do
(warning, some of them are several gigabytes, such as roads and major_roads, but if you stick to topp:states or the tasmania ones you
should have a decent coverage of the various output types).
Let me know how it goes.
On the web, collections and URLs are one-to-one. I’d follow suit on the filesystem: one collection per file, and a directory of disk files would be your data source.
@ Sean Gillies
Thanks! It’s a very good idea to handle a directory of files as a GeoJSON data source. It would work similarly to the OGR ESRI Shapefile driver. I was also wondering about handling a list of URLs stored in a file, so the driver could read a bunch of URLs as multiple layers. That could get around the single-layer limitation, but it would perhaps be less intuitive for users.
@Andrea Aime
Today, we did some tests with Chris and Hobu against GeoServer. It looks quite good, but unfortunately we were not able to test the driver with most of the data because the OGR GeoJSON driver does not support Multi* geometries yet. I will test it more extensively when we have the Multi* types supported. It should happen soon, later this week, and I’ll drop you a note about the results. Thanks!
@Andrea Aime
I forgot to share one observation related to the GeoServer tests.
Using your URL (the big one), I can access topp:states in GeoJSON format, but the OGR driver fails when trying to parse it. Features in this collection have a property called PERSONS with a floating-point number as its value, and for the first Feature ("id":"states.1") its value is in E notation:
This is perfectly valid according to JSON and RFC 4627 (section 2.4, Numbers), but the OGR GeoJSON driver fails with a parsing error.
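As a quick cross-check that exponent notation is legal JSON, here is a small Python sketch using the standard json module. The value 1.25E7 is a made-up stand-in, not the number from the GeoServer response, and the surrounding object is trimmed down to the one property that matters.

```python
import json

# Hypothetical fragment: only the shape (a float in E notation) mirrors the real data.
text = '{"type": "Feature", "id": "states.1", "properties": {"PERSONS": 1.25E7}}'

feature = json.loads(text)
persons = feature["properties"]["PERSONS"]

# A conforming parser reads the exponent form as an ordinary floating-point number.
print(type(persons).__name__, persons)
```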
I will try to fix it.
@Andrea Aime
I fixed (Ticket #1968) the parser and now I can access GeoServer using the URL you gave above with success. Certainly, we still don’t have the Multi* geometries supported, but I can read this data into OGR objects as a collection of features with NULL geometries assigned.
Hi Mateusz,
I tried it with http://www.opengeocoding.org but didn’t succeed (my bad I guess):
Is there a trick to get this working?
thanks
Markus
Markus,
I’ve replied by e-mail already but will repeat it here to keep the story archived for others too. The problem is that opengeocoding.org does not output a valid GeoJSON document but a custom JSON structure, so the OGR GeoJSON driver can’t be used with it.
Mateusz
I am currently trying to figure out whether this is already usable with the FDO OGR provider, without much success. Can you please explain if, and if so how, this should be defined in a connection string for the FDO provider?
tx
Luc
hi, mloskot
could you kindly add virtual file support to the OGR GeoJSON driver? Actually, it’s very useful to handle GeoJSON without disk operations. Sometimes we just create GeoJSON in memory and no file is needed.
http://www.gdal.org/cpl__vsi_8h.html#66e2e6f093fd42f8a941b962d4c8a19e
Hi mjollnir,
Unfortunately, I don’t have time to do it.
Have you read the GeoJSON driver manual? Are you aware you can pass GeoJSON text directly to the driver, without a GeoJSON file stored on disk?
As the manual says: The OGR GeoJSON driver accepts three types of sources of data and see the 3rd one.
It means that you can pass GeoJSON text directly to OGRGeoJSONDataSource::Open as the const char* pszName argument. Also, see lines 96 – 99 of this file. It should work.
Great, I use :
char *poJsonBuf = …
OGRDataSource *srcDS = OGRSFDriverRegistrar::Open(poJsonBuf);
It works now, although it wastes a little memory, since it duplicates the data in memory, compared with a virtual file. But it works now, thanks anyway.
mjollnir,
I agree that copying the buffer is not memory efficient. I believe it could be changed to reuse the buffer if ownership is transferred and if it’s well documented.
Feel free to discuss it on the gdal-dev list or just submit ticket with patch.