WKT Raster Cruncher

I’ve been playing with some nice amount of pixels using WKT Raster engine recently. Today, I tiled the japan.tif (gdalinfo output is here) using 3 different sizes of tile block. All the work was done using gdal2wktraster.py loader available from WKT Raster repository. Below, I included more interesting numbers which may be helpful for others who will decide to juggle rasters in PostGIS database.

Tiling schemes applied on japan.tif file:

  1. Using natural block size reported by GDAL gives 700 tiles, 14000 x 20 pixels each:
    gdal2wktraster.py -r japan.tif -t japan -o japan.sql -k
  2. Using block size 100 x 100 pixels gives 140 x 140 = 19600 tiles:
    gdal2wktraster.py -r japan.tif -t japan_100 -o japan_100.sql -k -m 100x100
  3. Using block size 200 x 200 pixels gives 70 x 70 = 4900 tiles:
    gdal2wktraster.py -r japan.tif -t japan_100 -o japan_200.sql -k -m 200x200

Sizes of all .sql text files produced are close to 1.2 GB. The timing for gdal2wktarster falls to around 40 minutes on Ubuntu 8.10 (i386) running under VirtualBox 2.with assigned single CPU core of Intel Xeon 3.2 GHz and 1024 MB RAM (host machine: 4 x Intel Xeon 3.2 GHz with 16 GB RAM). psql loads single table in about 6 minutes:

psql -d japan -f japan_100.sql

Interestingly, memory usage during processing WKT Raster by gdal2wktraster.py (Python run-time) seems not high (10-20%) and was on constant level of ~85 MB. However, 100% of CPU power was being eaten. It’s got me interested in comparing these results vector-brother of gdal2wktraster, to the shp2pgsql from PostGIS.

Detecting Visual C++ version in NMAKE makefile

Traditionally when building GDAL/OGR, GEOS or libLAS using NMAKE users specify version of Visual C++ compiler being used as value of MSVC_VER macro. This macros is not required but it’s recommended, so NMAKE can compose best possible set of compilation and linking flags for particular version of Visual C++. For instance, command specifying Visual C++ 8.0 (Visual Studio 2005) looks like this:

nmake -f makefile.vc MSVC_VER=1400

Recently, I hacked libLAS makefiles (and GEOS makefiles too), so Visual C++ version can be determined (semi-)automatically. NMAKE 1.62 or higher reports its version as value of _NMAKE_VER macro. The solution is to defined compiler version based on version of NMAKE tool:

Update 2009-04-03: Added _NMAKE_VER number 9.00.21022.08

!IF "$(_NMAKE_VER)" == ""
MSVC_VER = 4.0
!ERROR *** Prehistoric version of Visual C++
!ELSEIF "$(_NMAKE_VER)" == "162"
MSVC_VER = 5.0
!ERROR *** Prehistoric version of Visual C++
!ELSEIF "$(_NMAKE_VER)" == "6.00.8168.0"
MSVC_VER = 6.0
MSC_VER = 1200
!ERROR *** Prehistoric version of Visual C++
!ELSEIF "$(_NMAKE_VER)" == "7.00.9466"
MSVC_VER = 7.0
MSC_VER = 1300
!ELSEIF "$(_NMAKE_VER)" == "7.10.3077"
MSVC_VER = 7.1
MSC_VER = 1310
!ELSEIF "$(_NMAKE_VER)" == "8.00.50727.42"
MSVC_VER = 8.0
MSC_VER = 1400
!ELSEIF "$(_NMAKE_VER)" == "8.00.50727.762"
MSVC_VER = 8.0
MSC_VER = 1400
!ELSEIF "$(_NMAKE_VER)" == "9.00.21022.08"
MSVC_VER = 9.0
MSC_VER = 1500
!ELSEIF "$(_NMAKE_VER)" == "9.00.30729.01"
MSVC_VER = 9.0
MSC_VER = 1500
!ELSE
MSVC_VER = 0.0
MSC_VER = 0
!ENDIF

Later macros MSVC_VER and MSC_VER can be used to report Visual C++ version:

!IF "$(MSVC_VER)" == "0.0" && "$(MSC_VER)" == "0"
!MESSAGE *** Cannot determined Visual C++ version
!ERROR *** Aborting make job
!ELSE
!MESSAGE *** Using Microsoft NMAKE version $(_NMAKE_VER)
!MESSAGE *** Using Microsoft Visual C++ version $(MSVC_VER)
!MESSAGE *** Using Microsoft C/C++ version $(MSC_VER)
!ENDIF

This is not a rocket science, but it seems to work well and it frees users from manual specification of Visual C++ version. I’m not sure how the NMAKE version numbers vary across Visual Studio versions and builds. It would be good collect these version numbers to make the solution reliable. So, I’d be thankful if readers using Visual Studio 2002, 2003, 2005 and 2008 or newer :-) could report their NMAKE versions directly to me or post them as comments below.

Building PostGIS using Visual C++

I don’t like MinGW. I’ve been dreaming about building PostGIS using Visual C++ – the native development toolset for Windows platform – without being forced to install Unix-like environment inside Windows. Finally, I’ve got motivated enough and decided to make my dreams reality, so here is the story.

First, I collected all run-time dependencies required by PostGIS. I intentionally used the word run-time to indicate I’m not going to install MinGW neither Cygwin or any other GCC-for-Windows package. GNU make is also not required.

I use PostGIS source code from trunk in the Subversion repository.

Dependencies

There are only 3 software packages required:

  1. PostgreSQL 8.3.6official Win32 binaries and source code
  2. GEOS – build from sources, it’s safe to grab SVN trunk
  3. PROJ.4 – also build from sources and also from SVN trunk

I installed the PostgreSQL 8.3.6 without PostGIS extensions, as I’m going to provide my own :-), using default location c:\Program Files\PostgreSQL\8.3. The source code of PostgreSQL 8.3.6 went to d:\dev\postgresql\postgresql-8.3.6.

Short note on directories layout using for projects downloaded directly from repositories. Root path of source tree of each project have the same layout: D:\dev\PROJECT\_svn\trunk. For Visual Studio projects, these paths are defined as macros in postgis.vsprops (Visual C++ Property Sheet), so it should be easy to redefine them without any need to hack other project settings like Additional Include Directories and others.

Continue reading

Run WKT Raster, run!

I have a not-so-small GeoTIFF raster dataset. Here is what GDAL has to say about it:

$ gdalinfo japan.tif
Driver: GTiff/GeoTIFF
Files: japan.tif
Size is 14000, 14000
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.2572235630016,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (137.386330630776030,38.325757833006122)
Pixel Size = (0.000256020461176,-0.000256020461176)
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_DOCUMENTNAME=ER Mapper 6.4
  TIFFTAG_IMAGEDESCRIPTION=ER Mapper GeoTiff raster translator V1.0: Band 1 = Red, Band 2 = Green, Band 3 = Blue
  TIFFTAG_SOFTWARE=ER Mapper 6.4
  TIFFTAG_XRESOLUTION=0
  TIFFTAG_YRESOLUTION=0
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  ( 137.3863306,  38.3257578) (137d23'10.79"E, 38d19'32.73"N)
Lower Left  ( 137.3863306,  34.7414714) (137d23'10.79"E, 34d44'29.30"N)
Upper Right ( 140.9706171,  38.3257578) (140d58'14.22"E, 38d19'32.73"N)
Lower Right ( 140.9706171,  34.7414714) (140d58'14.22"E, 34d44'29.30"N)
Center      ( 139.1784739,  36.5336146) (139d10'42.51"E, 36d32'1.01"N)
Band 1 Block=14000x20 Type=Byte, ColorInterp=Red
Band 2 Block=14000x20 Type=Byte, ColorInterp=Green
Band 3 Block=14000x20 Type=Byte, ColorInterp=Blue

I loaded it to PostGIS/WKT Raster table with regular blocking enabled in PostgreSQL 8.3 database running on Ubuntu 8.10 (32-bit) installed as a guest system under VirtualBox:

$ gdal2wktraster.py -r japan.tif -t japan_rb -o japan_rb.sql -k
$ psql -d test -f japan_rb.sql

The loader generated output file japan_rb.sql of size of 1.1 GB and it makes 700 records (raster tiles or blocks) in the database. The disk usage reported for the raster table is:

sistest=# SELECT relfilenode, relpages FROM pg_class WHERE relname = 'japan_rb';
 relfilenode | relpages
-------------+----------
       71700 |        5
(1 row)

I run simple test query:

$ psql -d test
test=# SET log_statement_stats TO 1;
SET
test=# SELECT rid FROM japan_rb WHERE RT_Width(rast) != 14000 OR RT_Height(rast) != 20;
 rid
-----
(0 rows)

In the PostgreSQL log I got dumped a bunch of interesting statistics:

2009-03-27 15:23:38 GMT LOG:  QUERY STATISTICS
2009-03-27 15:23:38 GMT DETAIL:  ! system usage stats:
 ! 5.084838 elapsed 2.544159 user 2.468154 system sec
 ! [4.960310 user 5.204325 sys total]
 ! 0/0 [16/28160] filesystem blocks in/out
 ! 0/320190 [0/647479] page faults/reclaims, 0 [0] swaps
 ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
 ! 0/155 [9/327] voluntary/involuntary context switches
 ! buffer usage stats:
 ! Shared blocks:      74192 read,          0 written, buffer hit rate = 51.71%
 ! Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
 ! Direct blocks:          0 read,          0 written

The virtual machine has assigned 1024 MB of RAM and one CPU Intel Xeon 3.2 GHz.

You are strongly welcome to share your comments?

PostGIS SVN moved

Paul just announced that Subversion repository of PostGIS source code has moved to OSGeo server. The new URL of SVN is: http://svn.osgeo.org/postgis/.

Only base location of the repository has changed, so the cheapest way to switch existing working copy that points to old location of SVN is to switch & relocate:

mloskot@dog:~/dev/postgis/_svn/trunk$ svn switch --relocate \
   http://svn.refractions.net/postgis/trunk/  \

https://svn.osgeo.org/postgis/trunk

For the sake of security, I would recommend committers to use https scheme which is well supported on OSGeo server.

Turing award goes to Barbara Liskov

Learning principles of Object-Oriented Programming, one of the first and very important thing to understand is a definition of subtype. It’s usually not a big problem to explain it correctly and there are a few descriptions dangling around.

Of course, I have my favourite definition of the relation between supertype and subtype. It is called Liskov Substitution Principle. The LSP reveals existing subtleties that may make understanding of the term of subtype not easy. There is a well-known example presenting potential problems: a squere is a rectangle or may be it is not? Robert Martin has written more about in C++ Report long time ago (PDF). For a C++ programmer, like myself – who cares about design by contract (DbC) – the Liskov Substitution Principle is helpful to understand role of pre-/post-conditions in inheritance.

The Liskov Substitution Principle was formulated by Professor Barbara Liskov (MIT). Two days ago, BBC announced as follows:

The 2009 A. M. Turing Award has gone to Barbara Liskov for her contributions to programming.

(Barbara Liskov wins Turing award, BBC)

Congratulations!

Lambda, I love you!

I’m writing a small driver for readint WKT Raster data from RASTER column in PostGIS-enabled database and I want to report name of database I’m connected with. My reader eats connection string, and here I’ve fallen in love with Python lambda.

Given connstr stores connection string to PostgreSQL database in format well-known from libpq, single-line anonymous function can do the whole job:

filter(lambda db: db[:6] == 'dbname', connstr.split())[0].split('=')[1]

Complete example:

$ python
Python 2.5.2 (r252:60911, Oct  5 2008, 19:24:49)
>>> connstr = "dbname='rtest' host='localhost' user='mloskot'"
>>> filter(lambda db: db[:6] == 'dbname', connstr.split())[0].split('=')[1]
"'rtest'"
>>> connstr = "password='xxx' port=5432 dbname='rtest' host='localhost' user='mloskot'"
>>> filter(lambda db: db[:6] == 'dbname', connstr.split())[0].split('=')[1]
"'rtest'"
>>>

Have fun!