Crunching overviews

Continuing my tale about loading big raster datasets into PostGIS database with WKT Raster extension, I’d like to post an update about experience with processing overviews.

For testing purposes, I built excessive number of overviews for japan.tif dataset using gdaladdo utility:

$ gdaladdo -r average japan.tif 2 4 8 16 32 64 128

The command above produced 7 overviews with the following dimensions:

$ gdalinfo japan_2_4_8_16_32_128.tif | grep -m 1 Ov
Overviews: 7000x7000, 3500x3500, 1750x1750, 875x875, 438x438, 219x219, 110x110

Continue reading

WKT Raster Cruncher

I’ve been playing with some nice amount of pixels using WKT Raster engine recently. Today, I tiled the japan.tif (gdalinfo output is here) using 3 different sizes of tile block. All the work was done using gdal2wktraster.py loader available from WKT Raster repository. Below, I included more interesting numbers which may be helpful for others who will decide to juggle rasters in PostGIS database.

Tiling schemes applied on japan.tif file:

  1. Using natural block size reported by GDAL gives 700 tiles, 14000 x 20 pixels each:
    gdal2wktraster.py -r japan.tif -t japan -o japan.sql -k
  2. Using block size 100 x 100 pixels gives 140 x 140 = 19600 tiles:
    gdal2wktraster.py -r japan.tif -t japan_100 -o japan_100.sql -k -m 100x100
  3. Using block size 200 x 200 pixels gives 70 x 70 = 4900 tiles:
    gdal2wktraster.py -r japan.tif -t japan_100 -o japan_200.sql -k -m 200x200

Sizes of all .sql text files produced are close to 1.2 GB. The timing for gdal2wktarster falls to around 40 minutes on Ubuntu 8.10 (i386) running under VirtualBox 2.with assigned single CPU core of Intel Xeon 3.2 GHz and 1024 MB RAM (host machine: 4 x Intel Xeon 3.2 GHz with 16 GB RAM). psql loads single table in about 6 minutes:

psql -d japan -f japan_100.sql

Interestingly, memory usage during processing WKT Raster by gdal2wktraster.py (Python run-time) seems not high (10-20%) and was on constant level of ~85 MB. However, 100% of CPU power was being eaten. It’s got me interested in comparing these results vector-brother of gdal2wktraster, to the shp2pgsql from PostGIS.

Run WKT Raster, run!

I have a not-so-small GeoTIFF raster dataset. Here is what GDAL has to say about it:

$ gdalinfo japan.tif
Driver: GTiff/GeoTIFF
Files: japan.tif
Size is 14000, 14000
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.2572235630016,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (137.386330630776030,38.325757833006122)
Pixel Size = (0.000256020461176,-0.000256020461176)
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_DOCUMENTNAME=ER Mapper 6.4
  TIFFTAG_IMAGEDESCRIPTION=ER Mapper GeoTiff raster translator V1.0: Band 1 = Red, Band 2 = Green, Band 3 = Blue
  TIFFTAG_SOFTWARE=ER Mapper 6.4
  TIFFTAG_XRESOLUTION=0
  TIFFTAG_YRESOLUTION=0
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  ( 137.3863306,  38.3257578) (137d23'10.79"E, 38d19'32.73"N)
Lower Left  ( 137.3863306,  34.7414714) (137d23'10.79"E, 34d44'29.30"N)
Upper Right ( 140.9706171,  38.3257578) (140d58'14.22"E, 38d19'32.73"N)
Lower Right ( 140.9706171,  34.7414714) (140d58'14.22"E, 34d44'29.30"N)
Center      ( 139.1784739,  36.5336146) (139d10'42.51"E, 36d32'1.01"N)
Band 1 Block=14000x20 Type=Byte, ColorInterp=Red
Band 2 Block=14000x20 Type=Byte, ColorInterp=Green
Band 3 Block=14000x20 Type=Byte, ColorInterp=Blue

I loaded it to PostGIS/WKT Raster table with regular blocking enabled in PostgreSQL 8.3 database running on Ubuntu 8.10 (32-bit) installed as a guest system under VirtualBox:

$ gdal2wktraster.py -r japan.tif -t japan_rb -o japan_rb.sql -k
$ psql -d test -f japan_rb.sql

The loader generated output file japan_rb.sql of size of 1.1 GB and it makes 700 records (raster tiles or blocks) in the database. The disk usage reported for the raster table is:

sistest=# SELECT relfilenode, relpages FROM pg_class WHERE relname = 'japan_rb';
 relfilenode | relpages
-------------+----------
       71700 |        5
(1 row)

I run simple test query:

$ psql -d test
test=# SET log_statement_stats TO 1;
SET
test=# SELECT rid FROM japan_rb WHERE RT_Width(rast) != 14000 OR RT_Height(rast) != 20;
 rid
-----
(0 rows)

In the PostgreSQL log I got dumped a bunch of interesting statistics:

2009-03-27 15:23:38 GMT LOG:  QUERY STATISTICS
2009-03-27 15:23:38 GMT DETAIL:  ! system usage stats:
 ! 5.084838 elapsed 2.544159 user 2.468154 system sec
 ! [4.960310 user 5.204325 sys total]
 ! 0/0 [16/28160] filesystem blocks in/out
 ! 0/320190 [0/647479] page faults/reclaims, 0 [0] swaps
 ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
 ! 0/155 [9/327] voluntary/involuntary context switches
 ! buffer usage stats:
 ! Shared blocks:      74192 read,          0 written, buffer hit rate = 51.71%
 ! Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
 ! Direct blocks:          0 read,          0 written

The virtual machine has assigned 1024 MB of RAM and one CPU Intel Xeon 3.2 GHz.

You are strongly welcome to share your comments?