Friday, February 3, 2012

ArcGIS on a High Performance cluster: Part 1, Linux

Now that we have our new community cluster running at UD, it's time to learn how I can optimize GIS software for that environment.

The first hurdle is the operating system.

ArcInfo Workstation used to be a great way to run Esri/ArcGIS geoprocessing tasks on *nix boxes. However, it seems the last version of ArcInfo Workstation that ran on Linux was 9.1. I contacted Esri about obtaining a copy of 9.1, but apparently it is out of production and they do not have any copies of the software that they could send me.

So the workstation/desktop route is not available at this time. But there is more than one way to skin a cat: ArcGIS Server 10 (AGS) runs on Linux.

I've recently seen documentation suggesting that geoprocessing models can be leveraged by publishing them as geoprocessing services through ArcGIS Server. I have also heard, both at the Esri conference and in some documentation, that success has been reported with distributed tasks, such as building caches, through the SOM/SOC architecture that is available out of the box in AGS. Could this architecture be extended to distribute geoprocessing tasks?

Taking a different tack, AGS exposes the geoprocessing object through Python wrappers. That means we should be able to run our software programmatically on Linux through Python. Python also has wrappers or libraries for multithreading and MPI (distributed computing), so the implications for taking advantage of our cluster are especially exciting.
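To make the idea concrete, here is a rough sketch of fanning independent jobs out to a pool of worker threads using only the Python standard library. Nothing in it is Esri-specific: process_tile is a hypothetical stand-in for whatever geoprocessing call would eventually be made, and the tile numbering and worker count are assumptions chosen for illustration. Whether the real geoprocessing object can safely be called from several threads is something I have not tested.

```python
from concurrent.futures import ThreadPoolExecutor


def process_tile(tile_id):
    """Hypothetical placeholder for one independent geoprocessing job."""
    # A real version would call the geoprocessing object here.
    return "tile-%d done" % tile_id


def run_all(tile_ids, workers=4):
    """Run process_tile over every tile id using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(process_tile, tile_ids))


if __name__ == "__main__":
    for line in run_all(range(8)):
        print(line)
```

The same pattern (split the work into independent pieces, map a function over them, collect the results) should carry over to an MPI-based approach across nodes, though that would need a separate library.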

Note: Some differences are to be expected when moving to Linux, such as file path conventions and name lengths.
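As a small illustration of the path issue, the sketch below uses Python's standard pathlib module to rewrite a Windows-style path as a Linux one. The function name to_posix, the /data root, and the sample file names are all made up for the example; it only handles the drive letter and the separators, not name lengths.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_posix(windows_path, linux_root="/data"):
    """Re-root a Windows-style path under a Linux directory (illustrative only)."""
    win = PureWindowsPath(windows_path)
    # Drop the drive/anchor (e.g. "C:\") if present and keep the remaining parts.
    relative_parts = win.parts[1:] if win.anchor else win.parts
    return PurePosixPath(linux_root, *relative_parts)


if __name__ == "__main__":
    print(to_posix(r"C:\gis\projects\parcels.shp"))
```

The Pure path classes never touch the file system, so this kind of conversion can be tried on either operating system.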

Next: Multithreading
