Tuesday, February 28, 2012

ArcGIS Online CSV Hosting: It's Not All There

ArcGIS Online allows a user to upload CSVs, but as we found out, this feature provides less extensive sharing capabilities than other comparable cloud hosting services.

The documentation describes CSV hosting.  It is possible to upload the file for others in a group to download, but it seems that the ability to share the CSV, a link or as something that could be added to multiple maps, is only available when a premium subscription is activated.

In lieu of a premium subscription, an excellent option is to host the CSV with Google Docs and share the URL with relevant parties.  They can then add the CSV to their invididual ArcGIS Online maps, as this blog post shows.  The downside with this approach is that it does not integrate with your group as set up on ArcGIS Online.  You will also need to publicly share the CSV from Google Docs, since that is the only way you can expose a URL (without some kind of additional programming).  Thus, the downside to this approach is for mostly for organizations that have particular data sharing requirements.

Tuesday, February 21, 2012

Operationalizing SDE Subsetting/Layer Registration

Q: How to get max latitude for each given longitude in a time period subset of points as a GIS layer?

The particular dataset we were working with already had X and Y coordinate fields and was in shapefile format.  After attempting to use ArcGIS tools, including those that exposed the limited SQL that ArcGIS offers without SDE, we found that SDE was necessary.  We had already set up a SDE Postgis/Postgres (PG) database, so we could easily import the shapefile. 

First we wanted to create a subset of max latitudes in PG, through the preferred PG interface (ours was pgAdmin III).  Where max is the max subset and data is the original data:
SELECT * INTO sde.max FROM sde.data Where y in (SELECT max(y) FROM sde.data GROUP BY x); ALTER TABLE sde.max OWNER TO sde; GRANT ALL ON TABLE sde.max TO sde;
.  Notice the use of the 'sde.' schema prefix and the permissions queries, which we found necessary (see stumbling blocks below).  We operationalized this by semi-colon delimiting and changing table names where necessary.  This could be made much more efficient by getting a list of tables with the name format given to the tables that we wished to subset, but we were more interested in getting a result than more completely operationalizing at this time.

Next we needed to register these new tables as SDE layers, so that ArcGIS could "see" them.  After this step, the tables behave as normal ArcGIS feature classes.  Before this set, we could not even get properties of the tables, let alone interact with them (an error message would be displayed).  We ran the following command to do this through Windows cmd (SDE bin must be on the Path):
sdelayer -o register -l max,shape -e npc -C objectid,sde -i sde:postgresql:localhost -D sde -s localhost -u sde -p sde -t st_geometry
  .  We operationalized this by repeating this in a .bat file and changing tables names.  Again, this could be made more efficient by using an sde command to get a table list and looping through all tables that had names which matched the max subset table name format.

Stumbling Blocks
  1. The most difficult problem we ran into was the "DBMS table not found (-37)" error, which apparently is quite common for a variety of reasons with SDE.  This ultimate came down to needing to make sure the table was stored under the sde database, in the sde schema, and that the table was owned by sde.  I had assumed that making the owner of the database sde would cause new tables to be sde owned by default, but this was not the case.
  2. You may need to update your SDE/ArcGIS Desktop to 10 SP3.  This was mentioned in some forums with the above error and with other errors around not being able to see tables in an SDE DB.  To do the update, you need to download/install SP3 on desktop/sde and then run "update database" from the database properties under ArcCatalog.  However, you must create a direct connection to do the update ... update does not work under the normal ArcCatalog Spatial DB connection.  
  3. General SDE recommendations:  My rule of thumb is"anything you can do in arcgis/catalog, do there ... all else do in pgadmin".  Also remember to refresh after all steps if there's some result you're trying to see.  pgAdmin and ArcCatalog both need refreshes before showing updates.

Friday, February 3, 2012

ArcGIS on a High Performance cluster: Part 1, Linux

Now that we have our new community cluster running at UD  it's time to learn how I can optimize GIS software for that environment.

The first hurdle is operating system

ArcInfo Workstation used to be a great way to run Esri/ArcGIS geoprocessing tasks on *nix boxes.  However, it seems the last version of ArcInfo Workstation that ran on Linux was 9.1.  I'd contacted Esri about obtaining a copy of 9.1, but apparently it is out of production and they do not have any copies of the software that they'd be able to send to me.

So basically the workstation/desktop route is out not available at this time.  But there is more than one way to skin a cat: ArcGIS Server 10 (AGS) runs on Linux.

I've recently seen documentation which suggests geoprocessing models can be leveraged by publishing them as geoprocessing services through ArcGIS Server.  I have heard at the Esri conference, and in some documentation that success has even been reported in distributed tasks, such as building caches, through the SOM/SOC architecture that is available out of the box in AGS.  Could this architecture be extended to distribute geoprocessing tasks?

Taking a different tack, AGS exposes the geoprocessing object through a Python wrappers.  That means that we should be able to programatically run our software on Linux through Python.  Python also has wrappers or libraries for multithreading and MPI (distributed), so the implications for taking advantage of our cluster are especially exciting. 

Note: There are some differences to be expected, such as with file path conventions and name lengths. 

Next: Multithreading