Tuesday, December 20, 2011

Importing Data Dump into Postgres

I needed to take four steps to import an Oracle binary dump into PostgreSQL
  1. Open Oracle dump with something like notepad++ or grap and do a find/replace for all the binary characters (these can be highlighted in notepad++) and replace them with something like a pipe character ("|")
  2. If you know the special characters that are not compatible with UTF-8, then do find/replace on these too now ... otherwise you can do this after running the copy command (step 5), since that will give an error for each non UTF-8-able line/character
  3. Open the piped file in Excel and replace with tabs.  Tabs are the standard delimiter in Postgres and by saving out of Excel, you get a uniform number of columns for each record, thereby avoiding "missing data for column" when executing the "copy" command
  4. Create a table in Postgres with the number of columns seen in Excel, using generic data type such as text
  5. execute the copy command ... no delimiters param need be given, since it's tabs
    COPY data FROM 'C:/temp/data.txt'