Friday 19 August 2016

Moving data from on-premises to cloud using IBM DataWorks Forge

In this blog post we will show how you can easily move data from an on-premises source such as IBM Db2 database to a cloud based database such as IBM dash DB using IBM DataWorks Forge.  We are assuming that you have already created a service instance of IBM DataWorks Forge and have setup a Secure Gateway. If not you can refer to the following blog posts:

  1. Creating an IBM DataWorks Forge service instance in Bluemix
  2. Setting up a Secure Gateway for connecting to on-prem sources using IBM DataWorks Forge.

Open the IBM DataWorks Forge GUI.    The first thing that you need to do is to create a Data Connection to your source and to your target.  Steps for creating the Data Connection are given below:
  1. Click on "Create Connection" link on the left hand side panel.
  2. Select the source type (IBM Db2 or IBM dash DB)
  3. Specify a name and description for the connection.
  4. Enter the connection details such as Hostname, port, database name and credentials for connecting to the database.
  5. For IBM Db2 (and any other on-premises source) you will also have to provide the secure gateway details. If you have created a secure gateway in the same org and space it will be listed in the drop down list.  Select the secure gateway which is present in the same VLAN as that our your source.
  6. Click on "Create Connection". 
  7. DataWorks will try to connect to the source and if the connection is successful, it will create the connection.
Once the source and target data connections have been created, you can create a DataWorks Forge Activity which will do the actual data movement.  There are two kinds of activities: (a) One that simply copies the source data to the target and (b) others that read the data from the source, do some amount of data shaping such as join, etc., and then copy the transformed data to the target.  In this blog post we will showcase how you can do a straight copy of the source data to the target without any shaping.  
  1. Click on "Refine & Copy" link from the left hand side panel.  It will open the asset browser.
  2. You will be show the list of source connections that you have created.  Select the one which connects to the source IBM Db2 database.
  3. It will then show the list of schemas which are available in the source database.  Select the schema in which your source data is present.  
  4. It will then show the list of tables present in the selected schema.   Select one or more tables from the list.  Data from these selected tables will be copied to the target.  
  5. When you click on a table, the GUI will show the list of columns which are present in the table.  If you only want to see the columns in a table you can click on the table name. If you want to select all the columns in the table, then click on the checkbox next to the table name.  If you wish to select a sub-set of the columns you can click on the checkbox next to the column names. 
  6. The list of selected tables are show on the right hand side.  You can click on the three dots next to a selected table to either (a) see some sample data in the table or (b) delete to table from the selection.
  7. Once the source table(s) have been selected, click on "Copy to Target" button at the top. This will directly copy the data to the target without doing any shaping.  If you wish to shape the data, then click on the "Refine Data" button.  In this blog we will focus on the "Copy to Target" option.
  8. When you click on "Copy to Target" the asset browser is again opened and it show the list of target connections which you have created.  Select the one for dash db which you have created.
  9. It will show the list of schemas present in the dash db.  Please note that if you are using the multi-tenant dash db instance, then only the schema with name starting with DASH<some number> has write permissions.  Hence select this schema.
  10. It will then show the supported table actions.  There are four options: 
    1. Append to the table: If the target table already exists, then data will be appended to it.
    2. Recreate the table: If the table does not exists, it will be created. If it exists, then it will be deleted and created again.  In this process if there are any key constraints and permissions defined on the original table, they will be lost.
    3. Replace the table contents: If the table does not exist, it will be created. If it does exists, then all the data from the table will be deleted and the data from the source table will be copied to it.  The key constraints and permission will remain unchanged.
    4. Merge with the table: This is similar to upsert. In other words, for each record in the source table, if the target table does not contain a record with the same primary key values as the source records, then the source record will be inserted into the target table. However, if the target table contains a record with the same primary key values as that of the source record, then the target record will be updated with the values of the source record. If the target table does not contain a primary key, then the source record will be added to the table without removing the tables existing data.  If the target table does not exist, then it will be created without any primary key and the source records will be copied to the target.
  11. On the right hand side panel, IBM DataWorks Forge will list the set of tables which will get created in the target.  The name of the target table can be changed by clicking on the edit icon next to the table name.  
  12. Finally, provide a name for the activity at the top and then click on Run.  The activity is now running and you will be taken to the activities page.  You can find the newly created activity in the list.
  13. Click on the activity name to see the details of the activity. It will show the status of the activity run.  You will have to refresh the activity details page by clicking on the refresh button at the top right corner to check the latest status of the activity run.  
  14. Once the run is finished, you can check the data movement logs by clicking on the activity run in the "Activity Runs" section of the activity details page.

Thats all there is to using DataWorks Forge to move data from an on-prem source such as IBM Db2 to IBM dash DB on the cloud!

-Manish

No comments:

Post a Comment