|
| |
Data Migration Services ManagerDigital Gridworks™ Studio provides a powerful and flexible framework for migrating, replicating, and manipulating data wihin a single database system or acrossmany disparate data sources. The framework is called Data Migration Services and is mainly accessed through the DMS Migration Manager. The DMS Migration Manager allows youto create, edit, load and save DMS Migration Packages. These packages are made up of a list of DMS Tasks which do different things and can be ordered to process data in specified ways.Currently, the DMS framework supports four types of tasks which are explained below. Onceyou set up a list of tasks, you can run a set of selected tasks, or run the entire package. As you discover the power of this framework, you will find that it is very extensible. As withall technology that manipulates your valuable data, always build and test your DMS packages in a staging or development environment, and backup your data before performing a migration. TheDMS framework is very powerful and will do anything you ask it to - including replicating emptydevelopment tables to full production tables. (That might make your company's sales reports look rather meager). Migration TasksMigration tasks are tasks that take data from a data provider, manipulate it, and then mapit to the fields of a data consumer. Data providers and consumers are extensions that give thedata migration services a uniform interface to many different data sources and destinations. These data sourcescould be JDBC data sources, text and CSV files, or any other data source that can represent itsdata in a tabular format. Data destinations could be JDBC data sources, text, CSV, HTML, XML andSQL files. If you are familiar with Java, you can write your own data providers and consumers.When you create a migration task, you will be asked to choose and configure a data provider and a data consumer. Depending on what you choose, you may be able to customize the output columnsand query. After the provider and consumer are configured, you are then asked to map the data from the providerto the consumer. You can either map the columns invididually, or have the wizard map them by name.If you wish to do further manipulation with the data, you can edit the tranform mapping script by clickingon the Use Advanced Mapping checkbox. A script editor window will appear with the current script loaded. This is simply a Java file (more precisely, a BeanShell file) that is dynamically interprettedfor every row that is migrated. The sourceData object array contains the fields in the source data providerand the destinationData object array contains the fields in the data consumer. Let's say that we wanted to add 3 to the first column (if it is an Integer), uppercase the second column if less than 10 characters in length, and trim whitespace off the third column. This is how you could do it: public void transform(Object[] sourceData, Object[] destinationData) { // Add 3 if an int try { int i = Integer.parseInt(sourceData[0].toString()); i += 3; sourceData[0] = new Integer(i); } catch (NumberFormatException e) {} // uppercase if less than 10 chars in length String x = sourceData[1].toString(); if (x.length() < 10) sourceData[1] = x.toUpperCase(); // trim whitespace String y = sourceData[2].toString(); sourceData[2] = y.trim(); destinationData[0] = sourceData[0]; destinationData[1] = sourceData[1]; destinationData[2] = sourceData[2];}Of course, these tasks could easily be done in SQL on most databases, but you can see the power of being able to process each row by applying custom logic using Java. You may note that if you knowwhat type of objects will be in the array, you can forgo most of the error catching and type checkingthat would normally be required. Most of the data providers that are text based always pass their data in as String objects. However, the JDBC provider will pass in the objects that are mapped to their database column data types per the JDBC driver. Therefore, if you know that the first columnis an int, and the JDBC driver represents that as an Integer, then you can tranform it without convertingit to a String first. Likewise, all data consumers invoke the .toString() method on each object so as long as you provide objects that give a good textual representation of their data via the .toString()method, you can use any object in the destination array. Replication TasksReplication tasks serve the specific purpose of comparing two data sets, finding the differences,and merging them from the source to the destination. There is really no data transformation thattakes place in replication tasks (although you can specify a custom source SQL query that transforms the data that comes from the source database). Both source and destination are connections in the connection tree. These tasks are very accurate, but may take some time to run on very large data setsbecause they fetch the data from both the source and destination to create the differences. Theyshine in areas where you would like to publish data across multiple databases of the same or different types, and on a case by case basis (such as pushing changes from a staging to a production environment).The task is non-destructive (ie, it doesn't truncate the destination table and copy the contents of the source table) and fails gracefully when constraint conflicts occur. Therefore, it can be used to replicatedata out to production databases that you do not wish to take down in order to copy data out to. SQL Script TasksSQL Script tasks are tasks that contain a SQL script and are run on a database connection. External Command TasksExternal command tasks are tasks that run external commands on the operating system. You can use these tasks to invoke other applications or scripts (shell, perl, php, python) via the command line, or use the command line itself. Of course, this is a very platform-specific task,so if you are sharing this task between different environments, be aware of platform differences.Running DMS Tasks and PackagesOnce your tasks are set up and in the proper order, you can either run an individual task or selection oftasks, or you can run the whole package. The Run Tasks dialog will appear and will display the statusand logs of the running tasks. Importing Tasks and PackagesIf you find that you have several tasks in multiple package files, you can import tasks from a package fileinto a currently open pacakge. Use the Import Tasks menu item to select the package file. If it is a validfile, the tasks within that file will be added to the current package.Sharing DMS PackagesYou can easily shared DMS packages with others. However, you should be aware that several tasks reference external connection objects by name, but they save the connection information inside their own state. This allows someone to create a package, and then delete or edit the connections in their connection tree. The DMS packagewill use the saved data while running the tasks, but when you try to edit a task that referencingan external object, it will force you to select another connection if it cannot find thatconnection in the connection tree. LogsAs the DMS engine runs each task, most of the tasks will insert entries into the run log. Thesecan be routine task updates, errors, connection failures or other messages indicating the state of the task. By default, the Task Runner form suppresses this log and only reports the completionstate and a other other details. You can view the expanded log by clicking on the Show Logbutton after all the tasks complete. Note that this log is separate from the application system logs. |