Thursday, March 14, 2013

Performance Tuning in Oracle Data Integrator



Performance Tuning in Oracle Data Integrator:
There are many ETL (extract, transform and load) tools out there but Oracle Data Integrator (ODI) has gained a lot of ground because of its ability to perform event-driven, high-volume batch uploads from disparate data sources and to the variety of target applications/data storage technologies.
To maximize benefits of ODI, it needs to be tuned (with emphasis) and designed well. Here are couples of tips (besides the best practices already covered by many) that I have learnt while designing the packages/interfaces:
  1]Set truncate mode ON for temporary tablesIn development mode, switching off the deletion of temporary tables in the staging area is quite handy for debugging purposes. But these can cost heavily if carried over in the debug mode to production especially if ‘append’ mode is on for the tables.
  2]Tune JVM heap options: ODI uses java runtime environment for most of the knowledge modules and message-driven functionalities. The heap settings for ODI are in the odiparams.bat script file and the default values for the ODI_INIT_HEAP and ODI_MAX_HEAP properties are 32M and 256M. In most of the implementations where ODI is used, these settings are relatively low and result in Out of Memory Error exception when the packages/interfaces are run. Per documentation the recommended settings are 256M for ODI_INIT_HEAP and 1024M for ODI_MAX_HEAP. Per my experience, 256M of init setting is the optimal one but max heap setting should be based on how many packages and the volume of task performed by those packages.
  3]Optimize the batch load size: When uploading to RDBMS data storage, batch upload size can be a very effective setting and improve the runtime of an interface/package significantly. Most of the relational databases have two properties that can be set in the Physical Schema definition of the database technology under Topology Manager. The two settings are Array Fetch Size and Batch Upload Size. These settings are typically quite low and should be increased to the optimal values to make the loads faster.
  4]Remote server URI and access: Many times the packages/interfaces have to access files from the remote servers. One obvious way to handle this is to share the path in the remote server, map the drive to the remote server path where required files are and then use the mapped drive in the package. But I found that ODI was not able to work with mapped drives. You need to give the fully qualified Universal Resource Identification (URI) path. Secondly, the access given to the path should be to a service account that is registered in the domain and can be accessed from within any server in the network. This service account should also be used as the logon account for ODI agent.
 5]Sunopsis Memory Engine vs. External Database: ODI provides Sunopsis Memory Engine as an in-memory database based on HSQLDB that serves quite well as a high-performing temporary data storage for intermediary data transformations. But, this engine is good for not too large set of data because it uses the memory from JVM that ODI runs on. For transformations of large data sets in interfaces, rather use an external RDBMS as the target or staging data-storage with truncate mode switched on (as mentioned in point 1). There is a good series of external blog on Sunopsis Memory Engine that also addresses this point in detail and more (Under the hood of the Sunopsis Memory Engine ).

Sunday, March 10, 2013

Types of Constraints


Types of Constraints:
1. NOT NULL: As per business logic, a column or a set of columns in a table can not allow NULL values, then NOT NULL constraint can be used to enforce this rule.
e.g
ALTER TABLE emp MODIFY (emp_id NOT NULL);

2. UNIQUE: As per business logic, a colum or a set of columns in a table need to store unique values, then, UNIQUE constraint can be used to enforce this rule.
e.g
CREATE TABLE test (col1 NUMBER UNIQUE);

UNIQUE constraints allow NULL values to be stored.
Note:
i] Unique constraint can allow a null value.
ii]Unique constraint with NOT NULL can be as like P.K. since it cannot allow null values.

3. PRIMARY KEY : Primary Key constraint is a combination of NOT NULL and UNIQUE constraints. The column or the set of columns on which Primary Key is defined, will allow only unique and not null values. There can only be 1 (and only 1) primary key in an Oracle table.
e.g
CREATE TABLE test (col1 NUMBER PRIMARY KEY);

4. FOREIGN KEY:  It is frequenly required that data in one table should be validated by comparing it to data in other table. To achieve this kind of data integrity, foeign key constrained is used. This type of validation is also known as referential integrity. A foreign key constraint always makes refrence to a Primary key or a unique constraint of other table. The table that has foreign key defined is called referencing table. The table that has Primary key or Unique constraint defined is called referenced TABLE.
e.g
CREATE TABLE orders
(
  order_no NUMBER PRIMARY KEY,
  customer_name VARCHAR2(10),
  CONSTRAINT cons_prod_fk FOREIGN KEY(prod_no) REFERENCES product(prod_no)
);
1] If you define the foreign key constraint with 'ON DELETE CASCADE' option, then if any rows are deleted from the referenced table, then the corresponding rows will also be deleted from the referencing table.
2]NULL values are allowed in Foreign Key columns.

5. CHECK:  Check constraints are used to enforce one or more conditions to be checked for the data row.
e.g
ALTER TABLE customers ADD CONSTRAINT customer_credit_limit CHECK (credit_limit <= 1000)

More on Constraints:
  • In ALL_CONSTRAINTS table - Constraint names can be found.
  • In ALL_CONS_COLUMNS - Column names on which constraints are defined can be found.
  •  Constraints can, at any time, be either enabled or disabled. When you CREATE, disable or enable a constraint, you may speify some other information regarding how the constraint behaves. An enabled constraint can have two options VALIDATE and NOVALIDATE. VALIDATE will validate the existing data in the TABLE while NOVALIDATE will not validate the existing data afyter the constraint is enabled.
  • When you CREATE or enable a Primary key or Unique constraint, Oracle will CREATE a unique index on the columns of that constraint. Foreign key constraints do not enforce an automatic creation of index.However it is worthwhile to build an index on the columns of each foreign key constraint. Without an index on the corresponding columns in the child table, Oracle is forced to take out a table lock on the child while it performs the DELETE on the parent. If an index is existing, Orcale uses it to identify and lock just the necessary rows in the child while the parent row is deleted.