Performance Tuning in Oracle Data Integrator:
There are many ETL (extract, transform and load) tools out there,
but Oracle Data Integrator (ODI) has gained a lot of ground because of its
ability to perform event-driven, high-volume batch loads from disparate data
sources into a wide variety of target applications and data-storage
technologies.
To get the most out of ODI, it needs to be designed and tuned well. Here are a
few tips (beyond the best practices already covered by many) that I have
learned while designing packages/interfaces:
1]Set truncate mode ON for temporary tables: In
development, switching off the deletion of temporary tables in the staging
area is quite handy for debugging purposes. But this can cost heavily if the
debug-time settings are carried over to production, especially if 'append'
mode is on for those tables, because each run keeps piling rows on top of the
previous one.
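To see the cost of append mode concretely, here is a minimal sketch (plain Python/sqlite3, not ODI code; the work table `c$_stage` and its contents are made up for the demo) contrasting the two modes across repeated runs:

```python
# Illustrative sketch: why leaving a work table in append mode matters.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE c$_stage (id INTEGER)")

def load(rows, truncate):
    # Truncate mode empties the work table before each load;
    # append mode just keeps inserting on top of the previous run.
    if truncate:
        conn.execute("DELETE FROM c$_stage")  # stand-in for TRUNCATE
    conn.executemany("INSERT INTO c$_stage VALUES (?)", [(r,) for r in rows])

for _ in range(3):                      # three runs of the "interface"
    load(range(1000), truncate=False)   # append mode
append_count = conn.execute("SELECT COUNT(*) FROM c$_stage").fetchone()[0]

for _ in range(3):
    load(range(1000), truncate=True)    # truncate mode
truncate_count = conn.execute("SELECT COUNT(*) FROM c$_stage").fetchone()[0]

print(append_count, truncate_count)     # 3000 1000
```

After three runs, append mode has tripled the row count while truncate mode holds steady, which is exactly the growth that surprises people in production.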
2]Tune JVM heap options: ODI uses the Java runtime environment for
most of its knowledge modules and message-driven functionality. The heap
settings for ODI live in the odiparams.bat script file, where the
default values for the ODI_INIT_HEAP and ODI_MAX_HEAP properties are 32M and
256M. In most implementations these settings are too low and result in
OutOfMemoryError exceptions when packages/interfaces are run. Per the
documentation, the recommended settings are 256M for
ODI_INIT_HEAP and 1024M for ODI_MAX_HEAP. In my
experience, 256M is the right initial setting, but the max heap should
be sized based on how many packages run and the volume of work those
packages perform.
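For reference, the relevant lines in odiparams.bat look roughly like the following (the exact surrounding content varies by ODI version, so treat this as a sketch of the two properties rather than the full file):

```bat
rem Default heap settings shipped in odiparams.bat
set ODI_INIT_HEAP=32m
set ODI_MAX_HEAP=256m

rem Recommended starting point per the documentation
set ODI_INIT_HEAP=256m
set ODI_MAX_HEAP=1024m
```

Restart the ODI agent after changing these values so the new JVM options take effect.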
3]Optimize the batch load size: When loading into RDBMS data
storage, the batch size settings can be very effective and improve the
runtime of an interface/package significantly. For most relational database
technologies, two properties can be set on the data server definition
under Topology Manager: Array
Fetch Size and Batch Update Size. These settings are
typically quite low by default and should be increased to optimal values to make the
loads faster.
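A sketch of what a batch-size setting changes under the hood: instead of one round trip per row, rows are grouped and sent executemany-style. This is plain Python/sqlite3 illustration (the `target` table and batch size are made up), not ODI internals:

```python
# Demonstrates row batching: a larger batch size means fewer round trips.
import sqlite3

def batched(rows, size):
    """Yield lists of at most `size` rows (what Batch Update Size controls)."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER)")

rows = [(i,) for i in range(2500)]
round_trips = 0
for batch in batched(rows, 1000):       # batch size 1000
    conn.executemany("INSERT INTO target VALUES (?)", batch)
    round_trips += 1

loaded = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(round_trips, loaded)  # 3 2500
```

With a batch size of 1000, the 2500 rows cost 3 round trips instead of 2500 single-row statements; over a network link that difference dominates the load time.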
4]Remote server paths and access: Many times the packages/interfaces
have to access files on remote servers. One obvious way to handle this is
to share the path on the remote server, map a drive to the remote path
where the required files are, and then use the mapped drive in the package. But I
found that ODI was not able to work with mapped drives; you need to give the
fully qualified UNC (Universal Naming Convention) path instead. Secondly, the
access granted to the path should be for a service account that is registered in
the domain and can be used from any server on the network. This
service account should also be used as the logon account for the ODI agent.
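The distinction between the two path forms can be shown with Python's `pathlib` (the server and share names below are hypothetical). The key point is that a drive letter mapped in an interactive session is not visible to a service such as the ODI agent, while the UNC form resolves from any account that has access:

```python
# Mapped drive vs UNC path. "fileserver01" and "etl_share" are made-up names.
from pathlib import PureWindowsPath

mapped = PureWindowsPath(r"Z:\inbound\orders.csv")   # drive mapped per-user
unc = PureWindowsPath(r"\\fileserver01\etl_share\inbound\orders.csv")

# The mapped form anchors on a session-local drive letter...
print(mapped.drive)   # Z:
# ...while the UNC form carries the server and share in the path itself.
print(unc.drive)      # \\fileserver01\etl_share
print(unc.name)       # orders.csv
```

Because the UNC path is self-describing, it works regardless of which drive mappings (if any) exist in the agent's logon session.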
5]Sunopsis Memory Engine vs. External Database: ODI provides the
Sunopsis Memory Engine, an in-memory database based on HSQLDB that serves
quite well as high-performing temporary data storage for intermediate data
transformations. But this engine is only suitable for modest data volumes, because
it uses memory from the JVM that ODI runs in. For transformations of large data
sets in interfaces, use an external RDBMS instead as the target or staging
data storage, with truncate mode switched on (as mentioned in point 1). There is
a good external blog series on the Sunopsis Memory Engine that addresses
this point in detail and more (Under the hood
of the Sunopsis Memory Engine).
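The Sunopsis Memory Engine is HSQLDB running inside ODI's JVM; the sketch below uses Python's sqlite3 `":memory:"` database as a stand-in to show the in-memory staging pattern (table names and data are made up for the illustration):

```python
# In-memory staging: fast for small intermediate sets, but every row
# consumes process memory (JVM heap, in the real Sunopsis engine).
import sqlite3

stage = sqlite3.connect(":memory:")     # lives entirely in process memory
stage.execute("CREATE TABLE src_orders (id INTEGER, amount REAL)")
stage.executemany("INSERT INTO src_orders VALUES (?, ?)",
                  [(1, 10.0), (2, 20.0), (3, 30.0)])

# Intermediate transformation runs at memory speed with no disk I/O.
total = stage.execute("SELECT SUM(amount) FROM src_orders").fetchone()[0]
print(total)  # 60.0
```

The speed comes precisely from the trait that limits it: everything is held in process memory, so large staged data sets compete with ODI itself for heap, which is why they belong in an external RDBMS staging area instead.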