Golden Gate: source node 2/4

2020-11-23, by Denis Monari

Setting up the DB

First of all, you need an Oracle Database Enterprise Edition 12.1. You need to configure the database with the following:

  • force logging enabled
  • enable_goldengate_replication parameter set to true
  • supplemental logging set to minimal
  • a Golden Gate administration account with a set of privileges

You may also want to consider the following to obtain better performance and achieve higher availability:

  • an undo tablespace big enough to sustain the minimum amount of undo retention your needs dictate
  • database flashback enabled, to fully support the Golden Gate integrated replica features
  • archive logging enabled, with a FRA big enough to absorb a minimum amount of replication lag and long-running transactions
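
These depend entirely on your workload, so treat the following as an illustrative sketch: the retention and FRA sizes are made-up example values, and switching to archivelog mode requires a short outage.

-- example values only: tune undo retention and FRA size to your workload
-- (this assumes db_recovery_file_dest is already set)
ALTER SYSTEM SET undo_retention=86400 SCOPE=BOTH SID='*';
ALTER SYSTEM SET db_recovery_file_dest_size=500G SCOPE=BOTH SID='*';
-- archivelog mode requires restarting the database in mount state
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
-- flashback can then be enabled while the database is open
ALTER DATABASE FLASHBACK ON;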

Enabling force logging

ALTER DATABASE FORCE LOGGING;

Enabling Golden Gate replication

ALTER SYSTEM SET enable_goldengate_replication=TRUE SCOPE=BOTH SID='*';

Adding supplemental logging (minimal)

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

Golden Gate user account

For your POC you may simply create a user with DBA privileges and use it as your Golden Gate user. But in an open environment (one with developers working on it), you should treat the Golden Gate user the same as any other user and grant it only the bare minimum set of privileges. This can be tricky in an ever-changing environment.

In addition to CREATE SESSION, CONNECT and RESOURCE, as long as you only need to capture, you may limit yourself to executing the following and see if it works for you:

EXEC DBMS_GOLDENGATE_AUTH.GRANT_ADMIN_PRIVILEGE('goldengate_user');

For a replicat Golden Gate user, you may want to grant privileges only on specific objects or schemas, especially in DWH/BI systems where you don't want multiple replicat processes to clash with each other. A sketch of this approach follows.
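
As an illustration of both ideas: the replicat user gg_replicat and target table dwh_target.my_table below are hypothetical names, and the privilege_type parameter shown first lets you restrict the admin grant itself to capture only.

-- capture-only variant of the admin grant (privilege_type defaults to '*')
EXEC DBMS_GOLDENGATE_AUTH.GRANT_ADMIN_PRIVILEGE('goldengate_user', privilege_type => 'CAPTURE');

-- hypothetical replicat user limited to the objects it actually writes to
GRANT CREATE SESSION TO gg_replicat;
GRANT SELECT, INSERT, UPDATE, DELETE ON dwh_target.my_table TO gg_replicat;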

Configuring Golden Gate environment

Download the latest Oracle Golden Gate 12.3 version (this guide should work with Golden Gate 19 too) and install it on your Linux box, along with an Oracle Client of the same version as the source database or higher.

Set up a proper tnsnames.ora to connect to the source database; you will use it to set up the Golden Gate connection string.
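
Something along these lines works; the host, port and service name here are purely hypothetical placeholders:

MYDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = source-db.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = mydb.example.com))
  )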

After installation, you should set up your environment with some standard variables to make your life easier. The most important are the ORACLE_HOME of your Oracle Client and one for the Golden Gate home, which I would call OGG_HOME.
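
For example, in your shell profile (the paths are just placeholders; note that ggsci also needs the client libraries on the library path):

export ORACLE_HOME=/path/to/oracle/client
export OGG_HOME=/path/to/goldengate
export TNS_ADMIN=$ORACLE_HOME/network/admin
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH
export PATH=$OGG_HOME:$ORACLE_HOME/bin:$PATH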

To login to Golden Gate console, just run:

$OGG_HOME/ggsci

Login to the console and run:

CREATE SUBDIRS

to have Golden Gate create all its required subfolders. You need to run this command only once (you can even create all the folders yourself).

Create a credential store and add your database login credential to it as an alias. You will reference this from within Golden Gate.

ADD CREDENTIALSTORE
ALTER CREDENTIALSTORE ADD USER goldengate_user@mydb PASSWORD "goldengate_password" ALIAS GGALIAS

To test it, from the console, simply run the following:

DBLOGIN USERIDALIAS GGALIAS

If it works, you should see a nice “goldengate_user@mydb” string next to the console prompt.

Ensure your Golden Gate Manager is up and running, then jump to the next section.
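
If it is not running yet, you can check and start it from the console. This assumes dirprm/mgr.prm exists and contains at least a PORT directive (7809 being the conventional default):

INFO MANAGER
START MANAGER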

Configuring the integrated extractor

The integrated extractor basically configures the database to capture, fetch and send changed data automatically to the Golden Gate process. You will see a set of log mining processes start reading your database redo and archive logs automatically. The easiest way to set up your extractor is to configure the database to capture changes for one or more whole schemas. That way you won't have to remember to update the database, but only to update the extractor parameter file with the additional tables you may want to capture. The tradeoff is a slight increase in database load (if any) and lower security (as changing a simple unencrypted text file – the Golden Gate parameter file – is relatively easy).

Speaking of parameter files, for the purpose of this guide, your Integrated Extractor parameter file should contain the following:

LOGALLSUPCOLS
GETINSERTS
GETUPDATES
GETDELETES
NOCOMPRESSUPDATES
UPDATERECORDFORMAT FULL

This is because we are replicating to AVRO files, which are not a database with a concept of the current state of a record, so we need to give Golden Gate enough information to be able to reproduce a whole record.

Make sure you add the tables to be captured (or the whole schema) to the parameter file; a sketch of a full parameter file follows.
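
Putting it all together, a minimal MYEXT parameter file (dirprm/myext.prm) could look like the sketch below; it reuses the extract name, credential alias and trail path from this guide, and captures the whole schema:

EXTRACT MYEXT
USERIDALIAS GGALIAS
EXTTRAIL /my/full/path/to/the/trails
LOGALLSUPCOLS
GETINSERTS
GETUPDATES
GETDELETES
NOCOMPRESSUPDATES
UPDATERECORDFORMAT FULL
-- TABLE statements end with a semicolon; use one per table, or a wildcard
TABLE MYSCHEMA_NAME.*;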

To capture a whole schema, from the Golden Gate console, log in to the database and use the SCHEMATRANDATA option when adding the capture:

DBLOGIN USERIDALIAS GGALIAS
ADD SCHEMATRANDATA MYSCHEMA_NAME ALLCOLS PREPARECSN NOWAIT

Where MYSCHEMA_NAME is of course the schema name you want to capture. If you want to capture more than one schema, simply repeat the ADD command for the other schemas.

WARNING: issuing a new SCHEMATRANDATA command will instruct the database to lock/alter/unlock each and every table of the schema (one table at a time). You'd better prepare for locks and ensure your application is ready to handle these events (or plan for a small downtime).

This operation is one-time only (per schema), because it is not directly related to the integrated process you are about to create. In fact, you may leave it on forever without any extractor connected (but why would you?).

You are now ready to add and register your integrated extractor. From the console, after connecting to the database, just issue the following:

ADD EXTRACT MYEXT, INTEGRATED TRANLOG, BEGIN NOW
ADD EXTTRAIL /my/full/path/to/the/trails, EXTRACT MYEXT
REGISTER EXTRACT MYEXT DATABASE

Where of course MYEXT is the extractor name (the same as your parameter file – ensure you always use uppercase characters or get ready for headaches), and /my/full/path/to/the/trails is the path where you want to place the extractor trails (ensure your GLOBALS file can handle that path).

If everything is ok, you can start the extractor and check its status.
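
From the console, that boils down to the following (the status, RBA and lag shown in the output will vary):

START EXTRACT MYEXT
INFO EXTRACT MYEXT, DETAIL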

The whole series:

Replication: moving data around in near real-time
Golden Gate: from Oracle Database to S3 bucket/HDFS architecture 1/4
Golden Gate: source node 2/4
Golden Gate: data pump and hub nodes 3/4
Golden Gate: replicator node AVRO/HDFS over a S3 Bucket 4/4