Replication failed after upgrading to RSA Authentication Manager 8.1 SP1 patch 2
Originally Published: 2015-04-14
Article Number
000060915
Applies To
RSA Product Set:  SecurID
RSA Product/Service Type:  Authentication Manager
RSA Version/Condition: 8.1 SP1 patch 2
Issue

It has been reported that after upgrading to Authentication Manager 8.1 SP1 patch 2, replication fails on all instances. The failure can be observed as follows:

1- SSH to one of the replicas.

2- Navigate to /opt/rsa/am/server.

cd /opt/rsa/am/server

3- Start the replication service manually by running the following command:

./rsaserv start

4- Within one minute, the replication service fails to start.

5- Reviewing /opt/rsa/am/server/logs/ReplicaReplication.log shows the following error:

@@@2015-04-13 23:14:46,673 FATAL [ApplyP2R latestAppliedSweepId: 2181752 
linesCommittedInNextSweep: 0 nextSweepIdToApply: 2181753]
.......
at java.lang.Thread.run(Thread.java:680)
Caused by: java.sql.BatchUpdateException: Batch entry 2 insert into 
rsa_rep.am_attr_categories 
( id, label_key, is_editable_ind, domain_object_type ) values 
(   E'000000000000000000002001f0020036', E'TOKEN_SOFT_ANDROID_2.x', 'false', E'TOKEN' ) 
was aborted.  Call getNextException to see the cause.
at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2598)
Cause
The root cause of this issue is not yet clear, and a JIRA will be logged with the Engineering team. However, the error message shows that the replica is trying to apply an insert for a record that already exists in its database (a duplicate record).
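
The id of the record being re-inserted can be pulled out of the log instead of being copied by hand. The following is a minimal sketch, not an RSA-supplied tool: the function name extract_duplicate_ids is hypothetical, and it assumes the failed statement is logged in the same shape as the excerpt above, with the id as the first E'...' literal after "values (".

```shell
# Hypothetical helper: list ids from failed am_attr_categories inserts.
# Assumes the statement is logged as in the article's excerpt, with the
# id as the first E'...' literal after "values (".
extract_duplicate_ids() {
    log_file="$1"
    # Join wrapped lines, isolate each logged insert up to its id,
    # keep only the id, and drop repeats.
    tr '\n' ' ' < "$log_file" |
        grep -o "insert into *rsa_rep\.am_attr_categories[^)]*) *values *( *E'[0-9a-fA-F]*'" |
        sed "s/.*E'\([0-9a-fA-F]*\)'\$/\1/" |
        sort -u
}
```

For example, extract_duplicate_ids /opt/rsa/am/server/logs/ReplicaReplication.log would print one id per line for each distinct failed insert it recognises. If the log wraps or formats the statement differently, the pattern would need adjusting.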
Workaround

As a workaround, the replica(s) can be redeployed or all duplicate records in the replica's database can be removed manually.

Before continuing, please take a backup of the Authentication Manager database from the Operations Console.

To remove the duplicate records, follow the steps below:

1- Connect to the Authentication Manager server via SSH.
2- Navigate to /opt/rsa/am/utils.

cd /opt/rsa/am/utils

3- Run the following command:

./rsautil manage-secret -a listall

4- Part of the output will be as follows:

Database Administrator User ID ........................: rsa_dba
Database Administrator Password .....................: P4c9XXXWm7lBaXXXB7sXXXXNo32Q

5- Copy the Database Administrator User ID (rsa_dba) and the Database Administrator Password to a text editor.
6- Navigate to /opt/rsa/am/pgsql/bin.

cd /opt/rsa/am/pgsql/bin

7- Run the following to access the SQL prompt:

./psql  -h localhost  -p 7050  -d db  -U rsa_dba

8- Enter the Database Administrator Password (from step 4) when prompted.
9- At the SQL prompt, run the following, where the id is the value shown in ReplicaReplication.log:

DELETE FROM rsa_rep.am_attr_categories WHERE id = '<id value displayed in ReplicaReplication.log>';
commit;

10- Navigate back to /opt/rsa/am/server and start the service.

cd /opt/rsa/am/server
./rsaserv start

11- Check the /opt/rsa/am/server/logs/ReplicaReplication.log to see if the error happens again.
12- If the error appears again with a different record, repeat steps 6 through 11, using the new ID value to delete that record.
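
Because the delete may have to be repeated for several ids, it can help to generate the statements rather than type them each time. The following is a minimal sketch under stated assumptions: the function name build_cleanup_sql is hypothetical, and the check that an id is exactly 32 hexadecimal characters is inferred only from the single id shown in the log excerpt. It prints a SELECT (to confirm the row exists) followed by the DELETE from step 9.

```shell
# Hypothetical helper: print SELECT and DELETE statements for one id.
# Assumes ids are 32 hexadecimal characters, as in the logged example.
build_cleanup_sql() {
    id="$1"
    # Reject empty input and anything containing a non-hex character.
    case "$id" in
        *[!0-9a-fA-F]*|"")
            echo "invalid id: $id" >&2
            return 1
            ;;
    esac
    # Reject ids that are not exactly 32 characters long.
    if [ "${#id}" -ne 32 ]; then
        echo "invalid id length: $id" >&2
        return 1
    fi
    # Columns and table name are taken from the failed insert in the log.
    printf "SELECT id, label_key FROM rsa_rep.am_attr_categories WHERE id = '%s';\n" "$id"
    printf "DELETE FROM rsa_rep.am_attr_categories WHERE id = '%s';\n" "$id"
}
```

The output can be reviewed and then pasted at the SQL prompt from step 7, or saved to a file and passed to psql with its -f option.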

Notes
The same steps can be used on other replicas in the deployment to remove duplicate database records.