

Tevura

Database Replication Link has Failed Site Data after 2012 R2 CU5.


Got another one I'm stumped on. This past Friday I upgraded my hierarchy from CU4 to CU5. On Saturday I discovered that my CAS had entered ReplicationMaintenance while all of my Primary sites remained active. It's showing Link Failed for all of them under Site Data Replication Status. Great! So I ran the Replication Link Analyzer and got this:

 

[Screenshot: Replication Link Analyzer result]

Not very helpful, and it does this regardless of which site's console I run it from. My next step was to check rcmctrl.log. All primary sites look clean except for one. That primary and the CAS are both stuck with the following log info, which has been looping since I first checked on Saturday:

[Screenshot: rcmctrl.log excerpt]


The basic gist is that I see one error and a couple other items that stand out:

Error: Replication group "General_Site_Data" has failed to initialize for subscribing site C01, setting link state to Error. SMS_REPLICATION_CONFIGURATION_MONITOR 9/14/2015 3:25:55 PM 8120 (0x1FB8)
Site is NOT active, so not calling the DrsActivation procedures on DRSSite queue. SMS_REPLICATION_CONFIGURATION_MONITOR 9/14/2015 3:25:35 PM 3492 (0x0DA4) <- This one is present on the CAS but not the Primary.
No connector role installed SMS_REPLICATION_CONFIGURATION_MONITOR 9/14/2015 10:25:08 PM 7508 (0x1D54) <- This one is on the Primary and not present on the CAS.
I've tried to reinitialize General_Site_Data by placing a .pub file in the rcm inbox. It looks like it starts processing, but then it fails and ends up right back where it started.

Running EXEC spDiagDRS in SQL Server Management Studio on the CAS shows all Site Data as failed to replicate across the servers, while Global Data is fine. Also, the only site that is not Active is my CAS.
I've also tried the basics like rebooting the servers, restarting services and even reinstalling CU5. No changes. Seriously stumped and on the verge of contacting Microsoft for assistance but figured I'd reach out here in case anyone has an "Ah ha!" fix.


The solution that got 4 out of 5 back into a healthy state was running a query in SQL Studio to push my CAS out of ReplicationMaintenance. They all began talking and behaving normally. One of the primaries still refuses, and when I attempt to reinitialize it, the CAS goes back into a maintenance state. The logs show the following error:

CSqlBCP::BCPIN: bcp_exec failed. SMS_REPLICATION_CONFIGURATION_MONITOR 9/16/2015 12:27:47 PM 3572 (0x0DF4)
*** DRS_Init_BCPIN() failed SMS_REPLICATION_CONFIGURATION_MONITOR 9/16/2015 12:27:47 PM 3572 (0x0DF4)
*** BCP fails due to internal sql error. Check if this table has a trigger failed to execute. SMS_REPLICATION_CONFIGURATION_MONITOR 9/16/2015 12:27:47 PM 3572 (0x0DF4)
CBulkInsert::DRS_Init_BCPIN : Failed to BCP in SMS_REPLICATION_CONFIGURATION_MONITOR 9/16/2015 12:27:47 PM 3572 (0x0DF4)
BCP in result is 2147500037. SMS_REPLICATION_CONFIGURATION_MONITOR 9/16/2015 12:27:47 PM 3572 (0x0DF4)
Error: Failed to BCP in for table CI_CurrentComplianceStatus with error code 2147500037. SMS_REPLICATION_CONFIGURATION_MONITOR 9/16/2015 12:27:47 PM 3572 (0x0DF4)
Error: Failed to apply BCP for all articles in publication General_Site_Data. SMS_REPLICATION_CONFIGURATION_MONITOR 9/16/2015 12:27:47 PM 3572 (0x0DF4)
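
A side note on the result code in the log above: 2147500037 is the decimal form of the HRESULT 0x80004005 (E_FAIL, a generic "unspecified error"), which is usually the easier form to search for. A minimal Python sketch of the conversion follows; the helper name to_hresult is only for illustration and is not part of any ConfigMgr tooling.

```python
def to_hresult(code: int) -> str:
    """Format a decimal result code as an 8-digit hexadecimal HRESULT string."""
    # Mask to 32 bits so negative (signed) representations format the same way.
    return f"0x{code & 0xFFFFFFFF:08X}"


print(to_hresult(2147500037))  # 0x80004005
```

Because the code is generic, it does not say why BCP failed; the table name in the following log line is the more useful clue.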


The only thing I can see in ConfigMgr is that under Replication Status, General_Site_Data is stuck and causing the failure. The Replication Link Analyzer also states that it's unable to initialize the package between this site and my CAS "For the replication groups: General_Site_Data (Tables: CI_CurrentComplianceStatus)" and can't remediate. Absolutely everything else reports green, healthy checks across the board. Screenshot below.

[Screenshot: Replication Status]

Any thoughts that might steer me in the right direction would be greatly appreciated.


Figured it out with a little loving support from Microsoft. Posting my solution here in case it helps anyone else with the same or a very similar problem.

CAUSE

The problem is duplicate entries being inserted into the 'dbo.CI_CurrentComplianceStatus' table. The inserts are blocked by the table's index 'CI_CurrentComplianceStatus_AK_AS_idx'.

Note that this issue only presents itself during BCP-IN (bulk copy). Regular DRS replication via SQL Broker resolves the conflict automatically and drops the duplicate values while processing.

RESOLUTION

1. Drop the index that is blocking the duplicate entries from inserting.

2. Identify the replication group that failed and force a BCP reinit so the link goes back to active.

   a. UPDATE RCM_DrsInitializationTracking SET InitializationStatus = 7 WHERE InitializationStatus = 99   <- This forces the BCP process to try again.

3. Run a query to identify the duplicates.

   a. SELECT COUNT(*), ModelID, CIVersion, ItemKey, UserID
      FROM CI_CurrentComplianceStatus
      GROUP BY ModelID, CIVersion, ItemKey, UserID
      HAVING COUNT(*) > 1

4. Disable the trigger that forces the table to be read-only.

5. Delete the duplicate entries found by the query in step 3.

6. Re-add the index that was dropped in step 1.

7. Re-enable the trigger.
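
For anyone who wants to see the find-and-delete logic end to end before touching a site database, here is a rough sketch in Python. It is a stand-in only: it uses an in-memory SQLite table rather than SQL Server, the Payload column is a hypothetical placeholder for the real table's other columns, keeping the lowest rowid per key is an arbitrary choice, and treating the index as unique on the four key columns is an assumption based on the duplicate query above. On a real site, take a backup and confirm with Microsoft support which row to keep.

```python
import sqlite3

# In-memory SQLite stand-in; the real table lives in the ConfigMgr site
# database on SQL Server and has more columns than shown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CI_CurrentComplianceStatus ("
    "ModelID INTEGER, CIVersion INTEGER, ItemKey INTEGER, UserID INTEGER, "
    "Payload TEXT)"  # Payload is a hypothetical placeholder column
)
rows = [
    (1, 1, 100, 0, "a"),
    (1, 1, 100, 0, "b"),  # duplicate of the row above
    (2, 1, 100, 0, "c"),
    (3, 2, 200, 5, "d"),
    (3, 2, 200, 5, "e"),  # duplicate
    (3, 2, 200, 5, "f"),  # duplicate
]
conn.executemany(
    "INSERT INTO CI_CurrentComplianceStatus VALUES (?, ?, ?, ?, ?)", rows
)

KEY = "ModelID, CIVersion, ItemKey, UserID"


def find_dupes(conn):
    """Run the same GROUP BY / HAVING duplicate query as in the post."""
    return conn.execute(
        f"SELECT COUNT(*), {KEY} FROM CI_CurrentComplianceStatus "
        f"GROUP BY {KEY} HAVING COUNT(*) > 1 ORDER BY {KEY}"
    ).fetchall()


def delete_dupes(conn):
    """Keep one row per key (lowest rowid) and delete the rest."""
    cur = conn.execute(
        "DELETE FROM CI_CurrentComplianceStatus WHERE rowid NOT IN ("
        f"SELECT MIN(rowid) FROM CI_CurrentComplianceStatus GROUP BY {KEY})"
    )
    return cur.rowcount


dupes_before = find_dupes(conn)
deleted = delete_dupes(conn)
dupes_after = find_dupes(conn)

# Re-adding a unique index on the key only succeeds once the dupes are gone.
conn.execute(
    "CREATE UNIQUE INDEX CI_CurrentComplianceStatus_AK_AS_idx "
    f"ON CI_CurrentComplianceStatus ({KEY})"
)
print(dupes_before, deleted, dupes_after)
```

The SQLite-specific parts (rowid, the index definition) do not carry over to SQL Server as written; only the shape of the logic does.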
