Recently, I ended up troubleshooting Lync CMS replication in our internal environment. We tried many different things to resolve the issue, but ultimately, it came down to some broken pieces in the actual management components. Let’s talk about the symptoms and – finally – the resolution.
I’m ashamed to say that I don’t always keep track of my Lync environment. I’m not watching the event logs daily and I’m not often checking the output of “Get-CsManagementStoreReplicationStatus”. I know. I feel the shame. Don’t mock me.
Anyway, about two weeks ago, we had an issue. I needed to make a change but I couldn’t get the change to take effect. My Lync Server Replica Replicator Agent was crashing: every time the topology watcher started, the agent would crash. Hard.
So I did what most engineers do and asked Dr. Google and his cousin Mr. Bing for answers. I found this wonderful article from my Canadian friend, The Hoff.
I followed that article and sure enough, that fixed my Lync Server Replica Replicator Agent.
But it’s never that easy, is it?
That fix above somehow broke my Lync Server Master Replicator Agent. The symptoms were consistent. I would restart the service and within 50 seconds, it would crash again.
The Lync Server Application Logs showed a consistent pattern:
Event 2003 – LS Master Replicator Agent Service – Starting
Event 2004 – LS Master Replicator Agent Service – Started
Pause 20 seconds
Event 2021 – LS Master Replicator Agent Service – Successfully read CMS
Event 2033 – LS Master Replicator Agent Service – Running in Active Mode
Event 2008 – LS Master Replicator Agent Service – Successfully connected to back-end
Pause 20 seconds
Event 2012 – LS Master Replicator Agent Service – Topology Watcher
CRASH – Event 2007 – LS Master Replicator Agent Service – Unhandled Exception – CRASH
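If you want to see the same start-to-crash sequence on your own server, a rough sketch like the one below should pull it out of the event log in time order. It assumes the log is named 'Lync Server' and that the provider name matches the source shown in the events above; the one-hour window is just an illustrative value.

```powershell
# Sketch: list recent Master Replicator Agent events, oldest first,
# so the start -> crash sequence reads top to bottom.
# Assumptions: log name 'Lync Server' and the provider name below match
# your server; the one-hour window is an arbitrary illustrative choice.
$filter = @{
    LogName      = 'Lync Server'
    ProviderName = 'LS Master Replicator Agent Service'
    StartTime    = (Get-Date).AddHours(-1)
}

Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Select-Object TimeCreated, Id, LevelDisplayName,
        @{ Name = 'Summary'; Expression = { ($_.Message -split "`r?`n")[0] } } |
    Format-Table -AutoSize -Wrap
```

Sorting by TimeCreated matters here because Get-WinEvent returns the newest events first, which makes a crash loop harder to read.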
Fortunately, the Windows Application Log, at the same time as this 2007 Event ID, would throw some useful information for us.
Event 1026 – .NET Runtime Error with MasterReplicatorAgent.exe
Event 1000 – Application Error – with MasterReplicatorAgent.exe – version 5.0.8308.577
That last error, with version 5.0.8308.577, was curious to me, because the only Lync Server 2013 component with that version is:
I uninstalled it, then reran the Lync Server 2013 Deployment Wizard, which reinstalls missing components. I let it reinstall the Core Management Server pieces, which include the replicator agents.
After that I restarted the services.
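For what it's worth, restarting both replicator agents can be sketched in one line. This assumes the service display names match the ones used in this post ("Lync Server Master Replicator Agent" and "Lync Server Replica Replicator Agent"); check with Get-Service first if yours differ.

```powershell
# Sketch: restart both replicator agents by display name and show their state.
# Assumption: the display names match those quoted in this post; the wildcard
# covers both the Master and the Replica agent.
Get-Service -DisplayName 'Lync Server * Replicator Agent' |
    Restart-Service -PassThru |
    Format-Table DisplayName, Status -AutoSize
```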
Then, lo and behold, I got a new error. Luckily it was a useful one:
Yup, possible reinstallation. From there I ran “Invoke-CsManagementStoreReplication” and waited a few minutes.
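Rather than waiting and guessing, the kick-and-check step can be scripted. The sketch below is a minimal take on it, assuming the Lync Server Management Shell module is available on the machine; $maxAttempts and $delaySeconds are made-up values for illustration, not recommendations.

```powershell
# Sketch: force a replication pass, then poll until every replica is current.
# Assumes the Lync module (Lync Server Management Shell) is installed.
# $maxAttempts and $delaySeconds are illustrative values only.
Import-Module Lync

$maxAttempts  = 10
$delaySeconds = 30

Invoke-CsManagementStoreReplication

for ($i = 1; $i -le $maxAttempts; $i++) {
    # Collect replicas that have not yet caught up with the CMS.
    $stale = @(Get-CsManagementStoreReplicationStatus |
        Where-Object { -not $_.UpToDate })

    if ($stale.Count -eq 0) {
        Write-Host 'All replicas report UpToDate = True.'
        break
    }

    Write-Host ('Attempt {0}: {1} replica(s) not up to date: {2}' -f `
        $i, $stale.Count, ($stale.ReplicaFqdn -join ', '))
    Start-Sleep -Seconds $delaySeconds
}
```

If the loop runs out of attempts with replicas still listed, that is the cue to go back to the event logs rather than keep waiting.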
Replication was fixed, hooray! There was great rejoicing in the office.
At the end of the day, a simple configuration change led us down the path of broken replication and broken agents, and a subsequent reinstallation of components.
The moral of the story? Keep an eye on your Lync Server and read your logs. They are useful.