Recently I wrote about an internal Lync Server CMS replication issue we were experiencing. While we were fixing that issue, we also received a report that our Response Groups were not working properly. Hooray! When it rains, it pours, I suppose.
We use Lync RGS for simple hunt groups such as Tech Support, Purchasing, and Sales. All inbound phone calls (other than private DIDs) come into Lync, go to an Exchange Auto Attendant, and then go to Response Groups. They are pretty important to us.
We are a managed services provider (MSP), so having our Tech Support Response Group queue down is not good. Really not good. Let’s see what’s going on, shall we? Warning: there are a lot of screenshots below. I apologize in advance for all the scrolling you’re about to do.
Symptom 1: When you called a Response Group directly, you got a fast busy.
Symptom 2: When you called our main number, the Exchange AA picked up, and you were transferred to a Response Group, you got a “call cannot be completed” message and were sent back to the AA.
Symptom 3: When you called a Response Group while tracing S4 in OCSLogger/Snooper, you got a nice, generic 26017 entry.
Symptom 4: When you restarted the Response Group Service (RTCRGS), it consistently logged errors related to WorkflowRuntime and Contact Objects.
There was also a curious “Information” log entry in the middle of those…
That doesn’t make sense. At all.
I started by focusing on the 31067 error and began looking at the RGS Application Endpoints in the Lync Shell.
Then I used ADSI Edit to verify that those Application Contact Objects existed, and they did.
And…
So that’s annoying.
Let’s go back to the 31035 event. I traced all the RGS components in OCSLogger/Snooper while I restarted the server and found this curious log entry:
TL_ERROR(TF_COMPONENT) [2]1C38.2C98::11/16/2014-02:45:12.980.00000457 (RgsHostingFramework,AepManager.StartConnectingAep:aepmanager.cs(1144))(0000000002D826E8)[Exit] – Could not establish AEP AEP address=[sip:RtcApplication-77d9d0db-f566-4ed7-85b7-c03996c65e80@mirazon.com], exception=[System.InvalidOperationException: An application endpoint with the same uri already exists on the CollaborationPlatform.
Okay. That’s the RGS Presence Watcher. Interesting.
Another interesting log entry just above it:
TL_ERROR(TF_COMPONENT) [1]1C38.216C::11/16/2014-02:44:11.371.00000238 (RgsHostingFramework,AepManager.StartConnectingAep:aepmanager.cs(1144))(0000000002D826E8)[Exit] – Could not establish AEP AEP address=[sip:RtcApplication-77d9d0db-f566-4ed7-85b7-c03996c65e80@mirazon.com], exception=[System.InvalidOperationException: The requested Performance Counter is not a custom counter, it has to be initialized as ReadOnly.
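Both entries point at the same AEP URI but report different exceptions. In a long trace, pulling those fields out programmatically makes that pattern easier to spot. The Python sketch below is only an illustration: its regular expressions are inferred from the two sample entries in this post (rejoined onto one line each), so real OCSLogger output may not match them exactly, and the names `parse_entry`, `group_by_uri`, and `AepError` are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, NamedTuple, Optional

# Header layout inferred from the two sample entries in this post only.
_HEADER = re.compile(
    r"TL_(?P<level>[A-Z]+)\(.*?\)\s+"            # e.g. TL_ERROR(TF_COMPONENT)
    r"\[\d+\][0-9A-F]+\.[0-9A-F]+::"            # e.g. [2]1C38.2C98::
    r"(?P<ts>\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2}\.\d{3})\.\d+\s+"
    r"\((?P<component>[^,]+),(?P<function>[^:]+):"
)
_AEP = re.compile(r"AEP address=\[(?P<uri>[^\]]+)\]")
_EXC = re.compile(r"exception=\[(?P<type>[\w.]+): (?P<message>[^\]]*)")


class AepError(NamedTuple):
    level: str
    timestamp: datetime
    component: str
    function: str
    uri: Optional[str]
    exception_type: Optional[str]
    message: Optional[str]


def parse_entry(line: str) -> Optional[AepError]:
    """Parse one trace line; return None if the header does not match."""
    head = _HEADER.search(line)
    if head is None:
        return None
    aep = _AEP.search(line)
    exc = _EXC.search(line)
    return AepError(
        level=head["level"],
        # Keep date, time, and milliseconds; the trailing counter is dropped.
        timestamp=datetime.strptime(head["ts"], "%m/%d/%Y-%H:%M:%S.%f"),
        component=head["component"],
        function=head["function"],
        uri=aep["uri"] if aep else None,
        exception_type=exc["type"] if exc else None,
        message=exc["message"].strip() if exc else None,
    )


def group_by_uri(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect exception messages per AEP URI, skipping unparseable lines."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        entry = parse_entry(line)
        if entry and entry.uri:
            groups[entry.uri].append(entry.message or "")
    return dict(groups)
```

Fed both entries above, `group_by_uri` returns a single key (the RtcApplication URI) with two messages, which makes it clearer that the duplicate-URI failure and the Performance Counter failure were hitting the same endpoint.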
Huh… performance counters. I reached out to some friends, and they asked me to look at the registry entries for Windows Workflow Foundation.
Specifically, they asked about the WWF 3 and WWF 4 keys under HKLM\SYSTEM\CurrentControlSet\Services, like below:
You’ll notice that the PerfIniFile value references PerfCounters.ini. That’s what it says now. Before, it said “PerfCounters_d.ini” (with a “_d” suffix), which is odd.
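To check that value without clicking through regedit, a minimal Python sketch can read PerfIniFile and flag anything other than PerfCounters.ini. It assumes the keys are named “Windows Workflow Foundation 3.0.0.0” and “Windows Workflow Foundation 4.0.0.0” and that PerfIniFile sits in each key’s Performance subkey; this post only calls them WWF 3 and WWF 4, so confirm the exact names on your server. The helpers `ini_looks_wrong` and `read_perf_ini_values` are hypothetical. Since the post only observes that the value changed after the reload, treat a mismatch as a hint, not a diagnosis.

```python
import sys
from typing import Dict, Optional

# ASSUMPTION: exact key names; the post only says "WWF 3" and "WWF 4".
WWF_SERVICE_KEYS = (
    "Windows Workflow Foundation 3.0.0.0",
    "Windows Workflow Foundation 4.0.0.0",
)
EXPECTED_INI = "PerfCounters.ini"


def ini_looks_wrong(value: Optional[str]) -> bool:
    """True if PerfIniFile is missing or differs from PerfCounters.ini,
    e.g. the PerfCounters_d.ini value seen on the broken server."""
    return value is None or value.strip().lower() != EXPECTED_INI.lower()


def read_perf_ini_values() -> Dict[str, Optional[str]]:
    """Windows only: read PerfIniFile from each service's Performance subkey."""
    import winreg  # standard library, but only present on Windows

    results: Dict[str, Optional[str]] = {}
    for name in WWF_SERVICE_KEYS:
        # ASSUMPTION: counter registration lives under ...\<service>\Performance
        path = "SYSTEM\\CurrentControlSet\\Services\\" + name + "\\Performance"
        try:
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
                value, _value_type = winreg.QueryValueEx(key, "PerfIniFile")
                results[name] = value
        except FileNotFoundError:
            # Key or value not present on this machine.
            results[name] = None
    return results


if __name__ == "__main__" and sys.platform == "win32":
    for service, ini in read_perf_ini_values().items():
        status = "CHECK" if ini_looks_wrong(ini) else "ok"
        print(f"{status}: {service} PerfIniFile={ini!r}")
```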
At my friends’ recommendation, I reloaded the performance counters for both WWF 3.0 and WWF 4.0.
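This post doesn’t say which tool was used for the reload. One common approach is to unregister a counter set with `unlodctr` (which takes the service, or driver, name) and register it again with `lodctr` (which takes the counter .ini file), from an elevated prompt. The Python sketch below only wraps those two commands; the helper names are hypothetical, the service name and .ini path in the usage are placeholders, and it defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess
import sys
from typing import List


def reload_commands(service_name: str, ini_path: str) -> List[List[str]]:
    """Return the unload/reload pair for one performance-counter set.

    unlodctr takes the service (driver) name; lodctr takes the counter .ini file.
    """
    return [
        ["unlodctr", service_name],
        ["lodctr", ini_path],
    ]


def reload_counters(service_name: str, ini_path: str, dry_run: bool = True) -> None:
    """Print the commands (default, or on non-Windows) or run them on Windows."""
    for cmd in reload_commands(service_name, ini_path):
        if dry_run or sys.platform != "win32":
            # Quote arguments containing spaces so the printed line can be pasted.
            printable = " ".join(f'"{arg}"' if " " in arg else arg for arg in cmd)
            print("would run:", printable)
        else:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so lodctr is not attempted if unlodctr reports a failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder service name and path: confirm both on your own server.
    reload_counters(
        "Windows Workflow Foundation 4.0.0.0",
        r"C:\path\to\PerfCounters.ini",
    )
```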
And the following log entries appeared:
And:
Okay. I waited and took a deep breath. Then I restarted the Response Group Service (RTCRGS) again.
Hooray! You could see RGS Stop and Start, and the Application Endpoints were created without error.
All is well.
So, what was the problem? As best I can tell, the Response Group problem was related to performance counters, specifically the Windows Workflow Foundation performance counters. Reloading those fixed whatever was causing the RGS errors at the top of this blog post.