Have you ever run into an issue with Skype for Business where, no matter how hard you look for an answer, you just can’t seem to find one? I found myself in that position recently, and while it’s a bit embarrassing how obviously simple the answer turned out to be, I think it’s worth sharing in case someone else finds themselves in the same predicament.
I’ve been working on a remediation project for a client. Another consulting company had set up their Skype for Business environment, and while it had been chugging along for almost half a year, the client had been seeing some strange activity and brought us in to review and remediate as needed.
One of the items we were working on was bringing their Skype for Business infrastructure up to date with the latest Cumulative Updates. After installing the November 2016 Cumulative Update, everything appeared to be working, but a couple of days after the install, I got a report that Call Park wasn’t working. Checking the services, sure enough, I saw that RTCCPS was in a stopped state.
I manually started the service, ran Get-CsWindowsService, and there it was, running. I refreshed a couple of times and it stayed running, so I contacted the client and asked them to test. Unfortunately, the test failed. Looking back at the services, RTCCPS had stopped again. Now I had a timeframe for when it stopped, so it was time to look at the event logs.
I must say, I was actually a bit happy to see that an error had been logged:
Now I have somewhere to start. Looking at this log, it’s pretty obvious what the problem is. It’s trying to open a port on the server’s IPv6 interface and this client doesn’t use IPv6. A quick check confirmed that IPv6 was enabled on the server’s NIC, so I disabled it, restarted the services, and hey! RTCCPS is started. And then … it stopped again. The root cause, it seemed, was a bit deeper than that.
I hit the interwebs to see what other people had found. It wasn’t long before I realized that while there are plenty of references to errors like “No connection could be made because the target machine actively refused it,” very few of them were directly related to Skype for Business, and the ones that were had nothing to do with the Call Park service.
This called for some freestyle troubleshooting. I stopped the Skype for Business services, and from Control Panel, I uninstalled the Call Park Service, ran step 2 in the deployment wizard to re-install it, and restarted services. Again, same results: RTCCPS starts, and after a few moments, it stops.
From there, I tried stopping services, uninstalling Call Park, rebooting the server, stopping services, running step 2, starting services, and WOOHOO! It started, and after three minutes the service was still running. Success? Or not. Like lunch at Taco Bell, it came back to haunt me. It stopped again.
From there, I opened a ticket with Microsoft. Anyone who’s ever opened a ticket with Microsoft knows that one of the first things they do is document your environment. They look at your topology, they go through all your settings, they take notes. Then we got to looking at the Hosts file. Okay, there are the Edge Pool IP addresses, as well as the IPs for the Edge servers themselves … but what’s this remnant of a configuration long past?
Seeing as how IPv6 is disabled on the NIC, could this be where the Call Park Service is getting it? I removed both of these entries, saved the file and I’ll be dipped, the service starts and stays running. In fact, checking the Event Logs, I’m now seeing all the associated information events that I should be!
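For anyone who wants to make this check routine, here is a minimal Python sketch that flags hosts-file lines using a loopback address or a localhost-like name. The function name `find_loopback_entries` and the `SAMPLE` text are made up for illustration; the sample is not the client’s actual file, only a guess at the general shape such leftover entries can take.

```python
import ipaddress


def find_loopback_entries(hosts_text):
    """Return (line_number, address, names) for every hosts entry that uses
    a loopback address or a name containing 'localhost'."""
    findings = []
    for number, raw in enumerate(hosts_text.splitlines(), start=1):
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no host name is not a usable entry
        try:
            # Strip any IPv6 zone suffix (e.g. fe80::1%12) before parsing.
            address = ipaddress.ip_address(fields[0].split("%", 1)[0])
        except ValueError:
            continue  # first field is not an IP address, skip the line
        names = fields[1:]
        if address.is_loopback or any("localhost" in n.lower() for n in names):
            findings.append((number, str(address), names))
    return findings


# Hypothetical sample content, for illustration only.
SAMPLE = """\
# illustrative content only
10.0.0.15    edgepool.example.test
127.0.0.1    vmware-localhost
::1          vmware-localhost
"""

if __name__ == "__main__":
    for number, address, names in find_loopback_entries(SAMPLE):
        print(f"line {number}: {address} -> {' '.join(names)}")
```

To run it against a real server, read the hosts file (on Windows it normally lives under `C:\Windows\System32\drivers\etc\hosts`) and pass its text to the function instead of `SAMPLE`. A flagged line is not necessarily wrong, just worth a second look.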
How could it be that simple? I don’t know. I never would have thought to review the hosts file for this. The service is running on the server itself, it shouldn’t have to do any DNS lookup or even reference the Hosts file, and even if it did, why would it resolve localhost to “vmware-localhost” and why would it pick up on the IPv6 entry instead of the IPv4 entry? Who knows. Why would installing the November 2016 CU cause it to suddenly start looking to the hosts file for localhost? My buddy from Microsoft didn’t have a direct explanation other than he’d seen something similar before.
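I can’t say how the Call Park service resolves names internally, but it’s easy to see what the operating system’s resolver hands back for a given name and in what order. The sketch below uses Python’s standard `socket.getaddrinfo`; the helper name `resolve_in_order` is my own, and the result only shows what the resolver returns to Python, which may or may not match what the service sees.

```python
import socket

# Human-readable labels for the two address families of interest.
FAMILY_LABELS = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def resolve_in_order(name):
    """Return (family, address) pairs in the order the resolver reports
    them, with duplicates removed."""
    results = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(name, None):
        entry = (FAMILY_LABELS.get(family, str(family)), sockaddr[0])
        if entry not in results:
            results.append(entry)
    return results


if __name__ == "__main__":
    for label, address in resolve_in_order("localhost"):
        print(label, address)
```

Running it before and after editing the Hosts file is a quick way to see whether an IPv6 address is still being returned for localhost.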
I can soothe my ego by telling myself that I didn’t think of it because these settings were all in place prior to my engagement with the client, and I hadn’t changed any of them other than adding the Edge IPs. While I hadn’t done anything that would have caused this issue, I now know the resolution was as simple as highlighting two lines and deleting them.
Like I said, it’s embarrassing, but it’s a lesson learned, and now I’ve got one more thing that I will check first when troubleshooting ANY Skype for Business issue: errant configs in the Hosts file, whether I think they should have any bearing on the problem or not.