2020-04-30 - News - Tony Finch
The news part of this item is that I've updated the stealth secondary documentation with a warning about configuring servers (or not configuring them) with secondary zones that aren't mentioned in the sample configuration files.
One exception to that is the special Cisco Jabber zones supported by the phone service. There is now a link from our stealth secondary DNS documentation to the Cisco Jabber documentation, but there are tricky requirements and caveats, so you need to take care.
The rest of this item is the story of how we discovered the need for these warnings.
The context
Cisco Jabber is designed around a classic enterprise-style
internal/external network architecture with firewalls and DNS views,
which doesn't fit the University very well. The special Jabber DNS SRV
records (_cisco-uds etc.) have been set up on the phone system's own
DNS servers, which are able to support the special split views more
easily than the central DNS servers.
If the network requirements are satisfied then you can see the Jabber internal view records, but in practice most clients should see the DNS SRV records for the external view.
The problem
With many people working from home, our colleagues in the telecoms
office found that Jabber was not working as expected. After some
investigation it became apparent that the internal view _cisco-uds
DNS SRV records were leaking: often Virgin Media's DNS servers would
return the wrong answers, and sometimes the various public DNS
resolvers would as well.
This was very mysterious.
We could not find any configuration problems with the phone system's DNS servers, nor with the central DNS servers, nor with the contents of the DNS zones.
The answer
After much head-scratching and many red herrings and blind alleys, I
worked out that one of the public DNS servers for the cam.ac.uk zone
was configured as a secondary for Jabber's special internal _cisco-uds
view. There was a 1-in-6 chance that people outside the
University would get the wrong records, depending on which of our 6
public DNS servers their resolver happened to talk to.
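As a rough illustration of why the failure looked intermittent, here is the arithmetic behind that 1-in-6 figure. It is only a sketch: it assumes a resolver picks one of the public servers uniformly at random and independently for each uncached query, which real resolvers do not strictly do. The function name and the `queries` parameter are hypothetical, made up for this example.

```python
from fractions import Fraction


def chance_of_wrong_answer(total: int, bad: int, queries: int = 1) -> Fraction:
    """Chance that at least one of `queries` lookups hits a misconfigured server.

    Assumes each lookup picks a server uniformly at random and independently,
    which is a simplification of real resolver behaviour.
    """
    if total <= 0 or not 0 <= bad <= total or queries < 1:
        raise ValueError("need total > 0, 0 <= bad <= total, queries >= 1")
    good = Fraction(total - bad, total)  # chance that a single lookup is fine
    return 1 - good ** queries


print(chance_of_wrong_answer(6, 1))             # 1/6
print(chance_of_wrong_answer(6, 1, queries=3))  # 91/216
```

Under those assumptions a single lookup goes wrong one time in six, and the chance of seeing at least one wrong answer rises with each further uncached lookup.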
The fix
So we've corrected the configuration mistake, and improved our documentation to reduce the risk of it happening again. But there's a bit more we can do.
One of the things that made this hard to debug was that the usual consistency checking tools such as Zonemaster did not spot the mistake. DNSViz encountered the problem, which gave me a bit of a clue, but DNSViz isn't designed to systematically examine all of a zone's nameservers in the way that Zonemaster does.
The reason Zonemaster didn't find the problem is that it examines a
zone's own nameservers for consistency, but it doesn't check that all
the zone's parent's nameservers have consistent delegations. In our
Jabber case it was one of the parent zone (cam.ac.uk) servers that
was doing the wrong thing with the child _cisco-uds zone.
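The missing check can be sketched roughly as follows: ask every nameserver of the parent zone about the child name, then flag any server whose response differs from the rest. This is not how Zonemaster is implemented, and it leaves out the actual DNS queries. The server names, the record strings, and the `find_inconsistent_servers` helper are all hypothetical placeholders, and the responses are assumed to have been collected already by some other means.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def find_inconsistent_servers(responses: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return the servers whose response differs from the most common one.

    `responses` maps a parent-zone nameserver name to the set of records it
    returned for the child name. Treating the majority as correct is only a
    heuristic: it cannot tell which side is right when the servers split evenly.
    """
    if not responses:
        return []
    majority, _count = Counter(responses.values()).most_common(1)[0]
    return sorted(name for name, seen in responses.items() if seen != majority)


# Hypothetical data: five servers hand out the same referral, while one
# answers from a locally configured copy of the child zone.
referral = frozenset({"NS child-ns1.example.", "NS child-ns2.example."})
leaked = frozenset({"SRV 0 0 8443 internal-host.example."})
observed = {f"ns{i}.example": referral for i in range(1, 7)}
observed["ns4.example"] = leaked

print(find_inconsistent_servers(observed))  # ['ns4.example']
```

Comparing whole response sets keeps the sketch simple. A real check would also want to distinguish referrals from authoritative answers, since that distinction is what gave the misconfigured server away in our case.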
We have a Zonemaster script for checking all our zones, but it currently uses a rather out-of-date version. I'm hoping that after some operating system upgrades it will be more convenient to use a recent version of Zonemaster, and it will make sense to add some extra checks so that Zonemaster can spot and complain about mistakes like our Cisco Jabber leakage.