Nest Protect cannot connect with DoT enabled - RT-AX88U 384.18

I installed @john9527's patched build and it seems like the Protects can connect to their network with DoT enabled. I'll give it 24 hours to make sure they stay connected and report back.

If I have a bit more time tomorrow I may try to remove / re-add a device as well.
 
Glad to hear it seems to have helped. Please keep us posted as you continue your testing.
 
Could this patch be added to the base firmware (Merlin) for everyone to benefit from?
 
Ok, everything is still working, and I was able to remove and re-add a protect to my account.

I'd still like to understand whether this is a recent breakage from .17 to .18, what the intent of the change was, and ideally why it breaks Nest. For example, when I did the Wireshark + hotspot test, I saw the DNS resolution fail for frontdoor.nest.com (the response had no answer section). However, I was able to both nslookup and ping that domain from my laptop. When I added one of the domain's IP addresses to my laptop's hosts file, the device was able to add to my account via the hotspot. It was at that point that I started messing with the DNS settings and tested disabling DoT.

So it's almost like the query doesn't fail for all devices, i.e. my laptop was successful. Perhaps the Protect was doing the query with different parameters?
 
Actually, this was introduced with 384.16 and the update to getdns 1.6.0. It's one of those problems that will come and go as the services change their configuration on the internet (for example, anycast addresses that add or delete servers, which creates a larger or smaller DNS response). Larger DNS responses trigger the error (no data returned to dnsmasq for the query).

As far as your experience goes, nslookup (and dig) use a different method to get their data...those tools passed for me too. And once you run them, your laptop DNS cache would be 'primed' so it would appear to start working.

The build I did for you backed out a commit from getdns:
https://github.com/getdnsapi/getdns/commit/6cb15939ba4020f49222759991bdb95afb383405
I just double-checked and there doesn't appear to have been any follow-up activity around this commit.

Maybe someone with a higher pay grade than me (@RMerlin or @themiron) can figure out why it's not playing nice with our builds :)
 
Been researching this. Here's a related issue from stubby / getdns.

https://github.com/getdnsapi/getdns/issues/430

It seems like it is up to the client to advertise to stubby that it has a buffer size capable of handling DNS responses >512 bytes.

However, the dnsmasq man page says its default EDNS max packet size is 4096. So something is going on with either stubby's config or dnsmasq's request / config.

The payload response for frontdoor.nest.com is 792 bytes which fits. As you said, tools like dig seem unaffected. Could this perhaps be a problem with the DNS request coming from the nest device? Is the nest as a client capable of specifying a max DNS response size?
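To make the "client advertises its buffer size" point concrete, here is a minimal Python sketch of a DNS query on the wire with and without the EDNS OPT record described in RFC 6891. The helper names (`encode_name`, `build_query`) and the transaction ID are made up for illustration; this is not what the Protect, dnsmasq, or stubby actually run.

```python
import struct

def encode_name(name: str) -> bytes:
    """Encode a hostname as DNS labels (length-prefixed, zero-terminated)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name: str, edns_bufsize=None, txid: int = 0x1234) -> bytes:
    """Build an A-record query; append an OPT record if edns_bufsize is given."""
    arcount = 1 if edns_bufsize is not None else 0
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    if edns_bufsize is None:
        return header + question
    # OPT pseudo-record: root name, TYPE=41, CLASS=advertised UDP payload size,
    # TTL field (extended rcode/version/flags) left at zero, RDLEN=0
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_bufsize, 0, 0)
    return header + question + opt

plain = build_query("frontdoor.nest.com")
with_edns = build_query("frontdoor.nest.com", edns_bufsize=4096)
print(len(plain), len(with_edns))
```

The only difference between the two queries is the 11-byte OPT record at the end. A client that leaves it out is implicitly limited to the classic 512-byte UDP response.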
 
This sounds like the same issue I've been experiencing for the last few months with SmartThings. I've had to turn off DoT to get the hub to connect to the servers and then I could turn on DoT again.
 
I can tell you if you are using 1.1.1.1 with QUAD9 then you are defeating the purpose of QUAD9. You need to use QUAD9 by itself. The problem is if QUAD9 fails to resolve a bad domain then 1.1.1.1 is going to resolve it.
 
Well, I went and installed tcpdump on the router to confirm what is happening. Basically, with DoT enabled, the DNS response for frontdoor.nest.com is 792 bytes, so the reply comes back truncated (TC flag set). The client, the Protect in this case, should then retry the request with either a larger EDNS buffer size or in TCP mode. However, it is not doing this and so fails to connect. This seems like a Nest Protect issue, but good luck getting Google to admit or fix it.
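For anyone following along, the TC flag is a single bit in the 12-byte DNS header. The Python sketch below shows the check a well-behaved client would make before deciding to retry. The function names and the sample headers are hypothetical, and it only illustrates the expected behavior, not how any particular resolver implements it.

```python
import struct

TC_BIT = 0x0200  # truncation flag in the DNS header flags word

def is_truncated(message: bytes) -> bool:
    """Return True if the DNS message has the TC (truncated) bit set."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_BIT)

def next_step(response: bytes) -> str:
    """What a client is expected to do with a UDP response."""
    return "retry over TCP" if is_truncated(response) else "use UDP answer"

# Hypothetical response headers: QR, RD and RA set, with and without TC.
truncated = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)
complete = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)

print(next_step(truncated))
print(next_step(complete))
```

The Protect appears to stop at the truncated reply instead of taking the retry branch.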

The interesting part is maybe this. If I use dig to test the queries, I see a response size of 792 bytes to the local router, but a response size of 254 bytes if the response is cached by dnsmasq. And I get a response of 269 bytes if I dig @1.1.1.1 directly. So 1.1.1.1 and the dnsmasq cache are compressing the response maybe, and therefore not truncating it. But the cache time for this query is very short, maybe 15 seconds, so practically speaking it always fails for me.

I can't find any way to force dnsmasq into TCP mode or force it to retry truncated responses, or to enable dns compression. Similarly I don't see any way to make stubby do any of these things either. That leaves us kind of stuck. The change to stubby is legitimate and follows standards. However the nest is a broken client. The only thing I can see is to enable compression locally somehow.

@john9527 the reason dig works is that, without any special flags, it sends a query with an EDNS buffer size of 4096. You can force dig to get a truncated response. For example:

dig +bufsize=512 frontdoor.nest.com
or
dig +noedns frontdoor.nest.com

That will get truncated, and you'll see a message that dig automatically retries in TCP mode. To suppress the retry and fail at the truncation (you must restart dnsmasq or wait for the cache to expire first):

dig +noedns +ignore frontdoor.nest.com
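The sizes reported in this thread can be checked against the UDP limits with a small Python sketch. It assumes the responder honors whatever the client advertises (a real server may apply a lower limit of its own), and the function names are made up for illustration.

```python
from typing import Optional

CLASSIC_UDP_LIMIT = 512  # RFC 1035 limit when the query carries no OPT record

def udp_limit(edns_bufsize: Optional[int]) -> int:
    """Largest UDP response the client has said it can accept."""
    if edns_bufsize is None:
        return CLASSIC_UDP_LIMIT
    # RFC 6891: advertised sizes below 512 are treated as 512
    return max(CLASSIC_UDP_LIMIT, edns_bufsize)

def fits_in_udp(response_size: int, edns_bufsize: Optional[int]) -> bool:
    """True if the response can be sent without setting the TC flag."""
    return response_size <= udp_limit(edns_bufsize)

# 792 bytes is the uncached response size reported above.
for label, bufsize in [("no EDNS", None), ("bufsize=512", 512),
                       ("bufsize=1232", 1232), ("bufsize=4096", 4096)]:
    print(label, fits_in_udp(792, bufsize))

# The 254-byte cached answer fits even for a client without EDNS.
print("cached, no EDNS", fits_in_udp(254, None))
```

This matches what was observed: the 792-byte answer only fails for queries sent without EDNS or with a 512-byte buffer, while the smaller cached answer gets through either way.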

Nest had an outage on June 29, and I'll wager they made DNS changes that increased the size of the response >512 bytes and started this whole thing.
 
That's a pretty good investigation! I'm also a NEST user, but not for my smoke alarms (Protect).

Anyways, since John's previously identified code fix still works for you (?) I hope that it gets added to the Mainline so I can benefit from it also, in the event my Nest Cameras also start suffering from this issue (I'm not an AX88 user so can't use the posted build)
 
The interesting part is maybe this. If I use dig to test the queries, I see a response size of 792 bytes to the local router, but a response size of 254 bytes if the response is cached by dnsmasq. And I get a response of 269 bytes if I dig @1.1.1.1 directly. So 1.1.1.1 and the dnsmasq cache are compressing the response maybe, and therefore not truncating it. But the cache time for this query is very short, maybe 15 seconds, so practically speaking it always fails for me.
I don't think this is compression.....I'm pretty sure the DNSSEC keys are stripped off once it's validated and cached....
@john9527 the reason dig works, is without any special flags it sends a query with an EDNS buffer size of 4096
On my fork, dig is reporting the EDNS buffer as 1232.... I thought I had found the problem in their patch ignoring the EDNS buffer size, and wrote a fix, but unfortunately, no joy. Dig does now report the TCP retry. I'm still looking, but I think they (getdns) may not be correctly setting the truncated flag or dnsmasq reintroduced an old problem where they weren't correctly retrying truncated replies.
 
I had this issue with SmartThings as well and had the help of a Samsung engineer who told me they have two "channels" on the device: one for updates and the other for standard communications. With DoT on, they could see my device on the update channel but otherwise I was disconnected. I turned DoT off and haven't had an issue since. I also found my TP-Link switches and plugs would randomly revert to "local only" and fail to work with Alexa, etc. with DoT enabled. I think whatever the issue is, it's a pretty widespread problem for IoT devices.
 
I don't think this is compression.....I'm pretty sure the DNSSEC keys are stripped off once it's validated and cached....

I don't have DNSSEC enabled, but maybe this is a stubby thing.

On my fork, dig is reporting the EDNS buffer as 1232.... I thought I had found the problem in their patch ignoring the EDNS buffer size, and wrote a fix, but unfortunately, no joy. Dig does now report the TCP retry. I'm still looking, but I think they (getdns) may not be correctly setting the truncated flag or dnsmasq reintroduced an old problem where they weren't correctly retrying truncated replies.
You can't see what's going on without Wireshark. Dig makes the request and the forward from dnsmasq to stubby has an OPT section with the EDNS buffer size at 4096. That buffer should come from the client I believe if it supports EDNS (not to be confused with TCP mode, EDNS is an extension for UDP). The requests I saw from the protect did not have an OPT section, so were not using EDNS.
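What Wireshark shows as the OPT section can also be pulled out of a captured query by hand. The Python sketch below walks a raw DNS message and returns the advertised UDP size, or `None` when there is no OPT record. The function names and sample packets are hypothetical, and it assumes a well-formed message.

```python
import struct
from typing import Optional

def skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1
        if length & 0xC0 == 0xC0:  # a compression pointer is two bytes long
            return pos + 2
        pos += 1 + length

def advertised_udp_size(msg: bytes) -> Optional[int]:
    """Return the EDNS UDP payload size from the OPT record, if present."""
    _id, _flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    pos = 12
    for _ in range(qd):
        pos = skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    for _ in range(an + ns + ar):
        pos = skip_name(msg, pos)
        rtype, rclass, _ttl, rdlen = struct.unpack("!HHIH", msg[pos:pos + 10])
        if rtype == 41:  # OPT: the CLASS field carries the UDP payload size
            return rclass
        pos += 10 + rdlen
    return None

# Hypothetical queries for nest.com, one without EDNS and one advertising 4096.
question = b"\x04nest\x03com\x00" + struct.pack("!HH", 1, 1)
no_edns = struct.pack("!HHHHHH", 1, 0x0100, 1, 0, 0, 0) + question
opt = b"\x00" + struct.pack("!HHIH", 41, 4096, 0, 0)
edns = struct.pack("!HHHHHH", 1, 0x0100, 1, 0, 0, 1) + question + opt

print(advertised_udp_size(no_edns))
print(advertised_udp_size(edns))
```

A query like the ones seen from the Protect, with no OPT section, takes the `None` path and is therefore held to 512 bytes.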

Secondly, it is not up to dnsmasq to retry the request when it's truncated, it's up to the client, ie dig. Dnsmasq cannot know what the client supports, so can't just return unexpected data. From what I saw, stubby is setting the truncate flag correctly. I'm not sure exactly how/if the dnsmasq max edns packet size setting plays in, but I didn't see any differences if I manually set it to 4096 (seems the fw has set that value to 1280 for some reason).
 