Asuswrt-Merlin service dependency loop (NTP + DNS + USB + Entware)

protogen

Over the last few days I have been upgrading from 380.65_4 to 380.69_0 on both my RT-AC68U and RT-AC88U (I always keep both routers on the same firmware version). I believe I have discovered a service dependency loop on the AC88U which only occurs under certain conditions.

The main pieces of this puzzle are:
  1. The startup of some services (OpenVPN, USB mount) is now delayed until the built-in ntp has successfully set the system time
    • On the AC68U only OpenVPN appears to be delayed
    • On the AC88U both OpenVPN and USB mounting are delayed
  2. If ntp fails to set the system time the services will wait for about 12 - 14 minutes and then start anyway (I assume this is a fail-safe)
  3. I have disabled the DNS functionality of dnsmasq (dnsmasq.conf.add contains port=0) because I use Entware BIND instead
  4. I have installed BIND on an ext2 partition on a USB2 storage device (SanDisk Cruzer Fit 8GB)
On the AC88U this combination leads to the following during boot up:
  1. dnsmasq starts but DNS functionality is disabled (see 3 above)
  2. ntp starts and attempts to set the system time (nvram ntp_server0 is set to au.pool.ntp.org) but fails because it cannot resolve the hostname (no DNS)
  3. USB mounting is delayed because the system time is not yet set
  4. BIND does not start because the USB mounting is delayed
As you can see, this is a dependency loop: ntp needs DNS needs USB mounting needs ntp.
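The loop only arises when two conditions hold at once. The NTP server has to be a hostname, so it needs a DNS lookup, and dnsmasq's DNS has to be switched off with port=0. Below is a minimal sketch of a check for that combination. The function names are hypothetical and nothing like this exists in the firmware. Anything that looks like a dotted-quad IPv4 address is treated as "no DNS needed", and octet ranges are not validated.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of Asuswrt-Merlin) for the NTP/DNS half
# of the boot loop described above.

# Succeeds if $1 looks like a dotted-quad IPv4 address (no range check).
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Succeeds if the dnsmasq fragment $1 contains a port=0 line (DNS disabled).
dns_disabled() {
    [ -f "$1" ] && grep -Eq '^[[:space:]]*port=0[[:space:]]*$' "$1"
}

# Succeeds when the loop is possible:
#   $1 = NTP server setting (e.g. the value of ntp_server0)
#   $2 = dnsmasq config fragment (e.g. dnsmasq.conf.add)
ntp_dns_loop_possible() {
    ! is_ipv4 "$1" && dns_disabled "$2"
}
```

On the router it could be called as `ntp_dns_loop_possible "$(nvram get ntp_server0)" /jffs/configs/dnsmasq.conf.add`, which exits 0 when the loop is possible. It only models the NTP/DNS half. The USB side of the loop is not checked.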

By the time the fail-safe kicks in (after 12 - 14 minutes) and the USB partition is finally mounted, the services-start script (which also starts Entware) has given up. The script waits only 60 seconds for the USB partition to mount.
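The crux is the mismatch between the two timeouts. The firmware's fail-safe releases the USB mount after roughly 12 - 14 minutes, but services-start stops waiting after 60 seconds. Below is a sketch of that kind of bounded wait loop. It illustrates the idea only and is not the actual services-start code. The function name, the polled path and the one-second poll interval are hypothetical.

```shell
#!/bin/sh
# Illustrative bounded wait (not the real services-start logic).
#   $1 = path that appears once the partition is mounted (e.g. /opt/etc)
#   $2 = timeout in seconds
# Returns 0 as soon as the path exists, 1 if it never appears in time.
wait_for_path() {
    path=$1
    timeout=$2
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1   # give up: the mount did not appear within the timeout
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

`wait_for_path /opt/etc 60` returns non-zero if the partition has not appeared after 60 seconds, so a mount that only happens when the 12 - 14 minute fail-safe kicks in is never seen.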

To confirm this issue I set ntp_server0 to an IP address (i.e. bypassing DNS). Everything started up normally and worked as expected. There was no delay in the USB mounting, which led to BIND starting correctly.

There are hacks that I could put in place that would work around this (like using an IP address for ntp or starting dnsmasq with DNS functionality then restarting it without DNS just before BIND starts) but these are inelegant and do not address the root cause.

I think this issue may have been introduced in 380.68_0. This is a guess, based on the following Changelog entry.
- FIXED: OpenVPN instances could potentially start too early at boot time (before clock was set)


Has anyone else experienced this issue? Can anyone else confirm it?
 
Just use an IP for your NTP server.
 
Just use an IP for your NTP server.
I realise I could do this, but I won't.

Why? Well, because it's not a fix. It's not even a good work-around.

NTP servers are known to go down from time to time. That's why I use DNS. If I were to use an IP and that NTP server went down, my router would not boot properly (no DNS, no VPN). I won't take that risk.

This issue does not occur on the AC68U (USB mounting does not wait on ntp). So, why does it occur on the AC88U?
 
You are running a non-standard setup, preventing the router from running its own DNS server, which will disturb more than just NTP functionality. I gave you a workaround that would work for your unusual scenario; beyond that, you're on your own. OpenVPN requires the NTP service to complete its task because TLS requires a valid clock to succeed.
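Here is why the clock matters for TLS. A peer rejects a certificate unless the current time falls inside the certificate's validity window (notBefore to notAfter). The function below is a hypothetical illustration of that comparison using Unix epoch seconds. Its name and arguments are assumptions, and it is not code from OpenVPN or the firmware.

```shell
#!/bin/sh
# Illustration only: the time-window check a TLS library applies to a
# certificate.
#   $1 = current time, $2 = notBefore, $3 = notAfter (Unix epoch seconds)
# Succeeds only when notBefore <= now <= notAfter.
cert_time_valid() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}
```

Suppose the clock has not been set yet and is, for example, still near the Unix epoch. Then `cert_time_valid "$(date +%s)" "$not_before" "$not_after"` fails for any recently issued certificate, so the TLS handshake cannot succeed until NTP has run.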

No idea about the difference in behavior between models, it must just be a timing issue because that code is event-driven, not run sequentially.

 
Or add the Entware service start command to the post-mount script?
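Asuswrt-Merlin runs /jffs/scripts/post-mount after each partition is mounted and passes the mount point as $1. The suggestion above could therefore look roughly like the sketch below. The entware/ sub-directory and the rc.unslung location under it are assumptions about one possible Entware layout. Many installs instead start /opt/etc/init.d/rc.unslung once /opt points at the partition. The function name is also hypothetical.

```shell
#!/bin/sh
# Hypothetical post-mount helper (sketch). The layout below is an assumption:
#   <mount point>/entware/etc/init.d/rc.unslung
# Starts Entware services if this partition holds Entware; returns 1 otherwise.
start_entware_from_mount() {
    mnt=$1
    rc="$mnt/entware/etc/init.d/rc.unslung"
    if [ -x "$rc" ]; then
        "$rc" start
    else
        return 1   # not the Entware partition; nothing to start
    fi
}

# In /jffs/scripts/post-mount this would be invoked as:
#   start_entware_from_mount "$1"
```

This only runs once the partition is actually mounted, so on its own it does not remove the NTP-induced delay in USB mounting.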
 
You are running a non-standard setup...
Okay, fair enough. I really wouldn't have considered what I'm doing to be that non-standard... but I take your point.

... preventing the router from running its own DNS server, which will disturb more than just NTP functionality.
That's a good point. After checking my log history (on my syslog server) I can see that named starts up an average of 36 seconds after dnsmasq starts. As you say, this could be disturbing more than I'm aware of.

No idea about the difference in behavior between models, it must just be a timing issue because that code is event-driven, not run sequentially.
Strange. I've tested this 20 times in total: 5 times with the ntp hostname and 5 times with an ntp IP address, on each of the AC68U and the AC88U. The results were always consistent with what I described in the first post. I'll do some more testing to see if I can determine the cause of the difference.

A personal comment...
I hope you don't feel that I'm trying to be difficult about this. I know I come across that way sometimes. It's a side effect of being "on the spectrum". The combination of my perfectionism and OCD compels me to solve things properly, the first time. I can't use hacks or work-arounds - they keep me awake at night.

As always, I appreciate the advice and help you give. You're always generous with your time and kind in your responses. Thank you.
 
Or add the Entware service start command to the post-mount script?

Yes, this will work, but will take around 12 - 14 minutes to occur (because the USB mounting is delayed and only happens when the fail-safe kicks in). Unfortunately, that's too long to wait.
 
Just to close this one off and provide a solution for anyone else who may encounter this issue, here's how I solved the problem.

Basically, I switched from using dnsmasq.conf.add to using dnsmasq.postconf instead.

First, I deleted dnsmasq.conf.add, which contained the port=0 directive (this directive unconditionally disabled the DNS functionality of dnsmasq because I use Entware BIND instead).

Second, I installed a new dnsmasq.postconf script, as shown below (note that /opt/etc is my Entware installation, an ext2 partition on a USB2 storage device).
Code:
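#!/bin/sh
# /jffs/scripts/dnsmasq.postconf
# Called by the firmware with the path of the generated dnsmasq config as $1.

ME="${0##*/}"
TAG="$ME[$$]"

NAMED_CONF='/opt/etc/bind/named.conf'

if [ -f "$NAMED_CONF" ]; then
        # Entware is mounted and BIND is installed: hand DNS over to BIND.
        logger -t "$TAG" "Starting dnsmasq with DNS disabled ($NAMED_CONF exists)"
        echo 'port=0' >> "$1"
else
        # Entware not mounted yet (early boot): keep dnsmasq DNS so NTP can resolve.
        logger -t "$TAG" "Starting dnsmasq with DNS enabled ($NAMED_CONF does not exist)"
fi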
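To try the same logic off the router, it can be wrapped in a function that takes both paths as arguments and run against scratch files. This is a variation, not the script as posted. The function name is hypothetical, and the `grep -qx` guard, which stops port=0 being appended twice, is an addition.

```shell
#!/bin/sh
# Variation on the postconf logic above, parameterised for testing.
#   $1 = dnsmasq config file generated by the firmware
#   $2 = BIND config whose presence means BIND will serve DNS
# Returns 0 if DNS was disabled in $1, 1 if dnsmasq DNS is left enabled.
postconf_disable_dns() {
    config=$1
    named_conf=$2
    if [ -f "$named_conf" ]; then
        # Append port=0 only if it is not already present.
        grep -qx 'port=0' "$config" || echo 'port=0' >> "$config"
        return 0
    fi
    return 1
}
```

The posted script itself belongs in /jffs/scripts/dnsmasq.postconf and must be executable (chmod a+rx). It only runs when JFFS custom scripts are enabled in the router's Administration settings.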

Lastly, in the start script for Entware BIND I added: service restart_dnsmasq
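The post does not say where in the BIND start script that line sits. One reasonable ordering, and it is an assumption, is to restart dnsmasq before named is launched. That way dnsmasq has already dropped DNS (via the postconf script) when BIND starts. A sketch of that ordering follows. The start_bind name, the two variables and the default named command line are hypothetical. The variables exist only so the order can be tested without a router.

```shell
#!/bin/sh
# Sketch of the BIND start order. Both default commands are assumptions;
# override the variables to match your own Entware init script.
: "${RESTART_DNSMASQ:=service restart_dnsmasq}"
: "${START_NAMED:=/opt/sbin/named -c /opt/etc/bind/named.conf}"

# Restart dnsmasq first (postconf then adds port=0 because named.conf exists),
# and launch named only if the restart command succeeded. The variables are
# left unquoted on purpose so each splits into a command plus its arguments.
start_bind() {
    $RESTART_DNSMASQ || return 1
    $START_NAMED
}
```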

The result of these changes:
  1. At boot time (before the Entware USB partition has been mounted) dnsmasq temporarily provides DNS, so the ntp server name (au.pool.ntp.org) resolves, the clock is set, and USB mounting and OpenVPN start without delay
  2. Once the Entware USB partition has been mounted (i.e. /opt/etc/bind/named.conf exists) the BIND start script restarts dnsmasq, and the postconf script adds port=0 to disable dnsmasq's DNS functionality (BIND takes over the job)
 
