No internet on clients after firewall restart (WG, Unbound, YazFi)

lluke

Occasional Visitor
Hi all,
On both of my RT-AC86U routers, whenever something restarts the firewall, such as an internet re-connect (also triggered by the Internet Connection restart action in the scMerlin add-on) or a VPN change (e.g., updating and saving the IPSec VPN Server config), every service on the router keeps working properly (Unbound DNS, WireGuard tunnel, IPSec Server, YazFi networks), but internet connectivity stops working for every client connected to the network.

My first focus was on WireGuard (the latest addition to my add-on mix), but after looking at the FORWARD chain it seems the issue could be related to YazFi.

Indeed, the only difference before and after a connection issue (or even a manual restart from scMerlin) is that the following 2 rules are missing:

pkts bytes target              prot opt in  out source     destination
   0     0 YazFiDNSFILTER_DOT  tcp  --  *   *   0.0.0.0/0  0.0.0.0/0    tcp dpt:853
 105   190 YazFiFORWARD        all  --  *   *   0.0.0.0/0  0.0.0.0/0

Could the absence of these 2 rules be the root cause of the internet connectivity outage on clients?

See the updates below (posts #3 and #4): the issue with the iptables rules is slightly different.

My current setup on both routers is:

Asuswrt Merlin 386.9
YazFi v4.4.2
Unbound Manager v3.22
ntpMerlin v3.4.5
scMerlin v2.4.0
WireGuard Mgr v4.18

Any advice on how to better investigate (and hopefully solve) the issue would be more than welcome.


Edited after the additional discoveries (see post #3 and #4)
 
Since we are both running wgm, YazFi, Unbound and scMerlin, I tried this today when my router and I got to spend some alone time.

In scMerlin I pressed Restart on Internet connection. After waiting a minute I noticed my phone was kicked off the WiFi, as it should be when YazFi restarts. After it reconnected, my internet connection was fine again.

Checking the syslog there is a lot of noise from the event but it clearly shows YazFi restarting after the event:
Code:
Jan 16 17:51:38 RT-AC86U-D7D8 YazFi: Firewall restarted - sleeping 10s before running YazFi
Jan 16 17:51:48 RT-AC86U-D7D8 YazFi: YazFi v4.4.2 starting up
Jan 16 17:51:55 RT-AC86U-D7D8 YazFi: Forcing YazFi Guest WiFi clients to reauthenticate

I also checked the firewall rules you refer to and they are still there (or, more correctly, reapplied).

Do you get the same log entries?

/Zeb
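
A minimal sketch for pulling only one add-on's lines out of the system log, to compare with the entries above. The default log path /tmp/syslog.log is an assumption about where the router keeps its live log, and addon_log is a hypothetical helper name, not part of YazFi or the firmware.

```shell
#!/bin/sh
# addon_log TAG [LOGFILE]
# Print only the syslog lines written by one add-on, e.g. "YazFi".
# LOGFILE defaults to /tmp/syslog.log (assumed location, adjust if needed).
addon_log() {
    tag="$1"
    logfile="${2:-/tmp/syslog.log}"
    # -F matches the tag literally; the surrounding " " and ": " keep
    # the match on the program-name field of each log line.
    grep -F " ${tag}: " "$logfile"
}

# Example (run on the router after a firewall restart):
#   addon_log YazFi
```

If the "Firewall restarted" and "starting up" lines appear after each restart, YazFi itself is being re-run as expected.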
 
@ZebMcKayhan first of all thanks for your answer and sorry for my late one, but unfortunately I cannot disrupt my network outside of the weekend.

I've done some additional experiments and I've been able to reduce the variables causing the issue: it seems that a restart of the firewall (service restart_firewall) triggers the problem.
In addition, I confirm that when the issue occurs, I cannot connect to the internet from any device on my network (through "normal" WiFi, YazFi networks, or LAN).

Looking at the logs (see below; I've only removed the WiFi device re-connections), it seems to me that YazFi is restarted properly. Do you have any advice on what to investigate next to spot the issue?


Code:
Jan 21 10:43:30 rc_service: service 24553:notify_rc restart_firewall
Jan 21 10:43:30 custom_script: Running /jffs/scripts/service-event (args: restart firewall)
Jan 21 10:43:30 custom_script: Running /jffs/scripts/firewall-start (args: ppp0)
Jan 21 10:43:30 YazFi: Firewall restarted - sleeping 10s before running YazFi
Jan 21 10:43:35 (wg_firewall): 24663 Checking if WireGuard® VPN Peer KILL-Switch is required.....
Jan 21 10:43:35 (wg_firewall): 24663 Restarting WireGuard® to reinstate RPDB/firewall rules
Jan 21 10:43:35 (wg_manager.sh): 24680 v4.18 Requesting WireGuard® VPN Peer stop (wg21)
Jan 21 10:43:35 (wg_manager.sh): 24680 v4.18 Requesting termination of WireGuard® VPN 'server' Peer ('wg21')
Jan 21 10:43:36 lldpd[2382]: removal request for address of 192.168.26.2%27, but no knowledge of it
Jan 21 10:43:36 wg_manager-wg21: Executing PostDown: 'iptables -D INPUT -p udp --dport 61821 -j ACCEPT'
Jan 21 10:43:36 wg_manager-wg21: Executing PostDown: 'iptables -D INPUT -i wg21 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PostDown: 'iptables -t nat -D PREROUTING -p udp --dport 61821 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PostDown: 'iptables -t nat -D POSTROUTING -s 192.168.26/24 -o br0 -j MASQUERADE'
Jan 21 10:43:37 wg_manager-wg21: Executing PostDown: 'iptables -D INPUT   -i wg21 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PostDown: 'iptables -D FORWARD -i wg21 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PostDown: 'iptables -D FORWARD -o wg21 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PostDown: 'iptables -D FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT'
Jan 21 10:43:37 wg_manager-serverwg21: WireGuard® VPN 'server' Peer (wg21) on 192.168.26.2:61821 Terminated
Jan 21 10:43:37 (wg_manager.sh): 25140 v4.18 Requesting WireGuard® VPN Peer start (wg21)
Jan 21 10:43:37 (wg_manager.sh): 25140 v4.18 Initialising Wireguard® VPN 'server' Peer (wg21)
Jan 21 10:43:37 wg_manager-serverwg21: Initialising WireGuard® VPN 'Server' Peer (wg21) on 192.168.26.2:61821
Jan 21 10:43:37 wg_manager-wg21: Executing PreUp: 'iptables -I INPUT -p udp --dport 61821 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PreUp: 'iptables -I INPUT -i wg21 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PreUp: 'iptables -t nat -I PREROUTING -p udp --dport 61821 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PreUp: 'iptables -t nat -I POSTROUTING -s 192.168.26/24 -o br0 -j MASQUERADE'
Jan 21 10:43:37 wg_manager-wg21: Executing PreUp: 'iptables -I INPUT   -i wg21 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PreUp: 'iptables -I FORWARD -i wg21 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PreUp: 'iptables -I FORWARD -o wg21 -j ACCEPT'
Jan 21 10:43:37 wg_manager-wg21: Executing PreUp: 'iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT'
.....
Jan 21 10:43:40 YazFi: YazFi v4.4.2 starting up
.....
Jan 21 10:43:42 YazFi: wl0.2 (SSID: MyWifi_iot) - sending all interface internet traffic over WAN interface
.....
Jan 21 10:43:44 YazFi: wl0.3 (SSID: MyWifi_iiot) - allow internet disabled, blocking all interface internet traffic
Jan 21 10:43:46 YazFi: wl1.1 (SSID: MyWifi_Guest) - sending all interface internet traffic over WAN interface
Jan 21 10:43:49 YazFi: Executing user script: /jffs/addons/YazFi.d/userscripts.d/allow_dest_homebridge.sh
Jan 21 10:43:49 YazFi: Forcing YazFi Guest WiFi clients to authenticate
.....
Jan 21 10:44:05 YazFi: YazFi v4.4.2 completed successfully
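
To see at a glance which iptables commands the peer runs on start and on stop, the quoted commands can be extracted from log lines like the ones above. This is only a sketch: peer_rules is a hypothetical helper, and the default log path /tmp/syslog.log is an assumption.

```shell
#!/bin/sh
# peer_rules ACTION [LOGFILE]
# ACTION is PreUp or PostDown. Prints the quoted command from every
# "Executing <ACTION>: '...'" log line, one command per line.
peer_rules() {
    action="$1"
    logfile="${2:-/tmp/syslog.log}"
    # Keep only the text between the single quotes at the end of the line.
    sed -n "s/.*Executing ${action}: '\(.*\)'\$/\1/p" "$logfile"
}

# Example: list the rules that are appended (-A) rather than inserted (-I):
#   peer_rules PreUp | grep -- ' -A '
```

Comparing the PreUp list with the PostDown list shows which rules are added with -I or -A and later removed with -D.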
 
I've done some additional checks with iptables, specifically running iptables -S and comparing the output before and after a restart_firewall.

After the restart, I noticed that 2 rules are missing and that some others are in a different position but keep the same "relative" order (see details below). Could this be the root cause of the connectivity issue?


Rules missing when the issue occurs:
-A INPUT -i eth0 -p igmp -j ACCEPT
-A FORWARD -d 224.0.0.0/4 -i eth0 -j ACCEPT

Rules appearing in an upper position when the issue occurs:
-A INPUT -j YazFiINPUT

Rules appearing in a lower position when the issue occurs:
-A FORWARD -p tcp -m tcp --dport 853 -j YazFiDNSFILTER_DOT
-A FORWARD -j YazFiFORWARD
-A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i wg+ -m comment --comment "Wireguard ACL" -j WGM_ACL_F
-A FORWARD -i br0 -o wg21 -m comment --comment "LAN to WireGuard \'server clients\'" -j ACCEPT
-A FORWARD -i wg21 -m comment --comment "WireGuard \'server\'" -j ACCEPT
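
The before/after comparison described above can be sketched as follows. rules_missing is a hypothetical helper and the snapshot file names are placeholders. It only reports rules that disappeared; a plain diff of the two snapshots is still needed to see position changes.

```shell
#!/bin/sh
# rules_missing BEFORE AFTER
# Print every rule line that is present in BEFORE but absent from AFTER.
rules_missing() {
    # -F literal strings, -x whole-line match, -v invert,
    # -f read the lines to exclude from the AFTER snapshot.
    grep -Fxv -f "$2" "$1"
}

# Example (run on the router):
#   iptables -S > /tmp/rules.before
#   service restart_firewall    # then wait until the add-ons have finished
#   iptables -S > /tmp/rules.after
#   rules_missing /tmp/rules.before /tmp/rules.after
```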
 
The order of things may change after an event like this. It basically means that the various addons start in a different order after a WAN event than during a normal boot.

This is weird. Could you please check nat-start and firewall-start to make sure everything looks good? Sometimes when an addon adds an entry to these files it does it wrong, and the entry ends up on the same line as something else. Here are mine for reference:
Code:
admin@RT-AC86U-D7D8:/# cat /jffs/scripts/nat-start
#!/bin/sh

/jffs/addons/wireguard/wg_firewall            # WireGuard

Code:
admin@RT-AC86U-D7D8:/tmp/home/root# cat /jffs/scripts/firewall-start
#!/bin/sh

sh /jffs/scripts/firewall start skynetloc=/tmp/mnt/UsbDrv/skynet # Skynet
/jffs/scripts/YazFi runnow & # YazFi Guest Networks

/jffs/addons/wireguard/wg_firewall            # WireGuard
sh /jffs/addons/diversion/type65blocking.div # Added by Diversion

Edit: you could try to execute these to see that they finish the way they should. If some script hangs and firewall-start does not complete its execution then I think the firmware may be affected in a bad way, and may cause your issue.
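
One way an entry can end up on the same line as another is when the last line of the file has no trailing newline before an addon appends to it. That mechanism is an assumption, not something confirmed in this thread. A minimal sketch of a check, where ends_with_newline is a hypothetical helper:

```shell
#!/bin/sh
# ends_with_newline FILE
# Succeeds when FILE is non-empty and its last byte is a newline.
ends_with_newline() {
    # Command substitution strips a trailing newline, so the result is
    # empty only when the last byte of the file is a newline.
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Example (run on the router):
#   for f in /jffs/scripts/nat-start /jffs/scripts/firewall-start; do
#       ends_with_newline "$f" || echo "$f: missing, empty, or no final newline"
#   done
```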
 

Everything sounds really weird to me too, which is why this issue is driving me crazy!

The bad (or good?) news is that I’ve run the same tests on the second RT-AC86U I own and the results are the same (same issue, same behaviour in iptables, …).
As for the scripts, I don’t have a nat-start at all, while firewall-start looks fine.

I’ve also run each step of these scripts individually and I think I’ve found the fault: if I manually run the wg_firewall script (/jffs/addons/wireguard/wg_firewall), internet connectivity is lost on all clients.
At this point I’d say the issue is related to WireGuard. Any hint on what to look at now?
 
if I manually run the wg_firewall script (/jffs/addons/wireguard/wg_firewall) then the internet connectivity is lost on all clients
Even without restarting wan??

The only thing wg_firewall does is that it restarts wireguard clients. Like this:
Code:
/jffs/addons/wireguard/wg_manager.sh stop
/jffs/addons/wireguard/wg_manager.sh start

What happens if you run those 2 lines one after the other? Are you still losing WAN access?

If you are, then some wg peer is causing this (I would guess). Once you have lost WAN access, go into wgm and stop your peers one by one, both server and clients, and see if the WAN access comes back.
 

Ok, so I think we're getting closer and closer to the root cause!

It seems that the internet connectivity stops working as soon as I run:
Code:
/jffs/addons/wireguard/wg_manager.sh stop

When I run the stop command this is the output I get:
Code:
Requesting WireGuard® VPN Peer stop (wg21)
wg_manager-serverwg21: WireGuard® VPN 'Server' Peer (wg21) on 192.168.26.2:61821 (# Home - 192.168.10.0/27) Terminated

The issue happens also if I open wg from amtm and I run:
Code:
5 wg21

As soon as the command ends I'm no longer able to connect to the other side of the tunnel (expected) or to the internet (not expected). If I then run the start command, I get the tunnel back (expected) but internet connectivity does not recover.

Two additional pieces of information that I hope will help:
  • my wireguard setup is a site2site (done following this great guide) between 2 RT-AC86U and it is used only to provide internal resource connectivity between the 2 sites
    Internet connectivity on both sites is done directly through the "local site" WAN interface.

  • whenever I open wg from amtm I get this message
    Code:
    Press y to Delete rogue RPDB PRIO 220 rules or press [Enter] to SKIP.
 
Alright, some issue with wg21 then. Weird that it works on boot but not on restart.

Also noticed in your logs this error:
Jan 21 10:43:36 lldpd[2382]: removal request for address of 192.168.26.2%27, but no knowledge of it
Could there be some typo left over from when you created the site2site? Did you specify a /27 network?

I don't know how much output you will get, but you could try
Code:
E:Option ==> stop wg21 debug
and see if there are any additional info.

According to Asus, the ax88u is limited to 2 VPN peers due to hardware limitations. I would assume this applies to the ac86u as well, and I get some resource error when trying to use more than 2 on my ac86u, but I have never experienced any real issues from it. Just something to note, maybe, as it looks like you are running 3 peers.
 
could there be some typo left from when you created the site2site? Did you specify a /27 network?
Ahhhhhhhhhh!!! Here we are, you made my day!

I’ve recreated the whole WireGuard setup from scratch and I’m now able to replicate the issue.
When you create a site2site using wg from amtm and specify a “full subnet” (IP and netmask, e.g. 192.168.20.0/27), this weird behaviour appears.
I really don’t know what then causes the issue when stopping a wg server, but I definitely see fewer rules in the 2 sites' configs now.


Weird that it works on boot but not on restart.
I guess this is because on boot no peers are stopped, and stopping a peer is what was causing the whole issue.

@ZebMcKayhan many, many thanks for your help. I can go back to sleeping like a baby and stop monitoring the 2 sites.
 
I really don’t know what then causes the issue when stopping a wg server, but I definitely see fewer rules in the 2 sites' configs now.
Great!

The only thing I could link to a behaviour like this is this rule:
Code:
Jan 21 10:43:37 wg_manager-wg21: Executing PostDown: 'iptables -D FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT'

The firmware adds a similar rule to let all replies back in, and wg21 adds the same one using -A (Append), so it ends up at the bottom. But the firmware usually drops all packets at the bottom of the chain, and this rule goes below that, so it is probably never used.

The problem is that firewall rules are position based and tested from the top, so the order matters. When wgm deletes the rule, it does not delete a specific position, so the wrong copy of the rule may get deleted. That would cause your issue.

There is no need for this rule in the config files at all. The firmware has this covered already. It might be useful if the config file were imported into something that is not a router.
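
The deletion behaviour can be illustrated without touching a real firewall. When iptables -D is given a rule specification instead of a rule number, it removes the first rule in the chain that matches. The sketch below simulates that on a text file. delete_first_match is a hypothetical helper and the chain in the comments is a simplified, made-up example, not the actual rule set from this thread.

```shell
#!/bin/sh
# delete_first_match RULE FILE
# Print FILE with only the FIRST line equal to RULE removed, mimicking how
# "iptables -D <chain> <rule-spec>" picks the rule to delete.
delete_first_match() {
    awk -v rule="$1" '!done && $0 == rule { done = 1; next } { print }' "$2"
}

# Made-up FORWARD chain after the peer's PreUp has run:
#   -m state --state RELATED,ESTABLISHED -j ACCEPT   <- firmware rule, near the top
#   -i br0 -o ppp0 -j ACCEPT
#   -j DROP
#   -m state --state RELATED,ESTABLISHED -j ACCEPT   <- appended by PreUp with -A
#
# Deleting by specification removes the copy near the top. The only copy
# left sits below DROP, so reply packets are dropped before reaching it.
```

If this is what happens on the router, it would explain why stopping the peer breaks internet access for all clients while starting it again does not restore it.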

Did you use the option 'full' when you created it?
{full} - If specified, the full set of rules are added as pre/post up/down to the remote config file (typically not needed if wgm is running on both sides)
If that's the reason for your troubles, I should probably add a warning to my guide not to use this when wgm is importing the server config.
 