
RBK50/RBS50 Daily Disconnects

Swizzy

This is a lengthy post so I can tell you all of the steps I've taken so far to try to pin this down. I own one RBK50 in AP mode and two RBS50 satellites, one with Ethernet backhaul and the other with 5G backhaul. I have usually been experiencing at least one outage every 24 hours, at a random time of day. It starts with devices still connected but without internet access. If I restart the Wi-Fi/Ethernet adapter on any device, it reconnects but does not receive an IP configuration from DHCP. Hostnames, whether outside or on the local network, can no longer be resolved, but I can ping and access local devices by IP (if they still have one). If I grab my laptop and assign a static IP and DNS server (on either Wi-Fi or Ethernet), it actually works. Meanwhile, the satellites usually show a purple ring, together with the RBK50.
My DHCP and DNS server is an instance of Pihole running in a Debian jail on my FreeNAS server. I suspected Pihole at first, but I had it running on another server and on a Raspberry Pi for years without issue until the Orbis were introduced.
What I did notice once an outage starts is that my FreeNAS server, which is connected to the RBK50 via Ethernet, starts spamming Ethernet up/down messages in its logs, similar to what you get from constantly disconnecting and reconnecting the cable or disabling and enabling the adapter. This shows the problem isn't limited to Wi-Fi.
The only way to fix it is to press the on/off button on the RBK50, sometimes 2-3 times in a row: it comes back up, the lights go blue, the internet works, and then 30 seconds later it dies again.

I've only recently discovered that the RBK50 has a debug menu. I spent an hour last night going through the logs and found a couple of things that don't seem right, at least one of which happened during an outage. I had a browser tab open on the RBK50 debug page and managed to grab the log files before the connection fully went.

I've had this issue since, I think, the official 2.6.1.40 firmware, all the way through 2.7.2.102. For the past 2 or 3 months I've been using Voxel's firmware, which unfortunately did not solve the problem, although it is better when it works.

wireless-log1.txt shows these messages almost every minute. If I recall correctly, the higher channels are for the hidden backhaul network, right?

23.12.11.351292 HYDR bandmon ERR : bandmonMBSAHandleRawMediumUtilizationUpdateEvent: Failed to resolve channel information for channel 157
23.12.12.352566 HYDR bandmon ERR : bandmonMBSAHandleRawMediumUtilizationUpdateEvent: Failed to resolve channel information for channel 157
23.12.14.158770 HYDR steermsg info : steermsgRxLoadBalancingComplete: Received load balancing complete from 9C:3D:CF:F8:52:CD, transaction ID [59] steering attempted [0] (mid [48469])
23.12.14.159385 HYDR bandmon info : bandmonMBSAHandleLoadBalancingCompleteEvent: 9C:3D:CF:F8:52:CD did not perform any active steering
23.12.14.493842 HYDR csh ERR : New shell session (3/5) using sd 40
23.12.14.653669 HYDR wlanif debug: wlanifBSteerEventsHandleActivityChange: 8E:85:80:01:F7:56 activity status changes to INACTIVE APId 255 ChanId 8 ESSId 0
23.12.14.806459 HYDR wlanif debug: wlanifBSteerEventsHandleActivityChange: 8E:85:80:01:F7:56 activity status changes to ACTIVE APId 255 ChanId 8 ESSId 0
23.12.15.354934 HYDR bandmon ERR : bandmonMBSAHandleRawMediumUtilizationUpdateEvent: Failed to resolve channel information for channel 157
23.12.15.827807 HYDR wlanif debug: wlanifBSteerEventsHandleActivityChange: B8:8A:EC:22:93:E5 activity status changes to ACTIVE APId 255 ChanId 40 ESSId 0


A more interesting error is one that happened at the time of the outage, again in wireless-log1.txt:

23.17.15.999483 HYDR wlanManager ERR : wlanManager_isAP: ioctl() failed, ifName: eth1.

This gets repeated a LOT in that particular log file. I'm not sure how the router counts its Ethernet ports. I can't assume eth1 is port 1, as it might internally start with eth0. The only time I've seen ioctl messages is when I've had hard drives or USB sticks go bad and stop responding. I really hope this isn't a hardware problem.
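
A rough way to put a number on "a LOT" is to tally the log entries per module and function. The following is a minimal sketch, assuming every line follows the format seen in the excerpts above (timestamp, HYDR, module, level, message). The regular expression, the function name `summarize`, and the `SAMPLE` text are illustrative and not part of any Netgear tooling.

```python
import re
from collections import Counter

# Format inferred from the excerpts: HH.MM.SS.micros HYDR <module> <level> : <message>
# This is an assumption; adjust the pattern if the real log differs.
LINE_RE = re.compile(
    r"(?P<time>\d{2}\.\d{2}\.\d{2}\.\d{6}) HYDR (?P<module>\S+) (?P<level>\w+)\s*: (?P<msg>.*)"
)


def summarize(lines):
    """Count entries per (module, level, function name or message prefix)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # ignore lines that do not match the assumed format
        # The text before the first ':' is usually the function name.
        key = m["msg"].split(":", 1)[0].strip()
        counts[(m["module"], m["level"].upper(), key)] += 1
    return counts


# A few lines copied from the excerpts, used only as sample input.
SAMPLE = """\
23.12.11.351292 HYDR bandmon ERR : bandmonMBSAHandleRawMediumUtilizationUpdateEvent: Failed to resolve channel information for channel 157
23.12.12.352566 HYDR bandmon ERR : bandmonMBSAHandleRawMediumUtilizationUpdateEvent: Failed to resolve channel information for channel 157
23.12.14.493842 HYDR csh ERR : New shell session (3/5) using sd 40
23.17.15.999483 HYDR wlanManager ERR : wlanManager_isAP: ioctl() failed, ifName: eth1.
"""

if __name__ == "__main__":
    # Most frequent entries first.
    for (module, level, key), n in summarize(SAMPLE.splitlines()).most_common():
        print(n, module, level, key)
```

To run it on the real file, pass `open("wireless-log1.txt")` to `summarize` instead of the sample lines. Comparing the counts from a log captured during an outage with one from a quiet period might show whether the eth1 errors only appear when things go wrong.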

Googling bandmonMBSAHandleRawMediumUtilizationUpdateEvent gets me two results, both on the Netgear forums but relating to the RBR40. However, neither post received an answer.

Hope someone can be bothered to read it. It's becoming more of a hobby/obsession of mine to figure out what is going on. Over the months I've tried every combination of settings for channels, daisy chaining, beamforming, etc. The problem seems to be deeper than a surface-level configuration issue.
 
What happens if you take Pihole out of the picture and just let the RBK50 handle DNS and DHCP?
 
Swizzy - I assume that you have tried ALL the obvious things (factory resets on router and satellites, disabling MIMO, Daisy Chain, Fast Roaming, etc.)? From what I remember from the Orbi forums back in the day, a couple of people had this same problem and never really found a solution. I would have to assume it's some sort of hardware issue, or maybe even an overheating issue? I am just guessing. Also, yes, channel 157 is the backhaul channel, I believe.

1) I would start with resets on ALL the hardware, then try one satellite at a time to see if the problem reproduces, then try the second satellite to see if that one causes the outage.

2) Another thing I would try is buying a used RBR50 from eBay for $50-$60 to see if that resolves your issues. If not, you can resell it for about the same price or convert it to a satellite pretty easily.
 
What happens if you take Pihole out of the picture and just let the RBK50 handle DNS and DHCP?
The same thing, but now I get ads. Like I said, I spent days checking Pihole (and spinning up more Pihole instances just in case) and it never made a difference. Pihole never shows any error messages, neither DNS nor DHCP related.

Swizzy - I assume that you have tried ALL the obvious things (factory resets on router and satellites, disabling MIMO, Daisy Chain, Fast Roaming, etc.)? From what I remember from the Orbi forums back in the day, a couple of people had this same problem and never really found a solution. I would have to assume it's some sort of hardware issue, or maybe even an overheating issue? I am just guessing. Also, yes, channel 157 is the backhaul channel, I believe.

1) I would start with resets on ALL the hardware, then try one satellite at a time to see if the problem reproduces, then try the second satellite to see if that one causes the outage.

2) Another thing I would try is buying a used RBR50 from eBay for $50-$60 to see if that resolves your issues. If not, you can resell it for about the same price or convert it to a satellite pretty easily.
I have certainly tried everything you suggested. I factory reset multiple times, both with the original firmware and with Voxel's, but sooner or later the problem comes back. I have also tried any and all combinations of channels, MIMO, daisy chain and fast roaming settings.

Oddly enough, the 5G backhaul DOES work. It's just strange that the router is CONSTANTLY spamming the logs about it, and about the eth1 port.

I don't think I mentioned it in the opening post, but I did remove both satellites and the servers, and it still happens. The only things connected at that point were two Chromecasts and two Samsung phones, which are 2.4/5 GHz capable.

I promised my wife that if I don't find a solution or a 'lead' I can explore over the next few days, I will get rid of the Orbis, as much as it pains me because I spent a LOT of time on this setup. She works from home during the day, so I have been pre-emptively restarting the Orbis either at 6 AM or before I go to bed to reduce downtime during the day.
 
So the root problem is that the router (RBK50) is disconnecting from the internet?

But then you say "5G backhaul DOES work". I don't follow.

If Wi-Fi backhaul is working, I wouldn't worry about the logs. Who knows what is needed to monitor/maintain the backhaul connection?
 
OP: When you use your wired backhaul, you are using Port 1 on the Router to Port 1 on the Satellite - correct?

 
Hope someone can be bothered to read it. It's becoming more of a hobby/obsession of mine to figure out what is going on. Over the months I've tried every combination of settings for channels, daisy chaining, beamforming, etc. The problem seems to be deeper than a surface-level configuration issue.

If I read your story correctly, I'd say the root cause is that your FreeNAS device loses its LAN connection.
With the FreeNAS down, it is logical that internet on all your other devices seems down, because their DNS server (Pihole) is not reachable.
And if you restart a device, it will not get an IP because the DHCP server (also Pihole) is not reachable.

The fact that you can still ping devices by IP address also indicates that the connections to the satellites are still working fine.
The same goes for assigning a static IP and DNS server (presumably a different DNS server than your Pihole): that works too.

Do you have more devices wired to the RBK50? And do they also lose their Ethernet connection?
Can you perhaps wire your laptop directly to the RBK50 to check?
Is your router also wired to the RBK50? Does that connection remain up during the issue?
(i.e. from your router, can you still connect to the RBK50?)
What if you connected the FreeNAS directly to one of the LAN ports of your router?
(i.e. to rule out incompatibility between the FreeNAS NIC and the RBK50 NIC)

In any case, I'd suggest doing more specific tests in future (or perhaps you did, but didn't mention it).
I.e. instead of concluding that internet is down, try to pinpoint exactly where the issue lies:
- ping the DNS and DHCP server(s)
- ping the RBK50 / satellites
- ping the default gateway (router IP)
- ping an internet IP (like 1.1.1.1 or 8.8.8.8)
- try an nslookup

(And do these tests from a device that has a static IP address configured, because if you use a DHCP device that has already lost its IP, then all tests will fail.)
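
The checklist above can be scripted so it is ready to run the moment an outage starts. This is only a sketch under some assumptions: every address in `TARGETS` except 1.1.1.1 is a hypothetical placeholder to replace with your own, the function names are made up for illustration, it relies on the system `ping` command, and the DNS step uses the operating system's configured resolver rather than querying a specific server the way `nslookup <name> <server>` would.

```python
import platform
import socket
import subprocess

# Hypothetical placeholder addresses (except 1.1.1.1); replace with your own.
TARGETS = [
    ("DNS/DHCP server (Pihole)", "192.168.1.2"),
    ("RBK50", "192.168.1.3"),
    ("satellite", "192.168.1.4"),
    ("default gateway", "192.168.1.1"),
    ("internet IP", "1.1.1.1"),
]


def ping_once(ip, timeout_s=3):
    """Return True if the system ping command reports a reply from ip."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def resolve_name(name="example.com"):
    """Return True if the OS resolver can look up name (rough nslookup stand-in)."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:
        return False


def run_checks(targets=TARGETS, ping=ping_once, resolve=resolve_name):
    """Run every check in list order and return (label, ok) pairs."""
    results = [(label, ping(ip)) for label, ip in targets]
    results.append(("DNS lookup", resolve()))
    return results


def first_failure(results):
    """Return the label of the first failed check, or None if all passed."""
    for label, ok in results:
        if not ok:
            return label
    return None


# Example usage (run from a machine with a static IP, as noted above):
#   results = run_checks()
#   for label, ok in results:
#       print("OK  " if ok else "FAIL", label)
#   print("first failing step:", first_failure(results))
```

Taking `ping` and `resolve` as parameters keeps the sequencing separate from the network calls, so the logic can be tried out with stand-in functions before relying on it during a real outage.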
 
