Sporadic GT-AXE11000 reboots due to kernel panic

alan6854321 · Dec 27, 2023

Well, that's a nasty Xmas present!
Just had a "Kernel panic - not syncing: BUG!" crash.
The first for many months, and the first ever on 388.5.

Hypothosis · Jan 2, 2024

Code:

May  5 01:05:10 crashlog: <4>BUG: failure at kernel/irq_work.c:141/irq_work_run_list()!
May  5 01:05:10 crashlog: <0>Kernel panic - not syncing: BUG!

Same here. I don't get why they won't fix such a core issue affecting the premium models. I really cannot have this router rebooting randomly and unpredictably. I'm ready to give up on ASUS and go to a simpler switches and PoE AP setup using Netgear. It's been many months, maybe years? I don't even remember.

I have a GT-AXE11000 and a GT-AX11000 (without the E), both on the same (latest) Merlin firmware with virtually identical configuration. The reboots have been plaguing me on the AXE for months, on every firmware. I have since disabled everything: QoS, classification, roaming assistant, TM IPS. The reboots seem less frequent now, but they still happen occasionally. I turned off the reboot scheduler to see how long it will go.

Interesting detail: why does the AXE show Protected Management Frames options while the AX does not? Both are identically configured with WPA2/WPA3.

On the AXE I have 2.4/5GHz as capable, and I had 6GHz as required.

I have turned 6GHz off now to see whether it or Protected Management Frames causes this, as someone suggested a few posts ago.

Hypothosis · Jan 8, 2024

Now with 6GHz turned off, I got 7-8d of uptime before the same crash. Very disappointing, still not resolved.

Done - no effect, or only an extension of uptime

Disabled IPS - DONE, didn't resolve
Disabled all QoS / Traffic Analyzer - DONE, didn't resolve
Disabled 6GHz - DONE, didn't resolve

Still left to do on the AXE11000 (which is crashing)

Turned off WPA3 - Disabled Protected Management Frames
Disabled all IPv6 (anywhere I could find it) - apparently this helps with Linux issues.
Changed NAT back from 'Full Cone' to 'Symmetric'
TCP adjustments (TCP Connections Limit to 100000, halved all other settings) - helps with kernel panic?
Turning off WiFi 6 (160MHz) majorly helped me

Weirdly, the AX with pretty much identical settings doesn't crash (the AX is in AP mode though, while the AXE is in full router mode). That makes me believe the crash isn't associated with WPA3 or 160MHz. But different devices talk to the AX and the AXE, so those can't be ruled out either.

I'd be glad of any pointers: things that have worked, or systems like the AXE16 maybe that are 100% not affected.

Hypothosis · Jan 8, 2024

I want to add that NAT is actually Symmetric already, both flow cache and runner are on, and there's no swap file configured. I also DO use UPNP (secure mode) ... children, Xbox on the network. I vaguely remember that @RMerlin said ages ago that UPNP isn't good. Could that be a source of this kernel panic?

Maybe, to test further, I could set up a block of static port forwards on the AXE (router, public IP) pointing to my AX (AP, private IP only), enable UPNP on the AX only (mapping private IP/port on the AX to the statically forwarded ports on the AXE), and run the AX in full router mode too, downstream of the AXE. That would probably give me other issues with operating everything in one network, though. I could also have the switch get the public IP (the AXE is getting it atm) and then put both routers downstream. I am not sure.

I use UPNP for the Xbox but also for Chromecast discovery on the whole network, and there may be other casting uses I am unaware of. So I can't just switch it off. But I could try transferring responsibility to the AX and see if the AXE still crashes. Kinda lost here...

Hypothosis · Jan 8, 2024

Based on the log file below and "BUG: failure at kernel/irq_work.c:141/irq_work_run_list()",
I checked the code in the AXE branch: https://github.com/RMerl/asuswrt-me...-5.02axhnd/kernel/linux-4.1/kernel/irq_work.c

@RMerlin in newer kernels the check looks like this: "BUG_ON(!irqs_disabled() && !IS_ENABLED(CONFIG_PREEMPT_RT));"
At first sight CONFIG_PREEMPT_RT is not set in the Makefile, which makes the right-hand part always TRUE, so the check depends only on whether interrupts are disabled.

So, naively, something turned interrupts on when they should be off? Is there any way to see what did?
OR would enabling CONFIG_PREEMPT_RT fix the problem? I assume it has further-reaching implications... based on: "The PREEMPT_RT patch has been partially merged into the mainline Linux kernel, starting from version 5.15"
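
To make the logic of that check easier to follow, here is a minimal truth-table sketch in Python. It only models the boolean expression quoted above from newer kernels; the function name `bug_on_fires` is made up for illustration, and it is an assumption that the check at line 141 of the 4.1 source behaves the same way as the newer one with PREEMPT_RT off.

```python
def bug_on_fires(irqs_disabled: bool, preempt_rt: bool) -> bool:
    """Model of BUG_ON(!irqs_disabled() && !IS_ENABLED(CONFIG_PREEMPT_RT)).

    Hypothetical helper: it mirrors the quoted expression only,
    not any real kernel behaviour.
    """
    return (not irqs_disabled) and (not preempt_rt)


# Print the full truth table for the two inputs.
for preempt_rt in (False, True):
    for irqs_disabled in (False, True):
        fires = bug_on_fires(irqs_disabled, preempt_rt)
        print(f"PREEMPT_RT={preempt_rt} irqs_disabled={irqs_disabled} panic={fires}")
```

With PREEMPT_RT off, the only row that panics is the one where interrupts are enabled on entry. With PREEMPT_RT on, the expression never fires, so in this model turning the option on would silence the check rather than explain why interrupts were enabled.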

I either need to find the source of the problem, or find a new device that doesn't have the problem (different broadcom chip?).

Code:

May  5 01:05:10 crashlog: <4>BUG: failure at kernel/irq_work.c:141/irq_work_run_list()!
May  5 01:05:10 crashlog: <0>Kernel panic - not syncing: BUG!
May  5 01:05:10 crashlog: <4>CPU: 2 PID: 0 Comm: swapper/2 Tainted: P           O    4.1.52 #2
May  5 01:05:10 crashlog: <4>Hardware name: Broadcom-v8A (DT)
May  5 01:05:10 crashlog: <0>Call trace:
May  5 01:05:10 crashlog: <4>[<ffffffc000087398>] dump_backtrace+0x0/0x150
May  5 01:05:10 crashlog: <4>[<ffffffc0000874fc>] show_stack+0x14/0x20
May  5 01:05:10 crashlog: <4>[<ffffffc00055c068>] dump_stack+0x90/0xb0
May  5 01:05:10 crashlog: <4>[<ffffffc000559d2c>] panic+0xd8/0x220
May  5 01:05:10 crashlog: <4>[<ffffffc0000f8670>] irq_work_run+0x0/0x48
May  5 01:05:10 crashlog: <4>[<ffffffc0000f8888>] irq_work_tick+0x48/0x68
May  5 01:05:10 crashlog: <4>[<ffffffc0000db554>] update_process_times+0x54/0x70
May  5 01:05:10 crashlog: <4>[<ffffffc0000e9da8>] tick_sched_handle.isra.6+0x28/0x80
May  5 01:05:10 crashlog: <4>[<ffffffc0000e9e44>] tick_sched_timer+0x44/0x90
May  5 01:05:10 crashlog: <4>[<ffffffc0000dbf94>] __run_hrtimer.isra.4+0x4c/0x110
May  5 01:05:10 crashlog: <4>[<ffffffc0000dc37c>] hrtimer_interrupt+0xdc/0x298
May  5 01:05:10 crashlog: <4>[<ffffffc000383998>] arch_timer_handler_phys+0x30/0x40
May  5 01:05:10 crashlog: <4>[<ffffffc0000d1120>] handle_percpu_devid_irq+0x78/0xa0
May  5 01:05:10 crashlog: <4>[<ffffffc0000ccc24>] generic_handle_irq+0x34/0x50
May  5 01:05:10 crashlog: <4>[<ffffffc0000ccf34>] __handle_domain_irq+0x5c/0xb8
May  5 01:05:10 crashlog: <4>[<ffffffc000080c18>] gic_handle_irq+0x38/0x90
May  5 01:05:10 crashlog: <4>Exception stack(0xffffffc03e8cfdc0 to 0xffffffc03e8cfef0)
May  5 01:05:10 crashlog: <4>fdc0: 421c0f78 00002bda 00000000 00000080 3e8cff10 ffffffc0 00381494 ffffffc0
May  5 01:05:10 crashlog: <4>fde0: 421c0f78 00002bda 2c5b40a2 00228f8b 0000d8c7 00000000 14000963 00000000
May  5 01:05:10 crashlog: <4>fe00: 04fc49e8 00000000 00000018 00000000 98686cad 00227e9b 2f6074a8 ffffffc0
May  5 01:05:10 crashlog: <4>fe20: 3e8c3a70 ffffffc0 3e8cfec0 ffffffc0 00000003 00000000 ffeb472c 00000000
May  5 01:05:10 crashlog: <4>fe40: 00000000 00000000 ffeb3cb0 00000000 00013384 00000000 00000000 00000000
May  5 01:05:10 crashlog: <4>fe60: 00424138 ffffffc0 00000000 00000000 00000000 00000000 421c0f78 00002bda
May  5 01:05:10 crashlog: <4>fe80: 3ffd4338 ffffffc0 00000001 00000000 00000001 00000000 4216d11d 00002bda
May  5 01:05:10 crashlog: <4>fea0: 3e8cc000 ffffffc0 0091d000 ffffffc0 0074b000 ffffffc0 3ffd4338 ffffffc0
May  5 01:05:10 crashlog: <4>fec0: 0078d580 ffffffc0 3e8cff10 ffffffc0 0038148c ffffffc0 3e8cff10 ffffffc0
May  5 01:05:10 crashlog: <4>fee0: 00381494 ffffffc0 60000145 00000000
May  5 01:05:10 crashlog: <4>[<ffffffc000083f00>] el1_irq+0x80/0xf8
May  5 01:05:10 crashlog: <4>[<ffffffc000381598>] cpuidle_enter+0x18/0x20
May  5 01:05:10 crashlog: <4>[<ffffffc0000c5d54>] cpu_startup_entry+0x1ec/0x250
May  5 01:05:10 crashlog: <4>[<ffffffc00008d190>] secondary_start_kernel+0x150/0x178
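
For anyone comparing crash logs across routers, a small Python sketch like the one below can pull the failing location and the function names out of a crashlog dump. The regular expressions are assumptions based only on the line format shown in this post, and `panic_reason` and `call_trace` are names made up for this example.

```python
import re
from typing import List, Optional, Tuple

# Matches frames such as "[<ffffffc0000f8888>] irq_work_tick+0x48/0x68".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
# Matches "BUG: failure at kernel/irq_work.c:141/irq_work_run_list()!".
BUG_RE = re.compile(r"BUG: failure at (\S+?):(\d+)/(\w+)\(\)!")


def panic_reason(log_text: str) -> Optional[Tuple[str, str, str]]:
    """Return (file, line, function) from the BUG line, or None if absent."""
    m = BUG_RE.search(log_text)
    return m.groups() if m else None


def call_trace(log_text: str) -> List[str]:
    """Return the function names of all stack frames, in log order."""
    return [m.group(2) for m in FRAME_RE.finditer(log_text)]


# A few lines copied from the log above.
sample = """\
May  5 01:05:10 crashlog: <4>BUG: failure at kernel/irq_work.c:141/irq_work_run_list()!
May  5 01:05:10 crashlog: <4>[<ffffffc000559d2c>] panic+0xd8/0x220
May  5 01:05:10 crashlog: <4>[<ffffffc0000f8670>] irq_work_run+0x0/0x48
May  5 01:05:10 crashlog: <4>[<ffffffc0000f8888>] irq_work_tick+0x48/0x68
May  5 01:05:10 crashlog: <4>[<ffffffc0000e9da8>] tick_sched_handle.isra.6+0x28/0x80
"""

print(panic_reason(sample))
print(call_trace(sample))
```

Running the same two functions over logs from different crashes would show quickly whether every panic enters through the same timer-tick path.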

Hypothosis · Jan 8, 2024

This RT patch is a big patch, and I'm definitely out of my comfort zone here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e5e726f7bb9f

Hypothosis · Jan 12, 2024

Just another crash, this time after only 1d uptime :-(
Tried so far:

Disabled IPS - DONE, didn't resolve
Disabled all QoS / Traffic Analyzer / Roaming Assistant - DONE, didn't resolve
Disabled 6GHz - DONE, didn't resolve
Turned off WPA3 - Disabled Protected Management Frames - DONE, didn't resolve
Disabled all IPv6 - DONE, has been off for a long time
NAT on 'Symmetric' - DONE, has been set that way for a long time
TCP adjustments - changed from 300k, 250k; didn't do anything, the crash also happened during a time of no load
Turning off WiFi 6 (160MHz) - DONE, didn't resolve

I just changed WPA3 to Open on the disabled 6GHz band, you never know. I also disabled AX mode on 2.4GHz, and will actually deactivate 2.4GHz altogether, leaving only 5GHz.
Should that crash again:

1. Deactivating UPNP
2. Deactivating flow cache

lol, not much left to disable short of deactivating all WiFi. Really no idea how to troubleshoot this.

The last log entry before the most recent reset was a UPNP remove, 4 hours earlier, so unlikely to be connected.

Hypothosis · Jan 12, 2024

I'm giving up on this, ordering an AX88U Pro or GT-AX6000 now... bye bye 6GHz

Hypothosis · Jan 18, 2024

After another crash, documenting this for posterity. Uptime 8hrs atm.

Turned off UPNP also > no effect
Fixed WLAN channel > nope (hoping this would turn off DFS, but apparently 160MHz has to go too)

I now really don't have much left to turn off, so these are the final changes.

Turned Traditional QoS ON (is there another way to turn off HW acceleration?)
80MHz to turn off DFS for good

I may never find out how to fix this, as the praised AX6000 is arriving today and this AXE11000 will have to be decommissioned.

charlie2alpha · Jan 19, 2024

Got this router a month ago. Currently running the latest beta of Merlin's firmware, uptime is 13 days. Never seen a crash so far.

Hypothosis · Jan 19, 2024

Maybe it is defective somehow. My AX (without E) 11000 has an uptime of many months, but as that exact bug has been posted over and over, over the years, I thought I might be able to get to the root of the problem... Someone suggested it's due to an old Linux kernel, but ASUS didn't accept the pull request as it would only hide the problem. No idea, I am giving up. Expensive piece of furniture now.

Hypothosis · Jan 21, 2024

Well, this is funny: first time I'm getting 2d+ uptime on the AXE11000, but now I have the AX6000 as well. Now I'll maybe just chase the problem all the way down. I made only 2 changes: turned on traditional QoS (to turn off HW acceleration) and disabled DFS/auto channels.

jerry6 · Jan 26, 2024

Strange, my AXE100 has run problem free. I have been running Merlin FW on the router since Merlin was available. Lucky I guess.

alan6854321 · Jan 26, 2024

Had another "Kernel panic - not syncing: BUG!" on my AX86S last night.
The second I've had since installing 388.6
Any more and I might go back to 388.5 - Didn't get a single one on that release.

Hypothosis · Jan 26, 2024

jerry6 said:
Strange, my AXE100 has run problem free. I have been running Merlin FW on the router since Merlin was available. Lucky I guess.

Do you have devices connecting to 6GHz?

Hypothosis · Jan 26, 2024

alan6854321 said:
Had another "Kernel panic - not syncing: BUG!" on my AX86S last night.
The second I've had since installing 388.6
Any more and I might go back to 388.5 - Didn't get a single one on that release.

I got a few days of uptime with almost everything turned off. The last thing I did was turn 6GHz on again, and now it's off again, as I got another panic. When I deactivated 2.4 and 6GHz and deactivated channel selection for 5GHz, I got the longest uptime. I had also turned off hardware acceleration and pretty much everything else.

Maybe, if there's only one wireless band on, the problem doesn't arise, because there's no need for concurrency. I am fine with just 5GHz being active, though that's not really what I bought the machine for. Should I get it stable again (>5d uptime), I'll clear the NVRAM and JFFS and then start activating features one by one again. Kind of like an elimination diet ;-)
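
The elimination approach could be tracked with something as simple as the Python sketch below. The 5-day threshold comes from the post; the feature names are taken from this thread, while the uptime numbers in the example and the helper names `reenable_plan` and `first_suspect` are purely hypothetical.

```python
STABLE_DAYS = 5.0  # from the post: more than 5d uptime counts as stable


def is_stable(uptime_days: float) -> bool:
    """True if a trial ran longer than the stability threshold."""
    return uptime_days > STABLE_DAYS


def reenable_plan(features):
    """Yield cumulative configs, enabling one more feature per step."""
    enabled = []
    for feature in features:
        enabled.append(feature)
        yield list(enabled)


def first_suspect(features, observed_uptimes):
    """Return the feature added in the first unstable step, else None."""
    for config, uptime in zip(reenable_plan(features), observed_uptimes):
        if not is_stable(uptime):
            return config[-1]
    return None


# Hypothetical example: the uptimes (in days) are invented for illustration.
features = ["2.4GHz", "6GHz", "auto channel/DFS", "hardware acceleration", "UPNP"]
uptimes = [6.0, 7.5, 1.0]
print(first_suspect(features, uptimes))
```

Because the crashes are sporadic, a single short uptime is weak evidence, so each step would probably need repeating before blaming the feature it added.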

I don't know how concurrency is implemented with respect to Wireless handling, it's probably closed source, cannot speculate more.

jerry6 · Jan 27, 2024

Hypothosis said:
do you have devices connecting to 6GHz?

No, 6GHz was good for 10 feet with no wall, in other words useless.
