Sporadic GT-AXE11000 reboots due to kernel panic

Forsaken

Occasional Visitor
I bought a GT-AXE11000 last spring and at some point, looking at the uptime, noticed that it reboots on its own about once every 3-4 weeks.
Nothing helpful was in the native logs, so I set up remote logging and was able to capture output like this each time it rebooted:

Code:
2021-10-01T09:40:42-05:00 user 3 kernel: bcm63xx_nand ff801800.nand: timeout waiting for command 0x1
2021-10-01T09:40:42-05:00 user 3 kernel: bcm63xx_nand ff801800.nand: intfc status 700000e0
2021-10-01T09:40:43-05:00 user 4 kernel: BUG: failure at drivers/mtd/nand/brcmnand/brcmnand.c:1339/brcmnand_send_cmd()!
2021-10-01T09:40:43-05:00 user 0 kernel: Kernel panic - not syncing: BUG!
2021-10-01T09:40:43-05:00 user 4 kernel: CPU: 2 PID: 2111 Comm: conn_diag Tainted: P           O    4.1.52 #6
2021-10-01T09:40:43-05:00 user 4 kernel: Hardware name: Broadcom-v8A (DT)
2021-10-01T09:40:43-05:00 user 0 kernel: Call trace:
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc000087398>] dump_backtrace+0x0/0x150
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc0000874fc>] show_stack+0x14/0x20
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc00057bc70>] dump_stack+0x90/0xb0
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc000579934>] panic+0xd8/0x220
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc000344cac>] brcmnand_send_cmd+0x134/0x140
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc000346180>] brcmnand_cmdfunc+0x128/0x2f0
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc00033bbc8>] nand_check_wp+0x40/0x68
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc00033ed24>] nand_do_write_ops+0xb4/0x3d8
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc00033f1fc>] nand_write+0x5c/0x88
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc0003369c0>] part_write+0x20/0x28
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc00033396c>] mtd_write+0x4c/0x68
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc000207340>] __jffs2_flush_wbuf+0xb8/0xd18
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc0002083cc>] jffs2_flash_writev+0x1d4/0x478
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc0002086a4>] jffs2_flash_write+0x34/0x50
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc00057aa54>] jffs2_garbage_collect_pristine+0x350/0x3bc
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc00057afb0>] jffs2_garbage_collect_live+0x37c/0xec8
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc000203ab8>] jffs2_garbage_collect_pass+0x408/0x830
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc000208094>] jffs2_flush_wbuf_gc+0xac/0x150
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc0001fc0ac>] jffs2_fsync+0x44/0x60
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc00016b064>] vfs_fsync_range+0x44/0xc0
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc00016b138>] do_fsync+0x38/0x68
2021-10-01T09:40:43-05:00 user 4 kernel: [<ffffffc00016b408>] SyS_fdatasync+0x10/0x20
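
For anyone collecting the same remote logs, a short script can pull out each panic and the process that was running when it hit, instead of reading the whole file. This is only a minimal sketch: it assumes the line layout shown above (timestamp as the first whitespace-separated token, and a "Comm:" field on a line shortly after the panic line), and the function name `summarize_panics` is made up for illustration.

```python
import re

# Marker text taken from the log above.
PANIC_MARKER = "Kernel panic - not syncing"
# The "Comm: <name>" field appears on the CPU/PID line that follows the panic.
COMM_RE = re.compile(r"Comm: (\S+)")


def summarize_panics(lines):
    """Return one dict per kernel panic: its timestamp and the running process.

    Assumes the timestamp is the first whitespace-separated token of each line,
    as in the log format shown in this post.
    """
    events = []
    for line in lines:
        if PANIC_MARKER in line:
            events.append({"timestamp": line.split()[0], "comm": None})
        elif events and events[-1]["comm"] is None:
            match = COMM_RE.search(line)
            if match:
                events[-1]["comm"] = match.group(1)
    return events


if __name__ == "__main__":
    sample = [
        "2021-10-01T09:40:43-05:00 user 0 kernel: Kernel panic - not syncing: BUG!",
        "2021-10-01T09:40:43-05:00 user 4 kernel: CPU: 2 PID: 2111 Comm: conn_diag Tainted: P           O    4.1.52 #6",
    ]
    for event in summarize_panics(sample):
        print(event["timestamp"], event["comm"])
```

Comparing the "Comm" value across panics (conn_diag here) may help show whether one process or just any NAND write triggers the failure.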

At some point it got worse, and once I got 4 reboots in 3 days, which was unacceptable for WFH. What followed was a several-month journey with ASUS support, which ended with them replacing my unit right after Thanksgiving.
And guess what? Less than two weeks after I put the replacement on duty, I got a reboot with the same cause again :).
I looked through the reviews and saw a couple of messages like "great device, despite it looses connection sometime" - I cannot be 100% sure, but that sounds very much like the same reboot symptom.

So, as @L&LD keeps saying: beta-testing is in play.

I'm pretty much tired of arguing with support and trying to persuade them of anything at this point. My last resort was scheduling a daily reboot, which seems to increase the odds of the router "surviving" to the end of the day.

Meanwhile, in this thread, I see @RMerlin mentions he is now working more closely with the ASUS dev team than ever before, so I wonder if it is possible to bring this issue to their attention.
Obviously, I understand this may still be something they cannot address, like a Broadcom bug or poor-quality NAND used across an entire batch, but I feel it is worth trying anyway.

Thanks in advance.
 

strunker

Occasional Visitor
I am having the EXACT same problem, and I have been STRUGGLING with their crap support. They keep assigning different people to the case. This looks to be a driver-related problem which can only really be fixed with new firmware. The latest firmware does NOT fix this. I got this unit 3 months ago, paid a premium for it, and I have been suffering with this problem since I purchased it, on at least the past 3 firmware revisions.

They want to replace my nearly brand new unit with a refurb, which I explained isn't acceptable and also isn't going to fix anything anyway. To make things more annoying, the router doesn't always reliably reboot. Sometimes it will encounter this error and then not reboot, so it sits there after the kernel panics and whatever watchdog process it has does not reboot the device. I left home on a trip, and out of nowhere lost connection while on VPN back to my home network, and it stayed down for over 24 hours until it randomly decided to reboot and I had connectivity again.

This router has been extremely unreliable, and for 600 dollars I am beyond disappointed. I rely heavily on my network and I don't really know what to do at this point.

I imagine you tried resetting already, and redoing literally all of your settings? Still the same behavior? What firmware versions have you tried? Is this just luck, and you and I both have units from a bad batch? I don't want to replace this with a refurb unit if the same thing is going to continue, because what's the point then?

Code:
2022-02-06 09:00:18,User.Error,10.1.1.1,Feb  6 09:00:18 -25F7EE8-C kernel: bcm63xx_nand ff801800.nand: timeout waiting for command 0x1
2022-02-06 09:00:18,User.Error,10.1.1.1,Feb  6 09:00:18 -25F7EE8-C kernel: bcm63xx_nand ff801800.nand: intfc status 700000e0
2022-02-06 09:00:18,User.Warning,10.1.1.1,Feb  6 09:00:18 -25F7EE8-C kernel: BUG: failure at drivers/mtd/nand/brcmnand/brcmnand.c:1339/brcmnand_send_cmd()!
2022-02-06 09:00:19,User.Emerg,10.1.1.1,Feb  6 09:00:18 -25F7EE8-C kernel: Kernel panic - not syncing: BUG!
2022-02-06 09:00:19,User.Warning,10.1.1.1,Feb  6 09:00:18 -25F7EE8-C kernel: CPU: 2 PID: 24207 Comm: TrafficAnalyzer Tainted: P           O    4.1.52 #2
2022-02-06 09:00:19,User.Warning,10.1.1.1,Feb  6 09:00:18 -25F7EE8-C kernel: Hardware name: Broadcom-v8A (DT)
2022-02-06 09:00:19,User.Emerg,10.1.1.1,Feb  6 09:00:18 -25F7EE8-C kernel: Call trace:
2022-02-06 09:00:19,User.Warning,10.1.1.1,Feb  6 09:00:18 -25F7EE8-C kernel: [<ffffffc000087398>] dump_backtrace+0x0/0x150
2022-02-06 09:00:19,User.Warning,10.1.1.1,Feb  6 09:00:18 -25F7EE8-C kernel: [<ffffffc0000874fc>] show_stack+0x14/0x20
 

strunker

Occasional Visitor

It looks like this is a Linux system/driver error; from the log above, you can see the line generating the error below. How do we get the proper eyes at Asus on this? It doesn't seem hardware related to me; it seems like a firmware/driver/Linux issue that needs to be worked out. I wonder if the newer AXE16000 will suffer from the same problems.

1644158082732.png
 


Tech Junky

Very Senior Member
I don't use Asus, but multi-gig firmware right now is kind of half-assed, depending on the HW + kernel being used.

For instance, with my 8700K setup I could use kernel 5.14.rc7 but nothing beyond that, and when I switched things up to a 12700K, 5.15.x works great, but trying to use 5.16.x doesn't work again.

Based on this I would say use a lower FW that works and test newer firmware when it comes out. There just seems to be a bug / mismatch in what's running on your devices.
 

strunker

Occasional Visitor
I wrote to the Linux Foundation... Below is their response... Doesn't sound very promising. Any recommendations on how to get Asus to pay more attention to this issue?

1644167913109.png
 

Tech Junky

Very Senior Member
Open a case with Asus. If enough people open cases there's a chance they will upgrade the kernel in a FW release.

In my case, I opened a bug with the devs and they're looking into it. Being PC-based, I'm not captive to any particular FW / kernel, which lets me work around network issues like this.

There may be an OpenWrt solution, though, for the aged kernel on Asus.
 

strunker

Occasional Visitor
Sadly, for this particular model, there doesn't seem to be any alternate firmware to try. I do have an open case though. I am going to call (not email or chat) and ask for a manager. It's been weeks of back and forth and I am no closer to a real solution.
 

unsynaps

Senior Member
I wonder if it is bad hardware. Been running mine for months and have had zero issues.
 

strunker

Occasional Visitor
I wonder if it is bad hardware. Been running mine for months and have had zero issues.
You may just not know... Do you have SNMP logging on to an external server?

It's just odd that we are both getting the exact same error; I would think that wouldn't be the case if it was hardware? Also, the exact same behavior started on the replacement unit. That wouldn't make sense if it was hardware specific, unless it was limited to a specific batch or time frame, idk...

All I can say is I spent 600 dollars on this thing, and I am not at all pleased. Some days it reboots multiple times a day, or locks up: the wireless radios completely drop, the SSID disappears, LAN stops working... Just not what I would expect from a premium product.
 

Forsaken

Occasional Visitor
Hi strunker,

Seems we are in the same boat here :).

I am having the EXACT same problem, and I have been STRUGGLING with their crap support. They keep assigning different people to the case. This looks to be a driver related problem which can only really be fixed with new firmware. The latest firmware does NOT fix this. I got this unit 3 months ago, paid a premium for it, and I have been suffering with this problem now since I purchased. On at least the past 3 firmware revisions.

They want to replace, my nearly brand new unit, with a refurb which I explained isn't acceptable and also isn't going to fix anything anyway. To make things more annoying, the router doesn't always reliably reboot. Sometimes it will encounter this error, and then it will not reboot, so it sits there after the kernel panics and whatever watchdog process it has will not reboot the device. I left home on a trip, and out of no where lost connection when on vpn back to my home network, and it stayed down for over 24 hours until it randomly decided to reboot and I had connectivity again.
I noticed something similar on one of the old firmwares (42489-43986) once or twice, but it never happened with 45850 or newer. But it was before I started digging into the logs, so it might be different after all.
This router has been extremely unreliable, and for 600 dollars I am beyond disappointed. I rely heavily on my network and I dont really know what to do at this point.
Try scheduling daily reboots during the night or whenever you are not using it. Not a 100% solution, but it should not hurt at least.
I imagine you tried resetting already, and redoing literally all of your settings? Still same behavior?
Did full reset several times while working with support before they finally agreed to replace my first unit.
What firmware versions have you tried?
All of them, except very first one (42026).
Is this just luck and you and I both have units from a bad batch.
It seems like the worst case, but least probable as well.
I dont want to replace this with a refurb unit if the same thing is going to continue because whats the point then?
Had to pay ~$30 to send my first unit to them. Got the same issue with replacement. Have no desire to do this again.
Ideally this needs to be brought to the attention of the ASUS dev team, but I'm just out of ideas on how to do this :(.
 

Forsaken

Occasional Visitor

OK, given the following line from the trace we both have:

2021-10-01T09:40:42-05:00 user 3 kernel: bcm63xx_nand ff801800.nand: intfc status 700000e0


status = 0x700000e0 = 01110000000000000000000011100000

They are checking for the following: NAND_CTRL_RDY | NAND_STATUS_READY, which is (INTFC_CTLR_READY | INTFC_FLASH_READY | NAND_STATUS_READY) = BIT(31) | BIT(30) | 0x40

So it seems the controller is not ready (bit 31 is 0), while the flash is ready (bit 30 is 1) and the NAND status ready bit (0x40) is set, as expected.
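
The same decoding can be reproduced in a few lines of Python. This is just a sketch of the arithmetic: the mask names and values are the ones quoted in this post (BIT(31), BIT(30), 0x40), not taken from a verified header, and `decode_intfc_status` is a made-up helper name.

```python
# Masks as quoted in the post; treat them as assumptions about the driver.
INTFC_CTLR_READY = 1 << 31   # controller ready
INTFC_FLASH_READY = 1 << 30  # flash ready
NAND_STATUS_READY = 0x40     # ready bit in the NAND status byte


def decode_intfc_status(status):
    """Report which of the three 'ready' bits are set in an intfc status word."""
    return {
        "controller_ready": bool(status & INTFC_CTLR_READY),
        "flash_ready": bool(status & INTFC_FLASH_READY),
        "nand_status_ready": bool(status & NAND_STATUS_READY),
    }


if __name__ == "__main__":
    status = 0x700000E0  # value from the "intfc status" log line
    print(format(status, "032b"))
    for name, is_set in decode_intfc_status(status).items():
        print(f"{name}: {is_set}")
```

For 0x700000e0 only the controller-ready bit comes out clear, which matches the reading above.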
 

strunker

Occasional Visitor
The real problem here is that Asus is using a super old Linux kernel, if you look at the actual response from the Linux Foundation. I don't really see this issue getting resolved, and I suspect there are a ton more people out there suffering from this problem who just don't realize it, because the reboots take place pretty quickly and you are usually back up and going in 30-60 seconds. Unless you run into the other manifestation of this, where it never reboots.

And yeah, I have the same exact status code. From what I found, this issue was patched back in 2017 or so. So it doesn't seem like Asus has updated the Linux kernel in use on their routers since then, which is likely why we are running into this.

I certainly won't be paying them 30 dollars to replace this unit, that's for sure. Ultimately, I will just replace it with something else and never use Asus for networking again. That will be my solution if they don't assist here.

I'm trying to hold out for the wrt firmware; I would be super curious how well that works out and how stable it is. Having an alternate firmware to test would, at minimum, allow us to prove whether this is in fact hardware related or not.

2022-02-07 12:10:20,User.Error,10.1.1.1,Feb 7 12:10:20 -25F7EE8-C kernel: bcm63xx_nand ff801800.nand: timeout waiting for command 0x1
2022-02-07 12:10:20,User.Error,10.1.1.1,Feb 7 12:10:20 -25F7EE8-C kernel: bcm63xx_nand ff801800.nand: intfc status 700000e0
2022-02-07 12:10:20,User.Warning,10.1.1.1,Feb 7 12:10:20 -25F7EE8-C kernel: BUG: failure at drivers/mtd/nand/brcmnand/brcmnand.c:1339/brcmnand_send_cmd()!
 

strunker

Occasional Visitor
Wanted to post back here. Support gave me a beta (presumably unreleased) version of the firmware to try. The version my router is on is below. This is a very different version number than what is publicly available on the Asus site.

What I will say is that so far it's been 4 days and I have had zero kernel panics or NAND problems. WiFi also seems stable. I was getting weird packet loss spikes on the older firmware, which was noticeable when gaming or streaming to Chromecast. It's working better now.

I don't really understand what they changed, and they didn't provide any detail, but it is working far better than I have ever seen it work in the 3 months I have owned it. I am going to give it this entire week; if I do not get any kernel panics I will post back here and ask when they plan to incorporate these fixes into the live versions of the firmware.

I don't feel great about running on beta firmware forever. It isn't practical, because I can't stay on this version indefinitely and never take another security patch again.

Current Version : 9.0.0.4.386_47871-gd872af0
 

strunker

Occasional Visitor
If it ain't broke, don't fix it as long as you disable WAN access it should be secure.
Ha, I guess. I would like to be in alignment with the rest of the general population, however. Having to remain on a custom version to solve a problem that clearly isn't specific to me seems odd.

I do take your point though, and yeah, I don't allow WAN access to the console or shell, I lock down access to a specific internal IP, etc.
 

Forsaken

Occasional Visitor
My status as well: 40+ days w/o a panic with scheduled reboots.
At the beginning of February I switched to rebooting three times a week (from the previous "every day"). I'm going to change to "two times a week" in March and then "once a week" in April, unless they release the mentioned firmware earlier.

Please keep us posted with how it goes with your unit.
 

strunker

Occasional Visitor
I tried that. I changed it to reboot every day, and it had no positive impact unfortunately. It almost seemed to be the opposite: more reboots meant more issues with my devices connecting back to the radios, and it didn't seem to help the panics at all.

This new firmware is legit though, thus far. It has been 5 days and I have yet to panic once, which is pretty good as it was happening pretty frequently. If I get to the 1 week mark, I'll be pleased; at the two week mark I will likely call the issue resolved and go back to support to inform them.
 

Forsaken

Occasional Visitor
A couple of updates from my side.

First, I got another kernel panic last Monday, ~14h after a reboot, which is definitely bad news: it indicates that even a daily reboot will not guarantee 100% reliability.
Second, ASUS released another firmware, but this is rather neutral since there is nothing specific about this particular issue in their changelog.
And the last, but not least, @RMerlin released 386.5 firmware with added support for our model!

So I went ahead and checked out the sources, searched for brcmnand.c files, and found one with a BUG_ON call on line 1339, which is essentially what triggers the panic.
As I mentioned above, the reason is that the NAND controller reports itself as not ready 100ms after the previous "read_page" operation, but I also found the following commit via the link provided by @strunker:

I decided to give it a try, spent a couple of hours setting up a WSL-based build environment, and built firmware with the above change applied.
Flashed it today, disabled reboot scheduler and crossed my fingers.

Obviously, it might not help at all: if the problem really is that the NAND controller gets stuck, the router will lose the ability to read from and write to NAND. But at least it should not panic and reboot right away anymore.
 
