
Solved GT-AX11000 Crashing randomly


Jean42

I recently bought a second-hand GT-AX11000, which I upgraded to the latest Merlin firmware as soon as I received it, but I've noticed that it reboots randomly.
I then tested it with multiple firmware versions (Merlin and Asus, including the latest ones), always doing a factory reset (usually through the web interface, but I also tested with a physical reset and it still crashes). I also formatted the jffs partition.
I've tested multiple days with a basic config:
- No AI Mesh
- No QOS or parental control (and no Trend Micro agreement)
- No IPv6
- No SSH
- 3 different wifi names
- WAN configured with static IP address
- 0 devices connected through wifi or LAN (only my laptop very occasionally through wifi just to check on the logs)

I've attached an example of one of the crashlogs that I had.
I googled most of the errors I could find on the crashlogs (like "rcu_preempt detected stalls on CPUs/tasks"), but I didn't find anything useful.
Any idea how to fix this?
 

Attachments

  • syslog_example.txt
    115.1 KB · Views: 15
First two things I would check: whether the power brick is the correct one and working correctly, and the condition of the MTD devices (perhaps with an "mtdinfo" command, or some such; I believe there are also "files" called "badblocks" or similar under either /proc or /sys). Problems with the first should be self-explanatory, and problems with the second could explain misbehavior too. If both check out, then it could be other hardware issues.
 
I checked the whole syslog.log, ran the mtdinfo command, and looked under /proc and /sys; there does not seem to be any problem with bad blocks (I'm not sure this is the best way to check, but I didn't find a better one).
I checked the output voltage of the power brick with a multimeter: 19.2V (stable even when moving the cable). It was plugged into a cheap UPS, though, so I plugged it in elsewhere and I'm now monitoring for any new crash.
 
Sorry that I hadn't fetched your attachment until just now. Search for the line
Code:
May  5 07:05:05 crashlog: <5>ubi0: attaching mtd0

Within the lines following, you'll see a report of the layout which includes "good" "bad" and "corrupted" information.
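
For anyone who would rather not scroll through the log by hand, here is a minimal sketch that pulls out just the attach line and the PEB (physical erase block) summary lines. It assumes the log was saved as syslog_example.txt and that the UBI summary lines contain the word "PEBs"; the function name ubi_report is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: show the UBI attach lines and PEB count lines from a saved syslog.
ubi_report() {
    # Match "ubiN: attaching mtd..." plus any ubiN line that reports PEB counts.
    grep -E 'ubi[0-9]+: (attaching mtd|.*PEBs)' "$1"
}

# Assumed file name; adjust to wherever the log was saved.
if [ -f syslog_example.txt ]; then
    ubi_report syslog_example.txt
fi
```

If the "bad" or "corrupted" figures on those lines are non-zero, that is where to look more closely.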

So far, so good. I haven't perused the file further (yet) 'cause I'm on my phone in the garage enjoying a tobacco treat and the interface is a bit painful. Some time later this evening I'll fetch and peruse it on proper hardware.

What I'm thinking is (starts with): "second hand because why?" (I'm an habitual skeptic.) It's plausibly a faulty-hardware kind of scenario, given the source (they had incessant problems so off-loaded it?). In such a case it could really be anywhere inside the unit, but the low-hanging fruit is "power supply" and "failing / faulty memory chip" (not RAM, though that can't be ruled out), rather the non-volatile storage, part of which at least seems to "check out" so far...

I promise I'll look later: I've got a little script somewhere which walks the appropriate directory structure and pulls the pertinent info (I think it was under "/sys", which isn't really a filesystem "proper" but represents the kernel's view of what's going on; "/proc" is similar). I'll include it in a 'code' segment, from which you can copy and paste into an SSH terminal to obtain the report. If you get there first, search for a post by me which contains "badblocks"; I'm sure it's already on-site somewhere. This will indicate any system-known problems with the non-volatile storage proper. As a starting point...
 
Okay, actually just whipped this up and it works on three different units in use here. Note that there may be some unobvious replication of data. Some of the MTD "things" are really not, but instead are "virtual" (alternate) representations of others. That's to say if you have two "devices" which claim, say, 3 bad blocks, it's likely the /same/ 3 bad blocks being reported twice, not a total of 6 on the chip.

The reason I even bring this up is, although it's likely, if not normal, to have /some/ bad "blocks" on any given manufactured chip, if the number is actively on the rise the chip ain't got long to live.

Either copy / paste this into a file in /tmp (/tmp/badblocks), "chmod +x" the file afterward and then execute /tmp/badblocks, or merely copy / paste it into an SSH command line and hit the "enter" key. Scroll up to see all the output.

Bash:
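#!/bin/sh

# Report the kernel's bad-block count for each MTD device,
# skipping the read-only "mtdNro" aliases in /dev.
for i in $(cd /dev && ls mtd[0-9]* | sed '/ro$/d; s/mtd//')
do
    # Print the /proc/mtd header line plus the entry for this device.
    sed -n "1p; /^mtd${i}:/p" /proc/mtd
    # Quote the label so its trailing space survives copy/paste (an unquoted
    # "\ " at the end of the line easily turns into a line continuation).
    printf '%s says: ' "/sys/class/mtd/mtd${i}/bad_blocks"
    cat "/sys/class/mtd/mtd${i}/bad_blocks"
    echo
done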
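
Since what matters is whether the count is rising, a possible follow-up is to record one timestamped line of counts per run and compare the lines over a few days. This is only a sketch: the function name bad_block_snapshot and the MTD_SYS override are made up for illustration, and it assumes the same /sys/class/mtd/mtdN/bad_blocks files that the script above reads.

```shell
#!/bin/sh
# Hypothetical helper: print one timestamped line of per-device bad-block counts.
# MTD_SYS is an assumed override so the function can be pointed at another tree.
MTD_SYS="${MTD_SYS:-/sys/class/mtd}"

bad_block_snapshot() {
    line="$(date '+%Y-%m-%d %H:%M:%S')"
    for f in "$MTD_SYS"/mtd*/bad_blocks
    do
        [ -r "$f" ] || continue      # skip if the glob matched nothing readable
        dev="${f%/bad_blocks}"       # drop the file name
        dev="${dev##*/}"             # keep only "mtdN"
        line="$line $dev=$(cat "$f")"
    done
    echo "$line"
}

bad_block_snapshot
```

Redirecting the output with ">>" into a file on storage that survives reboots (on these routers that would presumably be somewhere under /jffs, which is an assumption on my part) gives a simple history to compare.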
 
What I'm thinking is (starts with): "second hand because why?" (I'm an habitual skeptic.) It's plausibly a faulty-hardware kind of scenario, given the source (they had incessant problems so off-loaded it?). In such a case it could really be anywhere inside the unit, but the low-hanging fruit is "power supply" and "failing / faulty memory chip" (not RAM, though that can't be ruled out), rather the non-volatile storage, part of which at least seems to "check out" so far...

Oh, I'm almost sure it's been sold because it was faulty, the question now is can it be fixed?

I've run your script (I just removed the "\" at the end of this line "echo -n /sys/class/mtd/mtd${i}/bad_blocks says:\" ) and I have 0 bad blocks on all devices:
Code:
dev:    size   erasesize  name
mtd0: 051c0000 00020000 "rootfs"
/sys/class/mtd/mtd0/bad_blocks says:0

dev:    size   erasesize  name
mtd1: 051c0000 00020000 "rootfs_update"
/sys/class/mtd/mtd1/bad_blocks says:0

dev:    size   erasesize  name
mtd10: 00800000 00020000 "misc1"
/sys/class/mtd/mtd10/bad_blocks says:0

dev:    size   erasesize  name
mtd11: 04d23000 0001f000 "rootfs_ubifs"
/sys/class/mtd/mtd11/bad_blocks says:0

dev:    size   erasesize  name
mtd2: 00800000 00020000 "data"
/sys/class/mtd/mtd2/bad_blocks says:0

dev:    size   erasesize  name
mtd3: 00100000 00020000 "nvram"
/sys/class/mtd/mtd3/bad_blocks says:0

dev:    size   erasesize  name
mtd4: 05700000 00020000 "image_update"
/sys/class/mtd/mtd4/bad_blocks says:0

dev:    size   erasesize  name
mtd5: 05700000 00020000 "image"
/sys/class/mtd/mtd5/bad_blocks says:0

dev:    size   erasesize  name
mtd6: 00520000 00020000 "bootfs"
/sys/class/mtd/mtd6/bad_blocks says:0

dev:    size   erasesize  name
mtd7: 00520000 00020000 "bootfs_update"
/sys/class/mtd/mtd7/bad_blocks says:0

dev:    size   erasesize  name
mtd8: 00100000 00020000 "misc3"
/sys/class/mtd/mtd8/bad_blocks says:0

dev:    size   erasesize  name
mtd9: 03f00000 00020000 "misc2"
/sys/class/mtd/mtd9/bad_blocks says:0

I also had another crash even when plugged in elsewhere without the UPS, so the UPS is not the cause.
Since the voltage of the power supply is fine, I don't see how I can test whether it's faulty or not, apart from replacing it.
 
On any Asus router I buy or set up for customers (or myself), new or used, I perform the following:



And for a more in-depth step-by-step guide, the following link may help.

Nuclear Reset https://www.snbforums.com/threads/major-issues-w-rt-ac86u.56342/page-4#post-495710

(From the link https://www.snbforums.com/members/l-ld.24423/#about).


If, after performing the steps in the link above, the router is still not in a good/known state, it is more than likely a hardware fault (and a new router is indicated).
 
Since the voltage of the power supply is fine, I don't see how I can test if it's faulty or not apart from changing it.
If you haven't yet "nuked" it as per L&LD's suggestion, by all means do so.

Measuring the unloaded voltage output of the power brick only tells you what that is. Use Ohm's law to determine what value resistor you'd need to put across the terminals along with your volt meter to see if the voltage holds when providing rated power. Or substitute in an equivalent brick if available.
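
As a rough sketch of that arithmetic, assuming a 19 V output and a 30 W test load (illustrative values only, not the adapter's confirmed rating; read the real figures off the brick's label), the current is P / V and the resistance is V squared over P. The function name load_resistor is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: size a load resistor for a given voltage (V) and power (W).
load_resistor() {
    awk -v v="$1" -v p="$2" 'BEGIN {
        printf "current: %.2f A\n", p / v          # I = P / V
        printf "resistance: %.1f ohm\n", v * v / p # R = V^2 / P
        printf "dissipation: %.0f W\n", p          # the resistor must handle this
    }'
}

# Assumed example values: 19 V, 30 W.
load_resistor 19 30
```

With those assumed values it prints about 1.58 A and 12.0 ohm, and the resistor has to be rated to dissipate the full 30 W.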
 
Use Ohm's law to determine what value resistor you'd need to put across the terminals along with your volt meter to see if the voltage holds when providing rated power.

Do you know how big a 12 Ohm 30W resistor is?

1710380405033.png


I don't have one in my electronics lab for example.

A common 0.125W-0.250W resistor will glow red within seconds of the power supply being plugged in.
 
I've bought a 120W universal AC-DC converter (and yes, I should be able to set it to 19V). It should be good enough for monitoring the router for a few days. I didn't want to buy the Asus converter since it's rather expensive, and the universal converter could be handy in the future; if it turns out that the power supply was faulty, I'll then buy the Asus power supply.

If it still doesn't work then I'll go nuclear ^^
 
if you've ever heard of incandescent light bulbs

I don't have any 20V 30W-36W incandescent light bulbs either. Ohm's law, rated power... your own advice, remember?
 
Got a kilowatt lightbulb handy? Or some you can series-connect? I'm sure I could whip that up with what I've got "in stock". Anyway, as usual (and expected!) your argument is both pedantic and appreciated.
 
Load that brick and check its output if you want to eliminate it from the suspect pool. Personally, I only ever expend that kind of energy or effort if it seems even remotely plausible to be of value. I'd /most/ likely not peruse the second-hand market in the first place.

Although it's entirely plausible the original owner never registered the device so it'd be yet "under warranty."
 
Do you understand the impracticality of your advice involving potential dangers of burning someone's hands?

This is what is going to happen when a low-ohm "resistor" of the kind commonly found in electronics is connected to a 20V high-current power source:

1710448544270.png
 
But then again, I've never been told "you have too much knowledge and experience" for a "shift electrician" job applied-for, until quite recently... So perhaps I am "missing something".
 
I am "missing something".

This is correct. The proposed method of testing switching power supplies is neither practical nor recommended.
 
Wait a minute. Loading a power supply and checking its capability is neither practical nor recommended, by you? A little respect has indeed just been lost.
 
This conversation is over, @glens.
 