Restoring R7800 netgear flash partition

xyzzy

Occasional Visitor
My R7800 got a bad bit in one of the NAND flash sectors. It's in the netgear mtd partition, the one that has the UBI volume in it. Because of this, the overlay ubifs filesystem doesn't mount and it's pretty much totally broken. But at least the Voxel firmware lets me log in with telnet so I should be able to fix it.

While ECC is supposed to fix bad bits and UBI is supposed to deal with bad blocks, two details make this not work.

I should be able to fix it by using ubiformat to reformat the partition and then reload the various volumes that were on it. But where to get the original contents of those volumes?

I see references that say that to truly factory-restore the firmware, one should flash erase that partition. So there must be something that restores the default contents, or that would break the device. Note that one shouldn't flash erase it, as that destroys the UBI wear-leveling information. Better to use ubiformat, which erases the partition but preserves the wear-leveling information (the erase counters). But restoring the contents is the same issue either way.

There are two reasons this one bit error breaks UBI.

ECC should fix all one-bit errors, but this bit is in an erased block. Erased blocks are all 1s. That's not valid ECC information. A NAND page can only be written once before it needs to be erased, so it's not possible to write valid ECC data to an erased page, as then it couldn't be used for real data, having been written once. So erased pages have no ECC. Which should be fine, as there is no data (erased!) on them anyway to protect. But it does mean that a one-bit error on an erased page shows up as a bad page, rather than being automatically fixed.
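A rough way to see the difference between a clean erased page and one with a flipped bit is to count the bytes that are not 0xFF. This is only a sketch on a simulated page: the 2048-byte page size and the helper name `count_non_ff` are assumptions, and counting non-0xFF bytes is a proxy for counting flipped bits.

```shell
# Count the bytes in a file that are not 0xFF.
# A truly erased NAND page gives 0; an erased page with one flipped bit gives 1.
count_non_ff() {
    # tr deletes every 0xFF byte (octal 377); wc counts what is left.
    # The $(( )) wrapper strips any padding some wc versions print.
    echo $(( $(LC_ALL=C tr -d '\377' < "$1" | wc -c) ))
}

# Demo on a simulated 2048-byte page (the page size is an assumption).
page=$(mktemp)
head -c 2048 /dev/zero | LC_ALL=C tr '\000' '\377' > "$page"
count_non_ff "$page"    # prints 0: the page looks erased

# Clear one bit by overwriting a single byte with 0xFE.
printf '\376' | dd of="$page" bs=1 seek=100 conv=notrunc 2>/dev/null
count_non_ff "$page"    # prints 1: no longer blank
rm -f "$page"
```

The same function could be pointed at a page carved out of a flash dump, though how a read of a page with an ECC error behaves depends on the driver.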

UBI is supposed to be able to deal with bad blocks. But it does this by checking whether blocks are bad after it erases them. If a freshly erased block isn't bad, it's put in the blank block pool and expected to stay blank. UBI can't cope with a blank block going bad on its own without getting used, which is apparently what happened here.
 

xyzzy

Occasional Visitor
Yes, but what rebuilds it?

I think mtd erase will probably clear the wear-leveling information. And I couldn't find a copy of ubiformat anywhere in the firmware, unless there's one somewhere I missed.

Unfortunately, Voxel's firmware broke. telnet access is disabled because ssh should be enabled. But ssh won't let me login because of no ssh key. Can't create an ssh key because the overlay is broken and that's the place it gets stored. So while the web interface mostly works, I can't log in to run any commands.

If I can ever get Voxel's firmware to build, then I should be able to tftp load a fixed version that doesn't turn off telnet when ssh doesn't really work. I get the impression no one ever builds the firmware themselves, because I've had to work through tons of errors and it's still not done.
 

fossil

Occasional Visitor
Unfortunately, Voxel's firmware broke. telnet access is disabled because ssh should be enabled. But ssh won't let me login because of no ssh key. Can't create an ssh key because the overlay is broken and that's the place it gets stored. So while the web interface mostly works, I can't log in to run any commands.
Both ssh and telnet can be enabled at the same time. Just open the page below and check "Enable Telnet".

I get the impression no one ever builds the firmware themselves, because I've had to work through tons of errors and it's still not done.
It is buildable; it just requires some effort.
 

xyzzy

Occasional Visitor
Thanks, I didn't think to re-try the debug control page. It broke with the netgear firmware when the flash went bad.

The last piece I needed to build the Voxel firmware was the wifi FW. NSS.AK.1.0.c8-00015.tar.bz2 is no longer available to download anywhere and isn't distributed with the netgear open source release. In fact, the qca-nss-fw package it's in isn't even part of the netgear firmware. I found some FW on https://github.com/qca/nss-firmware but it's a different version, NSS.AK.1.0.c5-00004, so it probably won't work.

But at least it's built. make menuconfig is still broken.
 

xyzzy

Occasional Visitor
And success, flash recovered.

This is what I did to reformat UBI while preserving the wear-leveling information and also restoring the contents.
Bash:
/etc/init.d/traffic_meter stop   # this uses one of the UBI volumes

# Back up everything, except overlay_volume, to ram disk
for i in 14 15 16 17 18 20 21; do
    cat /dev/mtd$i > /tmp/mtd$i.img
done

# Reformat
ubidetach -d 0
ubiformat /dev/mtd7

# Re-create original volumes and sizes
ubimkvol /dev/ubi0 -n 0 -N cert -S 1
ubimkvol /dev/ubi0 -n 1 -N pot.bak -S 3
ubimkvol /dev/ubi0 -n 2 -N traffic_meter -S 14
ubimkvol /dev/ubi0 -n 3 -N traffic_meter.bak -S 14
ubimkvol /dev/ubi0 -n 4 -N dongle -S 14
ubimkvol /dev/ubi0 -n 5 -N overlay_volume -S 460
ubimkvol /dev/ubi0 -n 6 -N vol_ntgrcryptD -S 25
ubimkvol /dev/ubi0 -n 7 -N vol_ntgrcryptK -S 3

# Restore contents
for i in 14 15 16 17 18 20 21; do
    ubiupdatevol /dev/ubi0_$((i-14)) /tmp/mtd$i.img
done
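Since ubiformat wipes the partition, it seems worth confirming the backups are really there before reformatting. The following is only a sketch, not part of what I actually ran: `check_backups` and `mtd_to_ubi_vol` are hypothetical helper names, and the mtd numbers are the ones from the loops above (mtd19, the skipped one, is overlay_volume, which is why index minus 14 lines up with the volume IDs).

```shell
# Hypothetical pre-flight check: succeed only if every backup image
# exists and is non-empty. $1 is the directory holding the images
# (the steps above use /tmp).
check_backups() {
    dir=${1:-/tmp}
    for i in 14 15 16 17 18 20 21; do
        if [ ! -s "$dir/mtd$i.img" ]; then
            echo "missing or empty: $dir/mtd$i.img" >&2
            return 1
        fi
    done
    return 0
}

# Same arithmetic as the restore loop: mtd index minus 14 is the UBI volume ID.
mtd_to_ubi_vol() {
    echo "/dev/ubi0_$(( $1 - 14 ))"
}
```

One way to use it would be to run `check_backups /tmp` and only go on to `ubidetach` and `ubiformat` if it succeeds. `mtd_to_ubi_vol 20` prints `/dev/ubi0_6`, which matches vol_ntgrcryptD in the ubimkvol list.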
 

HELLO_wORLD

Very Senior Member
And success, flash recovered.

traffic_meter killed the NAND of my first R7800, which lasted only a few months.
NEVER enable this abomination!
 

HELLO_wORLD

Very Senior Member
I've not enabled it. Seems like it's running anyway.
You need to make it not executable…
I have a script to disable many things…
The part for traffic_meter is:
Bash:
initd_kill traffic_meter && { neutralize /sbin/cmd_traffic_meter; neutralize /sbin/traffic_meter; }

With the functions neutralize and initd_kill being:
Bash:
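# Strip the execute bit from a binary and kill any running instance of it.
neutralize() { chmod -x "$1"; killall -q "${1##*/}"; }

# Stop and disable an init.d service, then make its script non-executable.
initd_kill() { if test -x "/etc/init.d/$1"
  then "/etc/init.d/$1" stop; "/etc/init.d/$1" disable; chmod -x "/etc/init.d/$1"; return 0
  else return 1
fi }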
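For anyone puzzled by `${1##*/}` in neutralize: it is plain shell parameter expansion that drops everything up to the last slash, so killall gets the process name rather than the full path. A tiny illustration, where `strip_dir` is just a made-up name for the demo:

```shell
# ${1##*/} removes the longest leading match of "*/",
# i.e. everything up to and including the last slash.
strip_dir() {
    echo "${1##*/}"
}

strip_dir /sbin/cmd_traffic_meter   # prints: cmd_traffic_meter
strip_dir traffic_meter             # prints: traffic_meter (no slash, unchanged)
```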

EDIT: I know this is like using a nuke to swat a fly… But these services have a tendency to try to come back to life, so I prefer not only to stop and disable the init.d services and remove their executable permission, but also to remove the executable permission from any related binary in /sbin or /bin…
This way, they never bothered me again!
 

fossil

Occasional Visitor
NSS.AK.1.0.c8-00015.tar.bz2 is no longer anywhere to download and isn't distributed with the netgear open source release.
NSS.AK.1.0.c8-00015.tar.bz2 contains NG pre-built binaries. Create it yourself by extracting binaries from NG fw.

[Attached image: NSS.00015.png]


But at least it's built. make menuconfig is still broken.
There might be some unknown issues, like missing pkgs. You won't see errors for them during the build but will eventually run into issues when you load the fw on the device.
menuconfig is not relevant. Both NG and Voxel use an already created defconfig. Modify it manually if required.
 

xyzzy

Occasional Visitor
Didn't see any docs on how to disassemble the image binary and it seemed a bit of a pain to reverse engineer it. So I extracted the NSS binaries from the router itself.

I'd be surprised if there are outright missing packages, as openwrt should fail to build if any package fails or is missing files from the install list.

I did turn off a couple packages I couldn't find source for. A driver for an obscure USB temperature monitor apparently only sold in Russia and a driver for some old USB wifi adapters. Don't think I'll miss those.

Editing the defconfig by hand is somewhat tedious.

I'm surprised kernel 3.4.103 builds ok with gcc 10.2.0, when netgear was using something like 4.5, so it's quite a big jump. I know that previously, when I've updated gcc, it has exposed kernel bugs. Like this bug for ARM kernels with gcc 10.x that's been around since 3.x.

But changing back to a 4.x toolchain isn't as simple as setting one config variable. The toolchain will cause other variables to be set. That's why menuconfig being broken is such a pain.
 

fossil

Occasional Visitor
Didn't see any docs on how to disassemble the image binary and it seemed a bit of a pain to reverse engineer it. So I extracted the NSS binaries from the router itself.
Image is just a zip file. 7zip should work. Extracting from router does the same as well.

I'd be surprised if there are outright missing packages, as openwrt should fail to build if any package fails or is missing files from the install list.
You will be surprised.

I did turn off a couple packages I couldn't find source for. A driver for an obscure USB temperature monitor apparently only sold in Russia and a driver for some old USB wifi adapters. Don't think I'll miss those.
Source is available for those. It requires some effort to find the commit version mentioned in the package.

Editing the defconfig by hand is somewhat tedious.
It is just a few options that one needs to change, and only if needed. If you are changing a lot more, you have not done something right.

I'm surprised kernel 3.4.103 builds ok with gcc 10.2.0, when netgear was using something like 4.5, so it's quite a big jump. I know that previously, when I've updated gcc, it has exposed kernel bugs. Like this bug for ARM kernels with gcc 10.x that's been around since 3.x.
It builds with gcc 10.2. It even builds with gcc 11.2. Voxel has been using gcc 11.2 since fw version .87SF. I built it with gcc 11.2. Check the Voxel fw change log. Grab the appropriate gcc toolchain from the openwrt toolchain. For any bug, patches are there.

But changing back to a 4.x toolchain isn't as simple as setting one config variable. The toolchain will cause other variables to be set. That's why menuconfig being broken is such a pain.
Again, it is just a few options. Look at and learn the toolchains and dependencies.
 

xyzzy

Occasional Visitor
Image is just a zip file. 7zip should work. Extracting from router does the same as well.

Hmm? How do you figure that?

Code:
$ file R7800-V1.0.2.90.img
R7800-V1.0.2.90.img: data

$ hexdump -C R7800-V1.0.2.90.img
00000000 64 65 76 69 63 65 3a 52 37 38 30 30 0a 76 65 72 |device:R7800.ver|
00000010 73 69 6f 6e 3a 56 31 2e 30 2e 32 2e 39 30 0a 72 |sion:V1.0.2.90.r|
00000020 65 67 69 6f 6e 3a 0a 68 64 5f 69 64 3a 32 39 37 |egion:.hd_id:297|
00000030 36 34 39 35 38 2b 30 2b 31 32 38 2b 35 31 32 2b |64958+0+128+512+|
00000040 34 78 34 2b 34 78 34 2b 63 61 73 63 61 64 65 0a |4x4+4x4+cascade.|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000080 27 05 19 56 98 da b3 4f 61 9d fe 8f 00 21 34 88 |'..V...Oa....!4.|
00000090 41 50 80 00 41 50 80 00 c9 c8 25 16 05 02 02 00 |AP..AP....%.....|
000000a0 4c 69 6e 75 78 2d 33 2e 34 2e 31 30 33 00 00 00 |Linux-3.4.103...|
000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000c0 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 00 00 a0 e1 |................|

It's clearly got some kind of 128 byte text header that's NUL terminated, which is then immediately followed by what I recognize as a uboot image containing a kernel. Maybe eventually the squashfs image is in there, maybe buried in a zip file, but it's not immediately obvious.
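To make that layout concrete, here is a small sketch that pulls the two pieces apart. It assumes the 128-byte header size seen in the hexdump above; the function names are made up for illustration.

```shell
# Print the text header: the first 128 bytes with the NUL padding removed.
print_img_header() {
    head -c 128 "$1" | tr -d '\000'
}

# Print the 4 bytes at offset 128 (0x80) as lowercase hex.
magic_at_0x80() {
    dd if="$1" bs=1 skip=128 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n'
}
```

Going by the dump, `magic_at_0x80 R7800-V1.0.2.90.img` should print `27051956`, which is the magic number of a legacy uboot image, and `print_img_header` should print the device/version/region/hd_id lines.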

You will be surprised.
Can you tell me which ones to be on the lookout for?

Source is available for those. Require some effort to find the commit ver mentioned in package.
I'm kind of surprised they are in there. How many people use obscure Russian USB thermometers and what look like ancient 802.11b (yes b!) USB wifi adapters on the R7800?

It builds with gcc 10.2. It even builds with gcc 11.2. Voxel has been using gcc 11.2 since fw version .87SF. I built it with gcc 11.2. Check the Voxel fw change log.

I guess the source isn't up to date.
Code:
$ git grep GCC_VERSION v1.0.2.92SF -- configs/defconfig-r7800
v1.0.2.92SF:configs/defconfig-r7800:CONFIG_GCC_VERSION_10_2_0=y
v1.0.2.92SF:configs/defconfig-r7800:CONFIG_GCC_VERSION="10.2.0"
v1.0.2.92SF:configs/defconfig-r7800:CONFIG_GCC_VERSION_10_2=y

While the kernel does build with gcc 10.2 (I did build it, after all), it has bugs. Like the one I linked to. There are more. Just search the kernel commit history for "gcc" or "toolchain" and you'll find a bunch of times a newer compiler has created a bug. Often it fails to build, but there are many cases where it builds but has bugs. The 3.4.103 kernel is sooo much older than gcc 10.2 that there's got to be a bunch that haven't been fixed.
 

fossil

Occasional Visitor
Hmm? How do you figure that?
[Attached image: img.png]


I'm kind of surprised they are in there. How many people use obscure Russian USB thermometers and what look like ancient 802.11b (yes b!) USB wifi adapters on the R7800?
R7800 was introduced in 2016. So someone has/had used those usb devices at some point of time.

I guess the source isn't up to date.
Yes, that is why I said to check the change log. I updated things based on the change log.

I did not ask anyone anything when I built it. This may give you a few hints.
https://www.snbforums.com/threads/c...r-r7800-v-1-0-2-90sf.76155/page-4#post-741427

If things are not working, maybe you should try building the NG fw first. NG is using gcc 4.6-linaro.
 

xyzzy

Occasional Visitor

It is not a zip file. The root filesystem is inside a squashfs volume, not a zip file. Of that I am 100% sure. You can see it getting built when you build the firmware and then find it on rootfs mtd partition. Maybe the software you used to look at the image managed to find the squashfs volume hidden inside. Just take a look at the actual file in a hex editor, it's clearly not a zip file, it doesn't start with the right value.

R7800 was introduced in 2016. So someone has/had used those usb devices at some point of time.
Or they're just there because someone didn't think to turn them off! At one point netgear's firmware was loading drivers for audio devices.

Yes that is why I mentioned check the change log. I updated things based on the change log.

I did not ask anyone anything when I built it. This may give you a few hints.

That's great. I got it built too and I've got all my fixes checked into git so I can push them to a clone of the source, so other people don't have to fix them too.

https://www.snbforums.com/threads/c...r-r7800-v-1-0-2-90sf.76155/page-4#post-741427

If things are not working, maybe you should try building the NG fw first. NG is using gcc 4.6-linaro.
I can probably just fix menuconfig so it'll be easy to change toolchains.
 

fossil

Occasional Visitor
It is not a zip file. The root filesystem is inside a squashfs volume, not a zip file. Of that I am 100% sure. You can see it getting built when you build the firmware and then find it on rootfs mtd partition. Maybe the software you used to look at the image managed to find the squashfs volume hidden inside. Just take a look at the actual file in a hex editor, it's clearly not a zip file, it doesn't start with the right value
Maybe you missed the part where I mentioned using 7zip.

That's great. I got it built too and I've got all my fixes checked into git so I can push them to a clone of the source, so other people don't have to fix them too.
Cool.
 

xyzzy

Occasional Visitor
Maybe you missed the part where I mentioned using 7zip.
No I did not. I said the "software you used to look at the image", that would be 7zip, might have managed to find the squashfs partition. But that does not change the fact that the firmware image is not "just a zip file". It is a 128 byte header that's text, then a uboot image containing a kernel, and then maybe other stuff, and then somewhere there is a squashfs image. Not a zip file, not just anything.
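If someone does want to dig the squashfs out by hand, one approach is to search for the squashfs magic string. This is a sketch under assumptions, not something verified against this image: it assumes a little-endian squashfs, whose superblock starts with the ASCII bytes `hsqs`, and a grep that supports `-a -b -o`. The function name is made up, and the first hit could be a false positive from elsewhere in the image.

```shell
# Print the byte offset of the first "hsqs" string in a file,
# or nothing if the string does not occur.
find_squashfs_offset() {
    LC_ALL=C grep -a -b -o 'hsqs' "$1" | head -n 1 | cut -d: -f1
}
```

With the offset in hand, something like `tail -c +$((off + 1)) image > rootfs.squashfs` would carve out everything from that point on, which could then be tried with unsquashfs.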
 

fossil

Occasional Visitor
squashfs is a compressed read-only file system. I gave you the hint how to extract it. You do not need any "reverse engineering" to do that.

There is no point in getting grumpy.
 