Updated to 384.5 and it borked my server, any ideas?

Karl_

Updated the firmware on my AC3200 yesterday to 384.5_0 (from 380.68_4), it installed okay and automatically did a factory reset after completing. Luckily, I'd been planning to do a factory reset anyway so I had all my settings documented to configure everything again.

After setting everything up, I was unable to connect to any of my LXC containers running on a Proxmox server. I could connect to the Proxmox management page (192.168.1.240), but none of the containers were responding at all (192.168.1.241->250 via linux bridge). From within the host shell I was able to ping each of the containers on their IP address, but it was very slow to respond (typically zero response until packet number 50-80 and then it would start replying). Tried switching on/off and restarting everything (several times) but no luck. One of the containers has port forwarding / DDNS setup and was accessible via the Internet, but just not responding internally.

After spending many hours trying to figure out what was going wrong, I gave up, used factory recovery to re-install 380.68_4 and restored my settings backup, and everything is back to normal and working again. I'd really like to be able to do the update as my NVRAM is constantly maxed out, so the increase there is something I'd like to get sorted.

Anyone have any ideas what might have changed / be causing this or how to figure it out?
Thanks!
 
Do you see any messages in the System Log that may help with the debugging?
 
Karl, upgraded our 3200 first thing Sunday morning, was at 380.68_4. Did nothing different than usual, backed everything up Saturday and started from full power off, without the modem on. Uploaded v384_5 and waited until the FW notified it needed a manual restart, so turned the power off on the AC rocker switch and let it sit a couple of minutes. Powered up the modem and after it synced, powered up the 3200. It took a wee bit longer than it used to with the pre NG builds, but that was to be expected since it's a total rewrite and upgrade to a new era, so to speak.

Despite all of the negative experiences that have been posted, this upgrade has been painless for us, and positive. We didn't experience any loss of settings including SSID and passwords; all of the wireless networks work at top speed. Both OpenVPN clients connected automatically after re-powering so it was a pleasant surprise. I was fully prepared to follow all recommended steps and set everything up manually/from scratch (and still may), but there's nothing to indicate that it needs to be done; knock on silicone. Perhaps upgrading offline or restarting from a cold restart with a fresh connection isn't SOP or what most do, but it always does the trick for this router. Perhaps it might work for you?

Another great FW build by Eric (and as usual, many thanks for your work Eric:) ) Cheers.
 
it installed okay and automatically did a factory reset after completing.
I don't think that's normal behaviour. It might indicate that the upgrade didn't complete properly. If I were you I'd remove any USB devices, do a factory reset, then do the upgrade to 384.5_0 again, followed by another factory reset and manual configure. Possibly a bit over the top but "better safe than sorry".
 
Yeah, something definitely didn't work right. Upgrading from 380 to 384 is a one-way process on the AC3200. If 380 is working normally after reverting from 384, it didn't upgrade correctly in the first place. It might appear to work for a while, but it won't be stable.
 
I'd remove any USB devices, do a factory reset, then do the upgrade to 384.5_0 again, followed by another factory reset and manual configure. Possibly a bit over the top but "better safe than sorry".
Thanks, tempted to give it another try and do that, but this gives me reservations:
Upgrading from 380 to 384 is a one-way process on the AC3200.
So if the upgrade installs 'correctly' but I still can't access the server then I can't rollback to an older firmware?
 
Without having to figure out how to change the NVRAM size back, no. One assumes that's not irreversible, but I couldn't begin to guess at the complexity of such a task.
 
There are people who claim to have successfully reverted, not that anyone can prove it. We put off upgrading far too long and have no desire to revert, as good as the old build was for many months. The upgrade was unremarkable in that it was entirely successful. Just a guess, but if the upgrade is successful for you, and it is for the majority, there should be no problems with accessing your server. If you disconnect everything, upgrade, set everything back up from scratch, then reconnect, it should work without incident.

You do have a backup router by the way? We have three, though it would be discouraging, not to mention humbling to admit an inability to manually reset and reconnect everything in reverse order; just joking. The upgrade should work; take snapshots and document your process and all should go well. If not, then you have the perfect reason to buy that shiny RTAC86 you know you want:) Good luck.
 
Without having to figure out how to change the NVRAM size back, no. One assumes that's not irreversible, but I couldn't begin to guess at the complexity of such a task.
Interesting... NVRAM definitely displayed as being increased after the update - 133K-ish?
After rolling everything back it's now showing as 65,554 / 65,536 bytes - So I'm over the limit now (it wasn't before all this, it was close, but never over).

If you disconnect everything, upgrade, set everything back up from scratch, then reconnect, it should work without incident.
Well I shall give it another try later and let you know. Everything else was working fine by the way - all regular PCs, tablets, etc on the network were fine, VPN config, blah blah - all fine except the VMs running on the server which all vanished.
 
none of the containers were responding at all (192.168.1.241->250 via linux bridge). From within the host shell I was able to ping each of the containers on their IP address, but it was very slow to respond

Sounds like these are static IPs. Did you set up static DHCP leases for them on the router? Is it possible that other devices have grabbed those IPs? Because what you’re describing sounds like an IP conflict.
 
Yup, set all of them up on the LAN->DHCP page after the factory reset. Could only access the management page of one (the host server); the others all timed out completely from any other device on the network.
 
Reason I asked is that it is possible some wireless clients have nabbed those IPs before you set up those leases. Make sure you modify the dynamic DHCP range to exclude the container range, and reboot the router just to let any prior DHCP leases expire.
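As a rough sketch of that check, assuming you know the start and end of the router's dynamic pool, a few lines of Python can list which fixed addresses still sit inside it. The pool boundaries below are hypothetical placeholders, not values taken from the router in this thread; only the 192.168.1.240-250 range comes from the original post.

```python
import ipaddress

def addresses_in_pool(static_ips, pool_start, pool_end):
    """Return the static IPs that fall inside the dynamic DHCP pool."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    return [ip for ip in static_ips
            if start <= ipaddress.ip_address(ip) <= end]

# Proxmox host plus containers from this thread: 192.168.1.240 to .250
static_ips = [f"192.168.1.{n}" for n in range(240, 251)]

# Hypothetical wide pool: every static address is still up for grabs
overlapping = addresses_in_pool(static_ips, "192.168.1.2", "192.168.1.254")
print(overlapping)

# Hypothetical narrowed pool that stops short of the container range
clean = addresses_in_pool(static_ips, "192.168.1.2", "192.168.1.199")
print(clean)
```

An empty list for your real pool settings means no dynamic client can be handed one of the container addresses.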

If that doesn’t work, two things I would try:

1. If you’re comfortable with Wireshark, start it before you try to ping the containers, look for the ARP request packets, and see if the MAC address in each reply matches the container’s.

2. Monitor the system log and see what’s posted right around when the container “starts” to respond to ping (as you mentioned they started to after 80th or so packets).
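If Wireshark feels like overkill, the same MAC comparison can be approximated from a Linux machine's neighbour table. The sketch below assumes output in the format printed by `ip neigh show` and compares it against the MACs you expect. Every MAC address and the sample table here are made up for illustration; substitute the real values from your Proxmox container configs.

```python
import re

# Matches lines such as:
# 192.168.1.241 dev eth0 lladdr aa:bb:cc:00:00:f1 REACHABLE
NEIGH_RE = re.compile(r"^(\S+)\s+dev\s+\S+\s+lladdr\s+([0-9a-fA-F:]{17})\b")

def parse_neigh(text):
    """Map IP -> MAC for every neighbour entry that has a resolved MAC."""
    table = {}
    for line in text.splitlines():
        m = NEIGH_RE.match(line.strip())
        if m:
            table[m.group(1)] = m.group(2).lower()
    return table

def find_mismatches(observed, expected):
    """Return {ip: (expected_mac, observed_mac)} where the two differ."""
    return {ip: (expected[ip].lower(), mac)
            for ip, mac in observed.items()
            if ip in expected and mac != expected[ip].lower()}

# Hypothetical sample output and hypothetical expected MACs
sample = """\
192.168.1.240 dev eth0 lladdr aa:bb:cc:00:00:f0 REACHABLE
192.168.1.241 dev eth0 lladdr de:ad:be:ef:00:01 STALE
192.168.1.242 dev eth0  FAILED
"""
expected = {
    "192.168.1.240": "aa:bb:cc:00:00:f0",
    "192.168.1.241": "aa:bb:cc:00:00:f1",
}

mismatches = find_mismatches(parse_neigh(sample), expected)
print(mismatches)
```

An entry in the result means something other than the container answered ARP for that IP, which would point to a conflict. Entries with no MAC at all (like the FAILED line) are skipped rather than reported.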
 
Karl, it will probably update without any issues. Just a suggestion: you might consider turning the router off and disconnecting or powering off everything connected, including your modem, except for the computer you need to actually upload and administer the upgrade. As long as you have the backup and snapshots waiting in the wings in case you need to reload from scratch, the FW usually updates much quicker from a cold/hard start when the router hasn't had the time to accept all of the information it usually sees. Put differently, after a fresh start the NVRAM is as clear as it can be and nothing should be loaded or there to distract the router during the upgrade. Not meaning to repeat the prior post, but we've done it this way for several years and have never had a single instance or issue with a Merlin update going wonky. Don't pretend to be a wizard, only hope your next try works as well for you. The key is patience; it always takes a few minutes longer than anyone thinks it should.

While the router is starting from the cold start, we restart the browser and use only one tab so there's nothing in the cache to cause any issue during the update. The cache in our browser self-destructs every few seconds and runs in virtual mode anyway, but just to be safe... When the router is fully up and ready to accept the new FW, load it, pull the trigger and be patient. When it's finished upgrading and asks for a power cycle, you can safely turn off the power going to the router, or just unplug the cord if you want, then wait a few minutes. Have a coffee or whatever and try not to fret, leaving only the computer running in the same browser tab that you just used. After a few minutes, plug the router back in and let it go through its new and improved startup and checklist. When it pops up in your browser tab, you should see the beautiful GUI and, hopefully, the same logon and password that was in it from the previous version, if all went well. If that's the case and all your settings are intact, cycle all of your other gear, starting with the modem, and you should see it all waiting for you as it was before. If you have a ton of custom scripts, your mileage may vary. Ours hasn't missed a tick since the upgrade. Crossed fingers for you; good luck!:)
 
It might appear to work for a while, but it won't be stable.
Spot on. It 'forgot' the WiFi SSIDs yesterday evening, and then this morning had apparently done a random factory reset, which left me with no choice but to try the firmware update again.

Following st3v3n's suggestions of reboots and disconnecting everything, then re-configured all my static IPs, etc - the process went pretty smoothly (same as last time) and after the update completed the VMs were available .. for 5 minutes - and then all vanished again.

The host server is always available, but it seems like the VMs are available for a few minutes and then all become unreachable. Sending a ping from my desktop machine brings them back temporarily:
Pinging 192.168.1.241 with 32 bytes of data:
Request timed out.
Reply from 192.168.1.241: bytes=32 time<1ms TTL=64
Reply from 192.168.1.241: bytes=32 time<1ms TTL=64
Reply from 192.168.1.241: bytes=32 time<1ms TTL=64

Ping statistics for 192.168.1.241:
Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
Doing this allows me to re-connect to the management page of each container for 5-10 mins and then it stops responding again, until I re-ping it. (I've also checked from other devices to see if they are able to reach the VMs during this time - they can't)

Checked the System Log and it looks like this:
May 24 13:39:17 kernel: net_ratelimit: 2030 callbacks suppressed
May 24 13:39:17 kernel: TCP: time wait bucket table overflow
May 24 13:39:17 kernel: TCP: time wait bucket table overflow
May 24 13:39:17 kernel: TCP: time wait bucket table overflow
May 24 13:39:17 kernel: TCP: time wait bucket table overflow
May 24 13:39:17 kernel: TCP: time wait bucket table overflow
May 24 13:39:17 kernel: TCP: time wait bucket table overflow
May 24 13:39:17 kernel: TCP: time wait bucket table overflow
May 24 13:39:17 kernel: TCP: time wait bucket table overflow
May 24 13:39:17 kernel: TCP: time wait bucket table overflow
May 24 13:39:17 kernel: TCP: time wait bucket table overflow
May 24 13:50:06 syslogd started: BusyBox v1.25.1
May 24 13:50:06 kernel: klogd started: BusyBox v1.25.1 (2018-05-12 21:53:58 EDT)
May 24 13:51:56 syslogd started: BusyBox v1.25.1
May 24 13:51:56 kernel: klogd started: BusyBox v1.25.1 (2018-05-12 21:53:58 EDT)
Looked into that "time wait bucket table overflow" error on this forum, typically linked to torrents, definitely none running on network, so checked netstat page to see where it's coming from:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 192.168.1.1:60984 192.168.1.53:54184 TIME_WAIT
tcp 0 0 192.168.1.1:60984 192.168.1.53:52064 TIME_WAIT
tcp 0 0 192.168.1.1:60984 192.168.1.53:53632 TIME_WAIT
tcp 0 0 192.168.1.1:60984 192.168.1.53:51949 TIME_WAIT
tcp 0 0 192.168.1.1:60984 192.168.1.53:53644 TIME_WAIT
[Plus another ~2,500 connections from the router all to my desktop PC.]
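For anyone wanting to summarise a listing like this rather than scroll through thousands of rows, here is a minimal Python sketch that tallies connections by state, remote IP and local port. It assumes the six-column layout shown above and uses only the five rows quoted in this post as sample input.

```python
from collections import Counter

def count_connections(netstat_text):
    """Tally TCP rows by (state, remote IP, local port)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 6-column tcp row
        if len(fields) != 6 or fields[0] != "tcp":
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        local_port = local.rsplit(":", 1)[1]
        peer_ip = foreign.rsplit(":", 1)[0]
        counts[(state, peer_ip, local_port)] += 1
    return counts

sample = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 192.168.1.1:60984 192.168.1.53:54184 TIME_WAIT
tcp 0 0 192.168.1.1:60984 192.168.1.53:52064 TIME_WAIT
tcp 0 0 192.168.1.1:60984 192.168.1.53:53632 TIME_WAIT
tcp 0 0 192.168.1.1:60984 192.168.1.53:51949 TIME_WAIT
tcp 0 0 192.168.1.1:60984 192.168.1.53:53644 TIME_WAIT
"""

counts = count_connections(sample)
for key, n in counts.most_common(5):
    print(n, key)
```

On the sample, every row collapses into a single bucket: TIME_WAIT connections between router port 60984 and 192.168.1.53. That makes the router-side port the thing to identify next.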

Is this likely to be causing the issues?
Not really sure what's next in terms of figuring this out and how to fix it - any ideas welcomed!
Thanks again.
 
Karl; that's wild, to have the upgrade go well then have the VM's bounce on and off regularly like that is beyond my comprehension/experience at least for the moment. Will make more coffee and include some time for research this evening. In the meanwhile, am reasonably sure that Xentrk, Collin or one of the other most excellent resident Wizards will offer their meaningful input and insight regarding your dilemma.

This is only a hunch, but there's no reason the new FW should be causing the VM's to drop; if it worked before, it should work now. That said, something is causing this disconnection syndrome. Did you reload your OpenVPN configs, if you're using the same configs/tunnels? The only time we've had issues with connections dropping was last year. Finally ran it down and changed the NTP time lookup; the setting was acting up and causing a similar problem with dropped connections. Don't know why it came about, but changing to a different independent time server instead of using the regular pool eliminated the problem and it never returned. I had tried everything before doing that. That particular timing issue was specific to our 3200 at the time, but a couple of other folks reported similar episodes that eventually resolved. This was on the older 380.68_4 firmware, not the NG.

Since we aren't running VM, this is speculation only. If you've set everything completely from scratch, something is punking the sync (not a concise statement, but seems appropriate). In any event, keep at it, keep a keen eye on routing, logs and timing, good luck, and don't give up, something will present the 'aha' moment. Cheers.
 
Is this likely to be causing the issues?
Not really sure what's next in terms of figuring this out and how to fix it - any ideas welcomed!
Thanks again.

It might not be the cause but is definitely related. Could you investigate which application on your desktop is sending that excessive traffic to the router?

Also, how are your desktop and server connected to the router? Both wired?
 
