traffic manager crazy numbers again 374.41 (Merlin build)

Here is what is happening:

1. "rstats.c:calc" is expected to ignore all traffic from ppp0, because as far as it is concerned ppp0 is not a WAN interface.

2. "shared/misc.c:netdev_calc" used to return 0 for "ifname==ppp0" because the statement below (from 35_2) would never evaluate to true ("wan_ifnames=eth0" for PPPoE) and "netdev_calc" would execute "return 0" at the end of the method.

// find in WAN interface
else if(nvram_match("wan0_primary", "1") && nvram_contains_word("wan_ifnames", ifname))

3. The new behaviour of netdev_calc is different: it treats ppp0 as a WAN interface. get_wan_unit in the statement below would return WAN_UNIT_FIRST for _both_ eth0 and ppp0, which is a significant change for rstats. Once this evaluates to true, netdev_calc returns 1 at line 1158 (in .42). That confuses rstats.c to no end.

// find in WAN interface
else if (ifname && (unit = get_wan_unit(ifname)) >= 0)

4. Below is a fix that I made locally in rstats.c:calc to check for the ppp0 interface before netdev_calc is invoked. I considered making the fix outside of rstats.c, but the logic is quite complicated and confusing.

/* skip any ppp* interface before netdev_calc gets to see it */
if(!strncmp(ifname, "ppp", 3))
    continue;

if(!netdev_calc(ifname, ifname_desc, &counter[0], &counter[1], ifname_desc2, &rx2, &tx2))
    continue;
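
The difference between the two checks can be modelled in isolation. The sketch below is not firmware code: `contains_word`, `model_get_wan_unit`, `old_is_wan`, `new_is_wan` and `skip_ppp` are hypothetical stand-ins, the nvram lookup is replaced by a fixed string, and the setup is assumed to be PPPoE with "wan_ifnames=eth0" as described above.

```c
#include <string.h>

/* Hypothetical stand-in for the nvram value "wan_ifnames" on a PPPoE setup. */
static const char *WAN_IFNAMES = "eth0";

/* Whole-word match in a space-separated list. This is a simplified model
 * of what nvram_contains_word is assumed to do, not the real function. */
static int contains_word(const char *list, const char *word)
{
    size_t wlen = strlen(word);
    const char *p = list;

    if (wlen == 0)
        return 0;
    while ((p = strstr(p, word)) != NULL) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[wlen] == '\0') || (p[wlen] == ' ');
        if (starts && ends)
            return 1;
        p++;
    }
    return 0;
}

/* Old check: only names listed in wan_ifnames count as WAN. */
static int old_is_wan(const char *ifname)
{
    return contains_word(WAN_IFNAMES, ifname);
}

/* Model of the new lookup as described in the thread: both eth0 and
 * ppp0 map to the first WAN unit. Returns the unit, or -1 if none. */
static int model_get_wan_unit(const char *ifname)
{
    if (!strcmp(ifname, "eth0") || !strcmp(ifname, "ppp0"))
        return 0; /* stands in for WAN_UNIT_FIRST */
    return -1;
}

/* New check: anything with a WAN unit counts as WAN. */
static int new_is_wan(const char *ifname)
{
    return ifname && model_get_wan_unit(ifname) >= 0;
}

/* The local workaround: skip every ppp* interface before the check. */
static int skip_ppp(const char *ifname)
{
    return strncmp(ifname, "ppp", 3) == 0;
}
```

In this model `old_is_wan("ppp0")` is false while `new_is_wan("ppp0")` is true, which is the change that rstats was not written for. `skip_ppp` matches on the prefix only, so it would also drop ppp1.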


RMerlin, your thoughts?

Using eth0 as WAN wouldn't be accurate, since it would include all the PPP overhead. ppp0 should be the correct interface to use.
 
RMerlin, my router currently says that I downloaded 23.5G so far today (since midnight); my download program reports 22G (the downloader started at 2am). So the numbers are pretty close. I also remember that the router's numbers were always a bit higher than those reported by my ISP. I will have to wait until tomorrow morning to compare what the router reports to the numbers from the ISP.
I just calculated the PPPoE overhead and it is a little over 5% (5.2929221%); now I know why the router reported usage was always a bit higher.
I do not think that rstats was using ppp0 before: 'nvram_contains_word("wan_ifnames", ifname)' in the old code would make sure that ppp0 is not used, unless I am missing something obvious.
We could first get back to where we were before (prior to .37, I think) and then think about how to improve it and make it more precise.

P.S. I can share this build with anyone who wants to test if there is interest. Alternatively, anyone could rebuild it by following the instructions.
 
My ISP reported 29G of downloads and 0.93G of uploads for 24h period yesterday and the router shows 30.36G and 1.1G. The 24h page now seems to be working as it used to: the router reports slightly higher usage.
Anyone who wants to try it, the change is in "release/src/router/rstats/rstats.c": look for the line with "netdev_calc" and add an "if" statement before it just like below (this will ignore ppp0 interface in speed calculations).

/* skip any ppp* interface before netdev_calc gets to see it */
if(!strncmp(ifname, "ppp", 3))
    continue;

if(!netdev_calc(ifname, ifname_desc, &counter[0], &counter[1], ifname_desc2, &rx2, &tx2))
    continue;

Keep in mind that the data in the existing traffic files is already corrupted. The historical usage will not be fixed, but new data will be fine.
 
Very cool, thanks for figuring this out !
Perhaps it will make it into the next release ;-)
 
Not as it is, since this is just a workaround and not a real fix to the core issue.
 
RMerlin, it is up to you to decide of course, but I see two distinct issues here: 1) tm is not working at all and 2) tm is using the wrong interface. #2 has always been there, and #1 was caused by an incompatible change by Asus in the way ppp0 is treated, which affected tm. I am sure a lot of people would be happy to first get back to where we were before the 37 release and then wait for an improved tm that does not count the PPPoE overhead. In the meantime, anyone could compile a patched version if they so wish...
 
Perhaps it would also encourage Asus to fix their original firmware, seeing that it works correctly in a custom version (band-aid or not)... if they care.
 
Can you try the fix proposed here?

https://github.com/RMerl/asuswrt-merlin/pull/628

I suspect Pinwing is closer to the root cause of the problem, which would be a cleaner fix and less likely to break in the future if for some reason the WAN interface naming were to change (for instance, you can already potentially have ppp0 and ppp1 in a dual wan situation).
 
I will try shortly. The reason I did not want to modify netdev_calc in the first place was that it is called from at least one more place in the firmware, and a change like that could have an effect on something else...
 
I'll have to recheck. I thought I had looked at the time and didn't see any other reference to it.

I'd still be interested in seeing if it does resolve the issue. If it does, I'll take a look to ensure it doesn't affect any other caller to that function.
 
It's used in the httpd daemon for the webui Realtime traffic monitor. Just see if traffic monitoring still displays properly on the realtime page after the patch, but it shouldn't be a problem unless there's something else hardcoded on the realtime page that would make it behave differently.
 
The 24h page looks ok to me so far, but I want to wait until tomorrow morning to compare the usage against what my ISP reports. I cannot test with DHCP or static IP though.

The real time page _seems_ to be showing somewhat higher speeds than before: 1,956Kb/s, while my line can only do around 1,500Kb/s. I am not sure if this is because of the change or not. These spikes last long enough to also show up on the 24h page.
 
Alright, let me know how things are looking after 24 hours.
 
After 24h the router reported 28.31G and the ISP is reporting 27G. The difference is slightly under 5%, which is close to the PPPoE overhead, so the overhead still seems to be included.

I went back to my fix to verify the real time page and it still shows frequent spikes to 1.9Mb/s, which my line cannot deliver. Could it be that the timing of the cstats/rstats probes is not perfect...

This new fix seems to be working, but it looks like it is including the PPPoE overhead.
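
The "slightly under 5%" figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the difference is taken relative to the ISP's number; `excess_percent` is a hypothetical helper, not something from the firmware.

```c
/* Excess of the router's count over the ISP's count, as a percentage
 * of the ISP's count. Both arguments use the same unit (here: G). */
static double excess_percent(double router_total, double isp_total)
{
    return (router_total - isp_total) / isp_total * 100.0;
}
```

With the numbers reported in this thread, `excess_percent(28.31, 27.0)` is about 4.85 and `excess_percent(30.36, 29.0)` is about 4.69, both in the range of the PPPoE overhead estimated earlier.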
 
1.9 MB/s on a 15 Mbit/s line is normal. I have a network monitor running on my desktop, and downloading on my 30 Mbit/s cable (not PPPoE) connection gives me 3.9 MB/s on the desktop itself. This is because you need to divide by 8 to convert between bits and bytes, not by 10.

Thanks for confirming the fix is working - I'll integrate it in the next release. This method should also be safe for USB-based modems, since it doesn't rely on the physical interface (which would be usb and not eth0).
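
The bits-to-bytes conversion mentioned above, as a minimal sketch (`mbit_to_mbyte` is a hypothetical helper name):

```c
/* Convert a rate in megabits per second to megabytes per second. */
static double mbit_to_mbyte(double mbit_per_s)
{
    return mbit_per_s / 8.0; /* 8 bits per byte */
}
```

For a 15 Mbit/s line this gives 1.875 MB/s, which rounds to the 1.9 MB/s seen on the real time page.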
 
RMerlin, I might be doing something wrong but this commit below breaks the page again. I originally tested the version with "!=" and you committed "==".

https://github.com/RMerl/asuswrt-merlin/commit/4443889f77dbc8ab70175522397c5788d392a19c

BTW, I believe the fix is still using eth0...
 
RMerlin, I read the author's comment on GitHub. I am not using IPTV and I do not have vlan6. My fix and his first version got things back to where they were before: the 24h page working but counting the overhead. Can you just keep his first version only? It should work ok for all, because it is using the physical interface. Excluding the PPPoE overhead can be a future improvement and might take a while. I have looked through the code and it might not be a simple undertaking.
 
Fixing for one person while breaking for another one isn't a solution - it's a kludge. I'd rather have the issue resolved for real, rather than getting another flood of reports 3 weeks later saying that "It used to work for me and now it's broken".

I'll re-open the issue on Github. You two should compare your settings to figure out what's different between both.

EDIT: Issue re-opened.
 
In the case of a USB device, there is no physical device, so that would not work for issues like reported here. This is why relying on eth0 is not a valid solution.

The same would probably apply to Dual WAN users.
 
