Broadcom's hardware acceleration

Discussion in 'General Wireless Discussion' started by RMerlin, Jul 5, 2014.

  1. RMerlin

    RMerlin Super Moderator

    Joined:
    Apr 14, 2012
    Messages:
    30,620
    Location:
    Canada
    Since this is a frequently discussed topic, I thought I'd put what I know about this in a new thread.

    CTF is Broadcom's closed-source, proprietary "secret sauce" that allows routers based on their hardware to achieve near-gigabit performance. It does so through various methods which are not publicly known (even manufacturers don't get access to the ctf.ko source code AFAIK). One of them involves bypassing parts of Linux's Netfilter (the FORWARD chain is the best-known one).

    So as you can already see, "hardware acceleration" isn't an entirely accurate name. At least one portion of that acceleration is really a software trick (bypassing part of Linux's stack).

    Because of these bypasses, CTF prevents various firmware-level features from working: anything that relies on the FORWARD chain, for instance. The solutions used by router manufacturers usually work on two different levels:

    1) Some manufacturers like Asus and Netgear (if I remember correctly) will allow port-forwarding to work by modifying the Linux kernel so that any packet that gets marked will be flagged to bypass the CTF code. At the iptables level, any port-forwarded packet gets marked with a value. This way, you can have HW acceleration enabled and still use port forwards. The obvious consequence of this is that any traffic going through a port forward will not be "hardware-accelerated". So if you were to push a lot of traffic over a forwarded port, that traffic would probably not be able to reach near gigabit performance.

    (caveat: I never actually tested this. I assume that the CTF bypass is applied to every single packet that gets marked, not just to some of them)

    2) When certain incompatible features are enabled, then the router is rebooted with CTF disabled. In this mode, the processing is then entirely done by Linux. It allows you to do anything you'd want (as a firmware developer), but performance is seriously impacted. A typical 600 MHz MIPS device (such as the RT-N66U) will reach a WAN to LAN limit of around 150-200 Mbps (less if you start heavily processing traffic through QoS, parental control, custom firewall rules, etc...). Unfortunately, it's not always clear to the end user when HW acceleration is automatically disabled by such a thing. If your router has telnet access, you could see if the ctf.ko kernel module is loaded or not, using the "lsmod" command.
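
    A minimal sketch of that check, for anyone who prefers to paste the "lsmod" output from a telnet session into a script on a PC rather than read it by eye. It assumes the module shows up under the name "ctf" (the usual name for a file called ctf.ko); the sample text below is made up for illustration, not captured from a real router.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def ctf_loaded(lsmod_text, module_name="ctf"):
    """True if the (assumed) CTF module name appears in the listing."""
    return module_name in loaded_modules(lsmod_text)


# Hypothetical sample output; sizes and neighbouring modules are invented.
SAMPLE = """\
Module                  Size  Used by
ctf                    12345  0
nf_conntrack           54321  3
"""

print(ctf_loaded(SAMPLE))
```

    If the function returns False on your router's real output, HW acceleration has most likely been turned off by one of the incompatible features.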

    CTF is what explains why most third-party firmwares (such as DD-WRT) tend to have lower throughput than manufacturer stock firmwares. For people with average (by North American standards) WAN rates of 10-100 Mbps, this is not an issue. Any additional feature will come at no real cost to maximum throughput. But for our more fortunate overseas friends who get 100-1000 Mbps link speeds, CTF is virtually essential.

    Because CTF is closed-source, and because many advanced features do not work with it enabled, most third-party firmwares such as DD-WRT or OpenWRT don't support it.


    Now, another recent topic: the different levels of hardware acceleration. Recent Broadcom chips support a new technology they call "Flow Acceleration", or "FA" for short. Broadcom's demonstration can be seen in this video:

    https://www.youtube.com/watch?v=vwRmQkkZ71E

    In home routers that have hardware supporting this, it gets handled by the same ctf.ko module, in addition to support being implemented at the Ethernet driver level. Unfortunately I don't know which specific Broadcom chips support this, or which specific routers support it. I know that neither the RT-AC56U nor the RT-AC68U (as of this date) supports this at the hardware level. No idea about Netgear's or Linksys's recent products.

    In Asus's particular case (since it's the one I'm most familiar with - someone else could fill us in on the other manufacturers), they are handling this as a "Hardware acceleration level". Level 1 is just traditional CTF. Level 2 is traditional CTF + FA. One upcoming product that does support both levels will have to downgrade from Level 2 to Level 1 when one of the new features they are adding is enabled.

    One thing I do not know however is what kind of performance impact FA has on a router. Traditional CTF was already able to push things fairly close to gigabit speed with a minimal CPU impact.


    (disclaimer: most of this is based on my own experience over the years. Due to the blackbox nature of CTF, I might not be 100% correct on all of this, so if anyone has any additional detail or corrections, feel free to share)
     
    Zirescu, MysteriousGuy and tim like this.
  2. sfx2000

    sfx2000 Part of the Furniture

    Joined:
    Aug 11, 2011
    Messages:
    14,071
    Location:
    San Diego, CA
    Interesting...

    Most BRCM router SoCs also include a switch functional block as well as the CPU core - this goes for both MIPS and ARM architectures...

    CTF (Cut-Through Forwarding) - what it looks like to me is that once IP NAT is enabled, they can bind the IPs to specific MAC addresses, and basically perform NAT at the MAC layer within the switch functional block, bypassing the upper layers.

    With Flow Acceleration - just a guess here - they've also been able to add a tag to the MAC layer switching functions to bind flow attributes.

    Where this would break things is when someone has to get out of the linux IP stack and escape out to userland - for things like traffic-shaping (app layer, not 802.11e QoS) and apps like OpenVPN and others.

    Interesting - other chipset vendors do not need what BRCM has done with ctf.ko and they do seem to have decent performance...

    sfx
     
    MysteriousGuy and tim like this.
  3. sinshiva

    sinshiva Very Senior Member

    Joined:
    Nov 8, 2013
    Messages:
    1,067
    Location:
    FL
    From working with a user on the forum in the relatively recent past: an extensive port-based QoS ruleset dropped their NAT throughput to around 85 Mbps on the single-core N66U, using the current tc QoS implementation. Just a heads up.
     
  4. RussellInCincinnati

    RussellInCincinnati Senior Member

    Joined:
    Feb 20, 2013
    Messages:
    301
    Ridiculously well written explanation.
     
  5. IAAI

    IAAI Senior Member

    Joined:
    Dec 9, 2013
    Messages:
    462
    Would CTF-FA affect the CPU load?
    If it is going to be enabled by default and the user only has an internet connection of less than 100 Mbps, what would be the benefit of using either one of the CTF versions in that situation?


     
  6. RMerlin

    RMerlin Super Moderator

    Joined:
    Apr 14, 2012
    Messages:
    30,620
    Location:
    Canada
    Any CPU cycles not used for routing are available to do other things. For example, file sharing (USB access from Samba can get quite CPU intensive). Or, OpenVPN crypto. Without CTF, if you have, for example, an RT-N66U with a 75 Mbps connection and are downloading at full speed, it means there's only 50% of CPU time left for copying a file over SMB - the speed will drop.

    Granted, it's not as important, but it does carry a few benefits that one has to weigh against the drawbacks it entails (lack of advanced traffic monitoring/shaping/QoS, etc...).
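
    The 50% figure in the RT-N66U example can be reproduced with a back-of-the-envelope calculation. This is only a rough sketch: it assumes CPU cost scales linearly with throughput and takes 150 Mbps (the low end of the 150-200 Mbps range quoted earlier in the thread) as the point where the CPU is fully busy routing. The function name is made up for illustration.

```python
def cpu_share_left(wan_rate_mbps, software_limit_mbps):
    """Fraction of CPU time left over, assuming routing cost is linear in throughput."""
    if software_limit_mbps <= 0:
        raise ValueError("software_limit_mbps must be positive")
    # Routing cannot use more than the whole CPU, so cap the used share at 1.0.
    used = min(wan_rate_mbps / software_limit_mbps, 1.0)
    return 1.0 - used


# 75 Mbps download on a device that tops out around 150 Mbps without CTF.
print(cpu_share_left(75, 150))  # 0.5
```

    Real routers will not behave this linearly (QoS rules, packet sizes and interrupts all matter), so treat the result as an order-of-magnitude estimate only.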
     
  7. sfx2000

    sfx2000 Part of the Furniture

    Joined:
    Aug 11, 2011
    Messages:
    14,071
    Location:
    San Diego, CA
    I think there is fixed functionality within the Switching Functional Block within the BRCM System on Chip - in some ways it's like GPUs, where there are programmable blocks (for example shaders) vs. fixed function blocks where things can be done in the HW itself.

    With CTF.KO, Broadcom can take common functions and set/configure the hardware in the switch directly - and get a fair amount of performance - this even goes into 802.11 standard QoS attributes, along with their wired Ethernet counterparts.

    When we get into Application Shaping, this is beyond the scope of the Ethernet switch, which operates at Layer 2, as AppShaping/Priority Management/etc... tend to be at the IP and above layers in the stack.

    When going down that path, one has to take the fixed functions out, do some more work at examining packets and applications, and this is why CTF.KO no longer applies, and it falls back to the regular Linux network stack (which is fully capable of doing this, but it is slower and more work).

    It's similar in some ways to L2TP vs. OpenVPN - L2TP/IPsec is inside the kernel and IP stack natively, vs. OpenVPN being at the App Layer and in userland, so that takes up more CPU cycles, as this can't be done at any lower layer...

    (in Standards Engineering Speak - App Priority and OpenVPN are layer violations, as opposed to standards-based QoS and L2TP/IPsec).
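
    To make the fast-path/fallback idea above concrete, here is a toy model. It is emphatically not how ctf.ko works (its internals are not public); it is just a generic flow-cache sketch. All names are hypothetical. It assumes three things: the first packet of a flow is learned via the normal stack, later packets of a known flow take the shortcut, and marked packets or features needing inspection force everything through the normal stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    flow: tuple    # hypothetical 5-tuple identifying the connection
    mark: int = 0  # nonzero = flagged to skip the fast path


class ToyForwarder:
    def __init__(self, needs_inspection=False):
        # needs_inspection stands in for features like app-level shaping.
        self.needs_inspection = needs_inspection
        self.known_flows = set()

    def handle(self, pkt):
        """Return "fast" or "slow" depending on which path the packet takes."""
        eligible = not self.needs_inspection and pkt.mark == 0
        if eligible and pkt.flow in self.known_flows:
            return "fast"
        if eligible:
            # First packet of an eligible flow: learn it on the slow path.
            self.known_flows.add(pkt.flow)
        return "slow"


fwd = ToyForwarder()
pkt = Packet(("192.168.1.10", 50000, "203.0.113.5", 443, "tcp"))
print(fwd.handle(pkt), fwd.handle(pkt))  # slow fast
```

    With needs_inspection=True every packet stays on the slow path, which is the toy equivalent of the router falling back to the regular Linux network stack.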

    Happy Thanksgiving Everyone!

    sfx
     
  8. sfx2000

    sfx2000 Part of the Furniture

    Joined:
    Aug 11, 2011
    Messages:
    14,071
    Location:
    San Diego, CA
    And by Layer Violations, I don't mean it won't work - most often it does. It's a violation only because the standards didn't account for it.

    But it will, and often does, affect performance at the HW level - and we see this every day.

    It's a good thing - pushes the standards forward as well - there's a couple of folks on this board that totally get how this works...

    sfx
     
  9. remixedcat

    remixedcat Senior Member

    Joined:
    May 10, 2012
    Messages:
    418
    Enterprise routers/security appliances handle QoS fine. Most of those use Atheros.
     
  10. netware5

    netware5 Senior Member

    Joined:
    Mar 9, 2013
    Messages:
    339
    Location:
    Bulgaria
    Does RT-N66U support FA?
     
  11. L&LD

    L&LD Part of the Furniture

    Joined:
    Dec 9, 2013
    Messages:
    9,592
    Considering that the much newer RT-AC68U and RT-AC56U don't support this in hardware, I would say no.
     
    netware5 likes this.
  12. System Error Message

    System Error Message Part of the Furniture

    Joined:
    Oct 14, 2014
    Messages:
    4,083
    If hardware acceleration uses the switch-level functionality, that would explain the limitations of hardware NAT: a limited number of connections (memory) and limited functionality (switch chips are very simple, massively parallel CPUs that are very difficult to code for). Based on RMerlin's explanation, hardware acceleration would work only if the traffic didn't have to go through the CPU. Traffic hits the port, is mapped using the layer 2/layer 3 switch chip, and gets forwarded to the relevant PC without going through the OS, which means no features from the firmware.

    It is very limiting. Comparing Broadcom's solution to a single core of the Tilera TILE CPU used in MikroTik CCRs, it can only closely match the throughput a single one of these cores can put out, even though the TILE core is doing software routing. A single TILE core does 2 Gb/s of NAT if the ports are directly connected to the CPU and not through a switch chip. Unfortunately the minimum configuration you can get is 9 cores for TILE. You can actually purchase PCIe cards with RAM slots and TILE cores from Tilera, or even rackmountable routers. You would have to compile a Linux OS for it using their tools just to use it, though.

    Seems like broadcom is falling behind.
     
    Last edited: Apr 18, 2015
  13. Soul--Reaver

    Soul--Reaver Regular Contributor

    Joined:
    Dec 20, 2012
    Messages:
    52
    You say Broadcom is falling behind, but the price tags are completely different. You can't compare a Ferrari with a family car. Broadcom just found a shortcut which they are trying to exploit for maximum benefit.

    I just bought an RT-AC87U and loaded it up with Merlin 378.55 firmware, and it shows "Enabled (CTF + FA)" for HW acceleration with the proper features disabled or enabled (I needed to disable STP, as it was enabled by default).
    Traffic monitor seems to be a little wonky though. Realtime works fine but last 24 hours, daily and monthly do not. Maybe that's caused by something else...

    Also, my previous router, an RT-N66U, had Tomato Shibby firmware loaded onto it for a short while for testing. Everything worked just fine. My internet connection is 200/20 Mbps.
    Tomato could pull that 200 Mbps downstream, but at those moments you couldn't do anything else: simple browsing would often be unresponsive until whatever was using that downstream bandwidth stopped. So without CTF it's not just slower, but possibly unresponsive under load.

    I really wish routers would split the work among different processors/cores. One dedicated for routing, one dedicated for Wifi and one for other features (increase any of them as necessary based on router specs). That way the routing and wifi part would work as expected even under heavy loads
     
  14. L&LD

    L&LD Part of the Furniture

    Joined:
    Dec 9, 2013
    Messages:
    9,592
    This is why I want an i'X' based router (Skylake?) as soon as possible.

    In a constantly connected world such as we're in today, consumers need equal power too. Not just the backend and ISP providers.
     
  15. sfx2000

    sfx2000 Part of the Furniture

    Joined:
    Aug 11, 2011
    Messages:
    14,071
    Location:
    San Diego, CA
    There are pre-built x86-based routers out there - but they're still beyond the $200 price point that folks seem to want to pay...

    Example here - http://store.pfsense.org/SG-2440/

    Probably easier to work with and write code for... being x86-based compared to MikroTik's Tilera platform (Tilera may have more horsepower, but writing code that works well in a multi-core tile-based environment is something probably more useful in the HPC realm than in routers/gateways)

    That being said, I agree - we're at a point where a broadband connection can be held back by what the current generation of ARMv7 (Cortex-A9) chipsets can handle... Broadcom, Atheros, or Marvell...
     
  16. tim

    tim Regular Contributor

    Joined:
    Jan 8, 2015
    Messages:
    130
    Location:
    UK
    My RT-AC3200 had CTF + FA enabled when using factory defaults.
    When I enabled AiProtection (the Trendmicro stuff) it changed to CTF only.
     
  17. L&LD

    L&LD Part of the Furniture

    Joined:
    Dec 9, 2013
    Messages:
    9,592
    sfx2000, I don't want it to run pfsense, I want it to run Asus' (and RMerlin's and the forks) instead. And Atom is not a performance part either, tbh.

    More overhead with more features is a zero-sum game.

    I want more actual performance on the software / firmware we're running now.
     
  18. sfx2000

    sfx2000 Part of the Furniture

    Joined:
    Aug 11, 2011
    Messages:
    14,071
    Location:
    San Diego, CA
    Fair enough... but I do agree that we're at a point where we're CPU limited with current Router/AP SoCs in the consumer market.

    (FWIW - I've not been impressed by Asus's direct efforts - RMerlin's contributions there have been invaluable to making things work on that particular platform)
     
    L&LD likes this.
  19. prelude

    prelude Occasional Visitor

    Joined:
    Sep 26, 2013
    Messages:
    12
    What settings can cause FA to be disabled? Something is knocking it out on my AC68P, leaving only CTF enabled. I disabled STP after reading Soul--Reaver's post, but it didn't re-enable it for me.
     
  20. john9527

    john9527 Part of the Furniture

    Joined:
    Mar 28, 2014
    Messages:
    6,096
    Location:
    United States
    Did you reboot after changing STP? CTF status usually needs a reboot to be updated.
    A couple more things that can disable CTF-FA are PPPoE, PPTP or L2TP connections.