
Welcome To SNBForums


OpenSSL Bench for fun and performance - client and server

Discussion in 'VPN' started by sfx2000, Feb 3, 2019.

  1. sfx2000

    sfx2000 Part of the Furniture

    Joined:
    Aug 11, 2011
    Messages:
    14,023
    Location:
    San Diego, CA
    I was digging into something else - OpenVPN on a MIPS 24Kc core on Atheros (big-endian MIPS, as opposed to the little-endian MIPS on AsusWRT's Broadcom targets).

    The target was an AR9531 (Atheros), where OpenWRT deliberately disables a critical feature by default: the kernel's MIPS FPU emulator...

    Code:
    ~/builds/openwrt$ grep FPU .config
    CONFIG_KERNEL_MIPS_FPU_EMULATOR=y
    
    I fixed it, but this has a lot of impact downstream...

    The FPU emulator on MIPS should be active whether or not a hardware FPU is present - OpenWRT disabled it by default to save 54K of memory on the tightest platforms...

    FWIW, for folks considering MIPS platforms - this includes MIPSEL on Broadcom, MIPS on Atheros, and MIPS on Lantiq (now Intel) - leave the emulator on...
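A quick way to check that option across several build trees is sketched below. It is a minimal Python sketch that assumes the usual Kconfig conventions (`CONFIG_FOO=y` when enabled, `# CONFIG_FOO is not set` when disabled) and that it is run from the root of an OpenWrt source tree; the function name `fpu_emulator_state` is hypothetical.

```python
from pathlib import Path

# Kconfig writes enabled bool symbols as "CONFIG_FOO=y" and disabled ones
# as the comment "# CONFIG_FOO is not set".
FPU_SYMBOL = "CONFIG_KERNEL_MIPS_FPU_EMULATOR"


def fpu_emulator_state(config_text: str) -> str:
    """Return 'enabled', 'disabled', or 'absent' for the MIPS FPU emulator symbol."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"{FPU_SYMBOL}=y":
            return "enabled"
        if line == f"# {FPU_SYMBOL} is not set":
            return "disabled"
    return "absent"


if __name__ == "__main__":
    # Assumption: run from the top of an OpenWrt build tree.
    cfg = Path(".config")
    if cfg.exists():
        print(fpu_emulator_state(cfg.read_text()))
    else:
        print("no .config found; run this from the OpenWrt source tree")
```

An `absent` result most likely means the symbol does not apply to the selected target, since the option is MIPS-specific.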

    What's fun is looking at OpenSSL across different platforms - source and sink...

    Testing shows that AES-128-CBC and SHA256 are the best performers across ARM/x86/MIPS at the moment... and RSA keying holds up well too; most of the processors here did well on it...

    (future is GCM, IMHO, but this will take a while...)

    Code:
    openssl speed sha256 aes-128-cbc rsa2048
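To collect these numbers from a script rather than by eye, a minimal Python sketch could wrap the same command and parse the throughput rows. The helper names `speed_command`, `parse_throughput` and `run_speed` are hypothetical. One assumption worth flagging: on CPUs with AES instructions (AES-NI on x86, the ARMv8 Crypto Extensions), `openssl speed aes-128-cbc` generally benchmarks OpenSSL's low-level AES code, while `openssl speed -evp aes-128-cbc` goes through the EVP interface that can use the hardware, so the sketch allows both forms.

```python
import shutil
import subprocess


def speed_command(algos, evp=None):
    """Build an `openssl speed` argument list; evp names a cipher for the -evp path."""
    cmd = ["openssl", "speed"]
    if evp:
        cmd += ["-evp", evp]
    return cmd + list(algos)


def parse_throughput(output):
    """Map algorithm name -> bytes/s per block size from `openssl speed` output.

    Throughput rows end in columns like '9050.61k' (thousands of bytes per
    second); header lines and the RSA sign/verify rows don't, so they are skipped.
    """
    results = {}
    for line in output.splitlines():
        fields = line.split()
        values = []
        # Peel numeric 'k' columns off the end of the row.
        while fields and fields[-1].endswith("k"):
            try:
                values.append(float(fields[-1][:-1]) * 1000.0)
            except ValueError:
                break
            fields.pop()
        if values and fields:
            # Whatever is left is the algorithm name, e.g. "aes-128 cbc".
            results[" ".join(fields)] = values[::-1]
    return results


def run_speed(algos, evp=None):
    """Run `openssl speed` (this can take a minute or more) and parse its stdout."""
    if shutil.which("openssl") is None:
        raise RuntimeError("openssl not found on PATH")
    proc = subprocess.run(
        speed_command(algos, evp), capture_output=True, text=True, check=True
    )
    return parse_throughput(proc.stdout)
```

For example, `run_speed(["sha256", "aes-128-cbc", "rsa2048"])` mirrors the command above, and `run_speed([], evp="aes-128-cbc")` runs the EVP variant. RSA rows use a different sign/verify layout and are deliberately skipped by this parser.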

    Results follow...

    QCA9531 - MIPS 24Kc (big endian) - 650MHz on OpenWRT 18.06

    Code:
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128 cbc       7891.59k     8721.24k     8991.01k     9003.99k     9050.61k
    sha256            3145.35k     7278.21k    12764.68k    15733.73k    16818.60k
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.137121s 0.003614s      7.3    276.7
    

    Raspberry Pi Zero W - ARMv6 (BCM2835) @ 1.0GHz

    Code:
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128 cbc      19697.87k    22031.34k    22823.73k    22889.81k    23058.56k    22718.77k
    sha256            4990.12k    13028.05k    23174.51k    28694.53k    30910.42k    30976.68k
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.054945s 0.001491s     18.2    670.5
    
    Note - interesting to see that ARM and MIPS are a bit similar here...
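One way to make "a bit similar" more concrete is to normalize for clock speed. The sketch below converts the 8192-byte AES-128-CBC figures from the two tables above into CPU cycles per byte, assuming a single core running at its nominal clock for the whole run (no throttling or turbo); `cycles_per_byte` is a hypothetical helper.

```python
def cycles_per_byte(kbytes_per_s: float, clock_mhz: float) -> float:
    """Convert an `openssl speed` figure ('k' = 1000 bytes/s) to cycles per byte.

    Assumes one core at its nominal clock for the whole run.
    """
    bytes_per_s = kbytes_per_s * 1000.0
    cycles_per_s = clock_mhz * 1_000_000.0
    return cycles_per_s / bytes_per_s


# AES-128-CBC at 8192-byte blocks, taken from the QCA9531 and Pi Zero W tables.
SAMPLES = {
    "QCA9531 (MIPS 24Kc, 650MHz)": (9050.61, 650),
    "Pi Zero W (ARMv6, 1.0GHz)": (23058.56, 1000),
}

for name, (kbps, mhz) in SAMPLES.items():
    print(f"{name}: {cycles_per_byte(kbps, mhz):.1f} cycles/byte")
```

Under those assumptions it works out to roughly 72 cycles/byte on the QCA9531 and roughly 43 on the Pi Zero W, so the ARMv6 part is ahead per clock as well as in absolute terms.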

    Raspberry Pi 3 B+ - ARM Cortex-A53 running in 32-bit mode (Raspbian) @ 1.4GHz

    Code:
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128 cbc      48330.32k    53997.91k    56285.10k    56886.27k    56494.76k    56295.42k
    sha256           12744.65k    33706.03k    61343.40k    77618.86k    84140.03k    84645.21k
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.023585s 0.000622s     42.4   1608.3
    
    And now let's see what A53 can do in ARMv8 space...

    FriendlyARM NanoPI Neo2 - Allwinner H5 Cortex-A53 @ 1.0GHz

    Code:
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128 cbc      33395.10k    36085.53k    37153.54k    37431.98k    37505.71k    37481.13k
    sha256           34474.37k   104691.37k   244751.87k   369565.01k   435601.41k   441215.66k
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.010763s 0.000288s     92.9   3470.4
    
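The H5's SHA-256 figures are roughly five times the Pi 3 B+'s at a lower clock. A plausible explanation is the ARMv8 Crypto Extensions (hardware SHA-256), which the Pi 3 B+ SoC is generally reported to lack; the AES-128-CBC figures likely stay modest because the non-EVP `openssl speed aes-128-cbc` path generally doesn't use the AES instructions. Below is a minimal Python sketch for checking which flags the kernel reports, assuming Linux's `/proc/cpuinfo` format (a `Features` line on ARM, `flags` on x86); `crypto_features` is a hypothetical helper.

```python
from pathlib import Path


def crypto_features(cpuinfo_text: str) -> dict:
    """Report whether AES and SHA-256 acceleration flags appear in /proc/cpuinfo text.

    ARM kernels list capabilities on "Features" lines, x86 kernels on "flags" lines.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Features", "flags"):
            flags.update(value.split())
    return {
        "aes": "aes" in flags,
        # ARM reports SHA-256 instructions as "sha2"; x86 reports SHA extensions as "sha_ni".
        "sha256": "sha2" in flags or "sha_ni" in flags,
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(crypto_features(cpuinfo.read_text()))
```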
    Intel as a reference...

    N3700 @ 2.4GHz - little Intel cores - Linux

    Code:
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128 cbc      38659.71k    43212.03k    44901.72k   117979.82k   120949.42k   121077.76k
    sha256           23043.23k    50909.12k    88007.08k   108288.34k   115974.14k   116830.52k
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.004847s 0.000140s    206.3   7127.4
    

    pfSense - FreeBSD Intel Atom C2358 @ 1.74GHz

    It's my router, not my VPN endpoint, but these are decent numbers considering it's BSD.

    Code:
    The 'numbers' are in 1000s of bytes per second processed.
    
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128 cbc      28043.75k    31571.65k    32631.45k    87574.92k    89235.73k
    sha256           15361.50k    36439.99k    63628.78k    78649.74k    84003.81k
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.006765s 0.000196s    147.8   5102.1
    

    Core i5-7260U - Intel NUC @ 2.4GHz

    Big cores - and they can turbo up...

    Code:
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128 cbc     137128.85k   151120.70k   154798.08k   156284.25k   156835.84k   156811.26k
    sha256           82472.64k   181581.70k   330561.79k   410323.29k   442116.78k   444536.15k
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.000610s 0.000018s   1639.0  57105.7
    
    @kvic and @Xentrk - a bit of discussion here... I'm still a big fan of AES-128-GCM, as I think it's the best cipher/authentication combination here (GCM authenticates the data itself, so no separate HMAC is needed)... client and server side.
     
    Last edited: Feb 3, 2019
  2. sfx2000

    Anyways - a bit fun here getting under the wires...

    The QC/Atheros MIPS-based WiSoCs are a bit old-school: MIPS big endian is network-native, which makes sense since IP uses big-endian (network) byte order. The Broadcom MIPSEL parts are little endian, so byte swaps there cost about 10-15 percent in performance overall - some of that can perhaps be pulled back...
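For readers who haven't run into byte order: IP header fields travel big-endian ("network byte order"), so a little-endian CPU has to swap bytes when it reads or writes them, while a big-endian CPU can use them as-is. The Python sketch below only illustrates that swap with the standard `struct` and `socket` modules; it does not measure the 10-15 percent figure above, and the address used is an arbitrary example.

```python
import socket
import struct
import sys

# 192.168.0.1 as a 32-bit integer, the way a CPU holds it in a register.
addr = 0xC0A80001

# '!' packs in network byte order (big-endian): most significant byte first.
wire = struct.pack("!I", addr)
# '=' packs in the running host's native byte order.
native = struct.pack("=I", addr)

print("host byte order:", sys.byteorder)   # 'big' on MIPS BE, 'little' on MIPSEL/x86
print("on the wire:    ", wire.hex())       # always c0a80001
print("in host memory: ", native.hex())     # c0a80001 on big-endian, 0100a8c0 on little-endian

# htonl() returns its argument unchanged on big-endian hosts and byte-swaps on little-endian ones.
needs_swap = socket.htonl(addr) != addr
print("htonl swaps bytes here:", needs_swap)
```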

    @RMerlin - https://www.linksysinfo.org/index.php?threads/using-fpu-emulator.69940/#post-244509

    Current GCC (7.1) is pretty good here...
     
  3. RMerlin

    RMerlin Super Moderator

    Joined:
    Apr 14, 2012
    Messages:
    30,382
    Location:
    Canada
    Manufacturers just need to stop being lazy, and upgrade to OpenVPN 2.4. There are already a few VPN tunnel providers that support GCM.
     
  4. Xentrk

    Xentrk Part of the Furniture

    Joined:
    Jul 21, 2016
    Messages:
    2,225
    Location:
    The Land of Smiles
    @sfx2000
    Thank you for taking the time to post the VPN/OpenSSL metrics. It's an interesting topic that always piques my interest.
     
  5. kvic

    kvic Part of the Furniture

    Joined:
    Aug 11, 2014
    Messages:
    2,444
    Location:
    22.4399N 114.2222E


    I said so two years ago: GCM vs CBC

    Also, OpenVPN should be retired, given the choice of much faster IPsec, the very lightweight and flexible Shadowsocks, and now the fast and flexible WireGuard.

    Thanks for sharing your numbers.
     
  6. sfx2000

    NP - it was fun working with you to validate the assumptions and measurements for that article...

    Offhand - https://kazoo.ga/optimised-openssl-library-erx/

    I see that you have the .deb file - were there tweaks outside of just rebuilding with the Ubnt toolchain?
     
  7. kvic

    Turned a few knobs, and fixed some code that UBNT had either merged by mistake or failed to merge. That's about it. So basically UBNT was doing a lousy job.

    As an aside, after two years of baking, UBNT released a buggy 2.0.0 firmware (at least for the ERX/MediaTek platform, which comes with a newer kernel than the Cavium platform). 2.0.0 ships with OpenSSL 1.1.0, as it's based on Debian Stretch. So everything starts from scratch, albeit with a better baseline, provided UBNT hasn't made mistakes yet.
     
  8. sfx2000

    This one is a bit odd... Qualcomm IPQ4028 - ARMv7-A (Cortex-A7) @ 710MHz...

    Code:
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128 cbc       9467.39k     9751.40k     9874.66k     9883.65k     9915.33k
    sha256            1529.10k     5086.18k     8906.50k    11145.85k    11954.86k
                      sign    verify    sign/s verify/s
    rsa 2048 bits 0.083250s 0.002068s     12.0    483.6
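Putting the IPQ4028 on the same cycles-per-byte footing as the QCA9531 shows what's odd about it. This sketch assumes both parts run at their nominal clocks (710MHz and 650MHz) on a single core, and uses the 8192-byte columns from the two tables in this thread; `cycles_per_byte` and `RESULTS` are hypothetical names.

```python
def cycles_per_byte(kbytes_per_s: float, clock_mhz: float) -> float:
    """Cycles per byte from an `openssl speed` figure ('k' = 1000 bytes/s)."""
    return (clock_mhz * 1_000_000.0) / (kbytes_per_s * 1000.0)


# (throughput in 'k' at 8192 bytes, nominal clock in MHz), copied from this thread.
RESULTS = {
    "IPQ4028 (ARMv7-A)": {"aes-128 cbc": (9915.33, 710), "sha256": (11954.86, 710)},
    "QCA9531 (MIPS 24Kc)": {"aes-128 cbc": (9050.61, 650), "sha256": (16818.60, 650)},
}

for chip, algos in RESULTS.items():
    for algo, (kbps, mhz) in algos.items():
        print(f"{chip:22} {algo:12} {cycles_per_byte(kbps, mhz):5.1f} cycles/byte")
```

Under those assumptions AES-128-CBC costs about the same per clock on both (roughly 72 cycles/byte), but SHA-256 costs roughly 59 cycles/byte on the ARM part versus about 39 on the MIPS one.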