OpenSSL Bench for fun and performance - client and server

sfx2000
I was digging into something else - OpenVPN on MIPS 24Kc on Atheros (MIPS big endian, as opposed to the little-endian Broadcom MIPS targets that AsusWRT runs on)

Target was AR9531 (Atheros), where OpenWRT deliberately disables a critical function by default...

Code:
sfx@blue:~/builds/openwrt$ less .config | grep FPU
CONFIG_KERNEL_MIPS_FPU_EMULATOR=y

I fixed it, but this has a lot of impact downstream...

The FPU emulator on MIPS should be active whether a real FPU is present or not - OpenWRT disabled it by default to save 54K of memory on the tight platforms...

FWIW - folks that are considering MIPS platforms - this includes MIPSEL on Broadcom, MIPS on ATHEROS, MIPS on LANTIQ (now Intel) - leave the emulator on...

What's fun is looking at OpenSSL across different platforms - source and sink...

Testing shows that AES-128-CBC and SHA256 are the most performant across ARM/x86/MIPS at the moment... and RSA keying is a good thing to look at, most processors here did well...

(future is GCM, IMHO, but this will take a while...)

openssl speed sha256 aes-128-cbc rsa2048

Results follow...

QCA9531 - MIPS 24kc (big endian) - 650MHz on OpenWRT 18.06

Code:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc       7891.59k     8721.24k     8991.01k     9003.99k     9050.61k
sha256            3145.35k     7278.21k    12764.68k    15733.73k    16818.60k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.137121s 0.003614s      7.3    276.7
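To compare tables like the one above across boxes, the summary text can be parsed into numbers. This is only a rough sketch: `parse_speed` and `SAMPLE` are names made up for illustration, and it assumes the summary layout shown in this thread (a `type` header row, throughput rows ending in `k`, and an `rsa ... bits` row).

```python
import re

# Sample copied from the QCA9531 result in this post.
SAMPLE = """\
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc       7891.59k     8721.24k     8991.01k     9003.99k     9050.61k
sha256            3145.35k     7278.21k    12764.68k    15733.73k    16818.60k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.137121s 0.003614s      7.3    276.7
"""

# Matches e.g. "rsa 2048 bits 0.137121s 0.003614s 7.3 276.7"
RSA_ROW = re.compile(
    r"rsa\s+(\d+) bits\s+([\d.]+)s\s+([\d.]+)s\s+([\d.]+)\s+([\d.]+)"
)


def parse_speed(text):
    """Return (throughput, rsa) parsed from `openssl speed` summary text.

    throughput maps algorithm name -> {block size in bytes: kB/s}.
    rsa maps key size in bits -> timings and operations per second.
    """
    sizes, throughput, rsa = [], {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("type"):
            # Header row: collect the block sizes in column order.
            sizes = [int(n) for n in re.findall(r"(\d+) bytes", line)]
            continue
        m = RSA_ROW.match(line)
        if m:
            rsa[int(m.group(1))] = {
                "sign_s": float(m.group(2)),
                "verify_s": float(m.group(3)),
                "sign_per_s": float(m.group(4)),
                "verify_per_s": float(m.group(5)),
            }
            continue
        # Throughput row: a name followed by one "<number>k" per block size.
        tokens = line.split()
        values = [t for t in tokens if re.fullmatch(r"[\d.]+k", t)]
        if sizes and len(values) == len(sizes):
            name = " ".join(tokens[: len(tokens) - len(values)])
            throughput[name] = {s: float(v[:-1]) for s, v in zip(sizes, values)}
    return throughput, rsa


if __name__ == "__main__":
    throughput, rsa = parse_speed(SAMPLE)
    for name, row in throughput.items():
        print(name, "peak MB/s:", max(row.values()) / 1000)
    print("rsa2048 sign/s:", rsa[2048]["sign_per_s"])
```

The values stay in the units openssl reports (1000s of bytes per second), so dividing by 1000 gives MB/s.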

Raspberry Pi Zero-W - ARM v6 (BCM2835) @ 1.0GHz

Code:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc      19697.87k    22031.34k    22823.73k    22889.81k    23058.56k    22718.77k
sha256            4990.12k    13028.05k    23174.51k    28694.53k    30910.42k    30976.68k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.054945s 0.001491s     18.2    670.5

Note - interesting to see that ARM and MIPS are a bit similar here...
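One way to sanity check the "a bit similar" remark is to divide throughput by clock speed. The sketch below uses only the 8192-byte figures and the clocks quoted in the two results above; `per_mhz` and `RESULTS` are illustrative names, and clock speed is of course only one factor among many (cache, memory, compiler, OpenSSL build).

```python
# (clock in MHz, aes-128-cbc kB/s at 8192 bytes, sha256 kB/s at 8192 bytes)
# Figures copied from the QCA9531 and Pi Zero-W results in this post.
RESULTS = {
    "QCA9531 (MIPS 24Kc)": (650, 9050.61, 16818.60),
    "Pi Zero-W (ARMv6)": (1000, 23058.56, 30910.42),
}


def per_mhz(kbytes_per_s, mhz):
    """Throughput normalized by clock speed, in kB/s per MHz."""
    return kbytes_per_s / mhz


for name, (mhz, aes, sha) in RESULTS.items():
    print(
        f"{name}: aes-128-cbc {per_mhz(aes, mhz):.2f}, "
        f"sha256 {per_mhz(sha, mhz):.2f} kB/s per MHz"
    )
```

On these figures the per-MHz SHA256 rates land closer together than the raw throughput does, while AES-128-CBC still favors the Pi.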

Raspberry Pi 3 B+ - ARM Cortex A53 running in 32 bit mode (Raspbian) @ 1.4GHz

Code:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc      48330.32k    53997.91k    56285.10k    56886.27k    56494.76k    56295.42k
sha256           12744.65k    33706.03k    61343.40k    77618.86k    84140.03k    84645.21k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.023585s 0.000622s     42.4   1608.3

And now let's see what A53 can do in ARMv8 space...

FriendlyARM NanoPI Neo2 - Allwinner H5 Cortex-A53 @ 1.0GHz

Code:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc      33395.10k    36085.53k    37153.54k    37431.98k    37505.71k    37481.13k
sha256           34474.37k   104691.37k   244751.87k   369565.01k   435601.41k   441215.66k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.010763s 0.000288s     92.9   3470.4

Intel as a reference...

N3700 @ 2.4GHz - little Intel cores - Linux

Code:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc      38659.71k    43212.03k    44901.72k   117979.82k   120949.42k   121077.76k
sha256           23043.23k    50909.12k    88007.08k   108288.34k   115974.14k   116830.52k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.004847s 0.000140s    206.3   7127.4

pfSense - FreeBSD Intel Atom C2358 @ 1.74GHz

It's my router, but not my VPN endpoint - but decent numbers considering it's BSD

Code:
The 'numbers' are in 1000s of bytes per second processed.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      28043.75k    31571.65k    32631.45k    87574.92k    89235.73k
sha256           15361.50k    36439.99k    63628.78k    78649.74k    84003.81k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.006765s 0.000196s    147.8   5102.1

Core i5-7260U - Intel NUC @ 2.4GHz

Big cores - and they can turbo up....

Code:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc     137128.85k   151120.70k   154798.08k   156284.25k   156835.84k   156811.26k
sha256           82472.64k   181581.70k   330561.79k   410323.29k   442116.78k   444536.15k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000610s 0.000018s   1639.0  57105.7

@kvic and @Xentrk - bit of discussion here... I'm still a big fan of AES-128-GCM as I do think that's the best cipher-hmac here... client and server side.
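For anyone wanting GCM numbers to set next to the CBC ones: the runs above used the plain algorithm names, and as far as I know the OpenSSL releases of this era only expose GCM modes in `openssl speed` through the `-evp` option. A minimal sketch of a wrapper follows; `speed_argv` and `run_speed` are hypothetical helper names, and it assumes an `openssl` binary on the PATH.

```python
import subprocess


def speed_argv(cipher, use_evp=True):
    """Build the argument list for an `openssl speed` run."""
    argv = ["openssl", "speed"]
    if use_evp:
        # -evp selects the cipher by name through the high-level EVP interface.
        argv += ["-evp", cipher]
    else:
        argv.append(cipher)
    return argv


def run_speed(cipher, use_evp=True):
    """Run the benchmark and return whatever openssl wrote to stdout."""
    done = subprocess.run(
        speed_argv(cipher, use_evp), capture_output=True, text=True, check=True
    )
    return done.stdout


if __name__ == "__main__":
    # Print the commands rather than running them; each run takes a while.
    for cipher in ("aes-128-cbc", "aes-128-gcm"):
        print(" ".join(speed_argv(cipher)))
```

Numbers taken with `-evp` may not be directly comparable with the plain-name runs in this thread, since the EVP path can use hardware AES support where the CPU has it.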
 
Anyways - a bit fun here getting under the wires...

The QC/Atheros MIPS based WiSoCs are a bit old-school -- MIPS big endian matches network byte order, which makes sense since IP is big-endian. The Broadcom MIPSEL parts are little endian, so byte swaps there have about a 10-15 percent performance impact overall - can pull some performance back perhaps...
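To make the byte-order point concrete, here is a small sketch of the same 16-bit value packed in network, big-endian and little-endian order. The port number is just a sample value (1194 is OpenVPN's default), and nothing here measures the cost of the swap.

```python
import struct
import sys

PORT = 1194  # OpenVPN's default port, used here only as a sample value

network = struct.pack("!H", PORT)  # network byte order
big = struct.pack(">H", PORT)      # explicit big endian
little = struct.pack("<H", PORT)   # explicit little endian

print("host byte order:", sys.byteorder)
print("network:", network.hex(), "big:", big.hex(), "little:", little.hex())

# A big-endian host already holds the value in wire order;
# a little-endian host has to reverse the bytes before sending.
needs_swap = sys.byteorder == "little"
print("host needs a swap:", needs_swap)
```

Network order and big endian come out byte-for-byte identical, while the little-endian form is the same two bytes reversed.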

@RMerlin - https://www.linksysinfo.org/index.php?threads/using-fpu-emulator.69940/#post-244509

Current GCC (7.1) is pretty good here...
 
(future is GCM, IMHO, but this will take a while...)

Manufacturers just need to stop being lazy, and upgrade to OpenVPN 2.4. There are already a few VPN tunnel providers that support GCM.
 
@sfx2000
Thank you for taking the time to post the VPN/OpenSSL metrics. Interesting topic that always piques my interest.
 
I said so two years ago: GCM vs CBC

Also OpenVPN should be retired, given the choice of much faster IPsec, the very lightweight and flexible Shadowsocks, and now the fast and flexible WireGuard.

Thanks for sharing your numbers.

NP - it was fun working with you to validate the assumptions and measurements for that article...

Offhand - https://kazoo.ga/optimised-openssl-library-erx/

I see that you have the .deb file - were there tweaks outside of just rebuilding with the Ubnt toolchain?
 
Turned a few knobs, fixed some code that UBNT either merged by mistake or failed to merge. That's about it. So basically UBNT was doing a lousy job.

As an aside, after two years in the making, UBNT released a buggy 2.0.0 FW (at least for the ERX/MediaTek platform, though it comes with a newer kernel than the Cavium platform). 2.0.0 ships with OpenSSL 1.1.0 as it's based on Debian Stretch. So everything starts from scratch, albeit with a better baseline if UBNT hasn't made mistakes yet.
 
QCA9531 - MIPS 24kc (big endian) - 650MHz on OpenWRT 18.06

Code:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc       7891.59k     8721.24k     8991.01k     9003.99k     9050.61k
sha256            3145.35k     7278.21k    12764.68k    15733.73k    16818.60k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.137121s 0.003614s      7.3    276.7

This one is a bit odd... Qualcomm IPQ4028 - ARMv7a - Cortex-A7 @ 710MHz...

Code:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc       9467.39k     9751.40k     9874.66k     9883.65k     9915.33k
sha256            1529.10k     5086.18k     8906.50k    11145.85k    11954.86k
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.083250s 0.002068s     12.0    483.6
 