OpenSSL hardware acceleration

Braswell N3700 - x86-64 numbers (this CPU does support AES-NI) - Braswell is an Intel low-power core - base is 1.6GHz with Turbo to 2.4GHz - it's an Airmont core, which is a 14nm die shrink of the 22nm Silvermont

type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm      58432.80k   109802.43k   150152.45k   167195.65k   172010.15k
aes-256-gcm      53050.14k    96761.39k   127607.04k   140015.96k   143461.03k

I'd be curious to see how Annapurna Labs Alpine compares - they're very similar...

Not quite a fair comparison. In a single-thread test like this one, Turbo Boost is active AFAIK, so this compares 2.4GHz vs 1.7GHz, not 1.6GHz vs 1.7GHz.

Voxel.
 
What about your firmware for the Netgear R7800 - is it compiled with AES HW support?
If not, could you please put it on your never ending list of "x-mas wishes"?
 
Makes sense that 256b/512b wide instructions that occupy the bus stall the entire width of the pipeline, and don't give significant gains for <256b.
In general routers should focus on <256b, and maximum single-threaded performance.
 
What about your firmware for the Netgear R7800 - is it compiled with AES HW support?
If not, could you please put it on your never ending list of "x-mas wishes"?

It is already in my wish-list ;).

Unfortunately the Linux code for IPQ806x used by Netgear is outdated. .dissent (thanks to him) managed to find a patch that adds a crypto driver to the kernel for IPQ806x, but the API functions it provides are not used anywhere. So I am not sure it is feasible to do the same as for the R9000, or it would require a lot of time...

Voxel.
 
Makes sense that 256b/512b wide instructions that occupy the bus stall the entire width of the pipeline, and don't give significant gains for <256b.
In general routers should focus on <256b, and maximum single-threaded performance.

OpenSSL is used by only a few applications (in this firmware). In practice we are only talking about accelerating OpenVPN, and tests show that HW acceleration does speed it up. Thanks to jrfaulkin, who tested it: his download speed increased from 61 to 86. My tests show a speed increase too.

Voxel.
 
Are you going to release a beta for testing?
 
Braswell N3700 - x86-64 numbers (this CPU does support AES-NI) - Braswell is an Intel low-power core - base is 1.6GHz with Turbo to 2.4GHz - it's an Airmont core, which is a 14nm die shrink of the 22nm Silvermont

type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm      58432.80k   109802.43k   150152.45k   167195.65k   172010.15k
aes-256-gcm      53050.14k    96761.39k   127607.04k   140015.96k   143461.03k
Hmmm.....I'm building a pfSense box with the same N3700 (Supermicro X11SBA-LN4F) and am doing considerably better.

Code:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     119496.24k   229246.21k   318944.51k   357219.67k   368412.41k
aes-256-gcm     109787.79k   202527.70k   271294.36k   299291.31k   308054.11k
 
He possibly didn't use -evp, so openssl used a more generic code path rather than an optimized one.
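
For anyone wanting to repeat the comparison, here is a minimal sketch that only builds the two command lines being discussed. The `-evp` and `-elapsed` flags are the ones mentioned in this thread; the helper name `speed_cmd` is made up for illustration, and nothing is executed here.

```python
import shlex

def speed_cmd(cipher, evp=True, elapsed=False):
    """Build an `openssl speed` command line for one cipher (hypothetical helper)."""
    cmd = ["openssl", "speed"]
    if elapsed:
        # Measure wall-clock time instead of user CPU time.
        cmd.append("-elapsed")
    if evp:
        # Go through the EVP interface, which can pick the optimized code path.
        cmd += ["-evp", cipher]
    else:
        # Plain algorithm name, without the EVP interface.
        cmd.append(cipher)
    return cmd

# Print both variants so they can be pasted into a shell and compared.
print(shlex.join(speed_cmd("aes-128-cbc", evp=False)))
print(shlex.join(speed_cmd("aes-128-cbc", evp=True)))
```

Running both variants on the same box and comparing the summary rows should show whether the missing `-evp` explains the gap.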
 
Using the envelope... openssl speed -evp (cipher) - nice chips...

on Linux, Ubuntu - N3700 @ 2.4GHz
aes-128-gcm     125143.34k   235311.19k   322237.70k   358792.19k   369262.59k
aes-256-gcm     113730.96k   207544.85k   273746.52k   300200.16k   308789.08k

FreeBSD 11 - pfSense 2.4 on C2358 @ 1.7GHz
aes-128-gcm      83701.30k   166811.09k   231982.76k   260226.05k   268369.92k
aes-256-gcm      77744.53k   147206.29k   197814.53k   218108.93k   223985.66k
 
What are the results for aes-128-cbc and aes-256-cbc? GCM hardware acceleration is not supported by the AL-514 (yet?). With -evp of course.

Voxel.
 
N3700 - Ubuntu 16.04
aes-128-cbc     213163.00k   320942.53k   385059.75k   403864.92k   410457.43k
aes-256-cbc     177587.89k   243503.32k   281791.15k   292557.48k   296588.63k

C2358 - FreeBSD, pfSense 2.4
aes-128-cbc     152862.52k   234722.43k   282755.58k   298132.82k   302601.56k
aes-256-cbc     124172.31k   177408.00k   206398.04k   215052.29k   217587.71k


FWIW - the new Broadcom B53 based core is putting up some interesting numbers...
 
Interesting...

AL-514 1.7GHz in my HW version (with /dev/crypto)
aes-128-cbc       1321.71k     5340.99k    21316.01k    67087.36k   332368.55k
aes-256-cbc       1313.02k     5265.17k    20732.59k    70701.06k   305261.23k


Well, for 8k blocks it is comparable or even faster...

Voxel.
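
To put "comparable for 8k blocks" in numbers, here is a rough sketch that divides the AL-514 aes-128-cbc figures by the N3700 ones quoted above, per block size. The variable and function names are only illustrative, and the two runs may not have been timed the same way, so treat the ratios as indicative only.

```python
BLOCK_SIZES = [16, 64, 256, 1024, 8192]

# aes-128-cbc figures quoted in this thread, in 1000s of bytes per second.
N3700 = [213163.00, 320942.53, 385059.75, 403864.92, 410457.43]
AL514 = [1321.71, 5340.99, 21316.01, 67087.36, 332368.55]

def relative_speed(engine, cpu):
    """Per-block-size ratio of the engine figure to the CPU figure."""
    return [e / c for e, c in zip(engine, cpu)]

ratios = relative_speed(AL514, N3700)
for size, ratio in zip(BLOCK_SIZES, ratios):
    print(f"{size:>5} bytes: AL-514 reaches {ratio:.1%} of the N3700 figure")
```

The ratio climbs steadily with block size, which is what you would expect if each request to the engine carries a fixed overhead that only large blocks can amortize.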
 
How are the sha256 numbers? While aes-cbc numbers are one aspect of VPN, also need to consider the hashing numbers, and this is where GCM is actually preferred - which is why OpenVPN 2.4 introduced that cipher...

n3700
sha256 17963.44k 44136.55k 82532.01k 105998.34k 115613.70k
 
I know about GCM. But so far most OpenVPN providers still use CBC, and GCM really is not supported (there are comments in the Alpine HW driver code saying that GCM is not implemented).

sha256 519.32k 2087.42k 6651.48k 24165.03k 153862.14k

Voxel.
 
Things can get a bit crazy on the BCM4906 when I enable the BCM HW engine through cryptodev... AES-128-CBC:

Code:
Evp+Eng     24085.54k    138212.27k  398918.40k  2079193.60k  9392713.14k

Note that you don't need cryptodev for AES acceleration at the CPU level - only when using a dedicated hardware engine.
 
How are the sha256 numbers? While aes-cbc numbers are one aspect of VPN, also need to consider the hashing numbers, and this is where GCM is actually preferred - which is why OpenVPN 2.4 introduced that cipher...

n3700
sha256 17963.44k 44136.55k 82532.01k 105998.34k 115613.70k

I didn't test it in my original BCM4906 tests, here they are:

Code:
admin@RT-AC86U-DFD8:/tmp/home/root# openssl speed -evp sha256
Doing sha256 for 3s on 16 size blocks: 3492831 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 3227822 sha256's in 2.99s
Doing sha256 for 3s on 256 size blocks: 2543687 sha256's in 2.99s
Doing sha256 for 3s on 1024 size blocks: 1381730 sha256's in 2.98s
Doing sha256 for 3s on 8192 size blocks: 265112 sha256's in 2.98s
OpenSSL 1.0.2j  26 Sep 2016
built on: reproducible build, date unspecified
options:bn(64,32) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: /opt/toolchains/crosstools-arm-gcc-5.3-linux-4.1-glibc-2.22-binutils-2.25/usr/bin/arm-buildroot-linux-gnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DOPENSSL_NO_HEARTBEATS -DL_ENDIAN -march=armv7-a -fomit-frame-pointer -mabi=aapcs-linux -marm -ffixed-r8 -msoft-float -D__ARM_ARCH_7A__ -ffunction-sections -fdata-sections -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256           18628.43k    69090.50k   217787.25k   474795.81k   728791.11k

That's just CPU optimized, the BCM hw engine isn't enabled (I have to do a special build for that).
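
For reference, the summary row can be reproduced from the "Doing ..." lines: blocks processed times block size, divided by the measured seconds, in 1000s of bytes. A small sketch (the regex and the name `throughput_k` are mine, not part of OpenSSL):

```python
import re

# Matches lines such as:
#   Doing sha256 for 3s on 16 size blocks: 3492831 sha256's in 3.00s
LINE = re.compile(
    r"Doing (\S+) for \d+s on (\d+) size blocks: (\d+) \S+ in ([\d.]+)s"
)

def throughput_k(line):
    """Return throughput in 1000s of bytes per second for one 'Doing' line."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    _name, size, count, seconds = match.groups()
    return int(size) * int(count) / float(seconds) / 1000.0

lines = [
    "Doing sha256 for 3s on 16 size blocks: 3492831 sha256's in 3.00s",
    "Doing sha256 for 3s on 8192 size blocks: 265112 sha256's in 2.98s",
]
for line in lines:
    print(f"{throughput_k(line):.2f}k")
```

The two results line up with the first and last columns of the sha256 row above.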
 
Things can get a bit crazy on the BCM4906 when I enable the BCM HW engine through cryptodev...

IMO it is better to add the -elapsed option for such tests with cryptodev. Without this option my results are even crazier:

aes-128-cbc 15229.18k 52258.87k 355460.52k 1426271.09k 9565634.56k

vs

aes-128-cbc 1321.71k 5340.99k 21316.01k 67087.36k 332368.55k

with -elapsed option.

-elapsed: You have chosen to measure elapsed time instead of user CPU time.

Voxel.
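
A quick sketch of how far apart the two aes-128-cbc rows above are. The interpretation in the comments is an assumption: if both runs moved data at roughly the same real rate, the ratio suggests how little of the wall-clock time was counted as user CPU time while the engine did the work. Names are illustrative only.

```python
BLOCK_SIZES = [16, 64, 256, 1024, 8192]

# aes-128-cbc on the AL-514 via cryptodev, in 1000s of bytes per second.
USER_TIME = [15229.18, 52258.87, 355460.52, 1426271.09, 9565634.56]  # default timing
ELAPSED = [1321.71, 5340.99, 21316.01, 67087.36, 332368.55]          # with -elapsed

def inflation(user, elapsed):
    """How many times larger the user-CPU-time figure is than the elapsed one."""
    return [u / e for u, e in zip(user, elapsed)]

factors = inflation(USER_TIME, ELAPSED)
for size, factor in zip(BLOCK_SIZES, factors):
    # Assumption: same real data rate in both runs, so 1/factor approximates
    # the share of wall-clock time that was billed as user CPU time.
    print(f"{size:>5} bytes: {factor:5.1f}x higher, user CPU share ~{1 / factor:.1%}")
```

With a hardware engine the CPU mostly waits, so dividing bytes by user CPU time overstates throughput; wall-clock time is the figure that matters for VPN speed.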
 
I remember also testing with -elapsed at the time, but I didn't note down the results.
 
