
REQ - need some OpenSSL numbers/info


sfx2000

Part of the Furniture
I figure this is as good a place as any to ask... if I can borrow two minutes of your time, it'll be much appreciated, and in return, I'll be your best friend for an hour :D

(and I'll owe you a small favor, which is even better)

This came up in an off-board discussion about OpenVPN performance, specifically with OpenSSL and recent patches/updates.

Can someone run the following command and post the output? If possible, please include the model and/or chip used. I'm particularly interested in the ARM cores on the more recent Broadcom SoCs...

openssl speed aes-128-cbc aes-256-cbc bf-cbc

It's important to include the version, build date, and compiler options with the output, so it's best to post the whole thing, like the example below:

Thx!

Code:
Doing aes-128 cbc for 3s on 16 size blocks: 7990418 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 2225927 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 581658 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 286350 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 36375 aes-128 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16 size blocks: 5890953 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1604381 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 417020 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 211278 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 26734 aes-256 cbc's in 3.00s
Doing blowfish cbc for 3s on 16 size blocks: 15152484 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 64 size blocks: 4076668 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 256 size blocks: 1036293 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 1024 size blocks: 260825 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 8192 size blocks: 32664 blowfish cbc's in 3.00s

OpenSSL 1.0.1s  1 Mar 2016

built on: Thu Mar 10 20:24:54 2016

options:bn(64,32) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx) 

compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -O3 -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM

The 'numbers' are in 1000s of bytes per second processed.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     80813.25k    86968.92k    88430.34k    89028.27k    89194.50k
aes-128 cbc      42615.56k    47486.44k    49634.82k    97740.80k    99328.00k
aes-256 cbc      31418.42k    34226.79k    35585.71k    72116.22k    73001.64k
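
For anyone checking the summary table against the raw "Doing ..." lines: each figure is blocks processed x block size / elapsed seconds, shown in thousands of bytes. A minimal Python sketch, assuming the progress-line format shown in the output above (the name `throughput_k` is made up for this example and is not part of OpenSSL):

```python
import re

# Matches the progress lines printed by `openssl speed` in the output above, e.g.
# "Doing aes-128 cbc for 3s on 16 size blocks: 7990418 aes-128 cbc's in 3.00s"
LINE_RE = re.compile(
    r"Doing (?P<name>.+?) for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) .+? in (?P<secs>[\d.]+)s"
)

def throughput_k(line):
    """Return (cipher, block_size, thousands of bytes per second) for one line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not an openssl speed progress line: %r" % line)
    size = int(m.group("size"))
    count = int(m.group("count"))
    secs = float(m.group("secs"))
    # Bytes processed, divided by elapsed time, scaled to 1000s of bytes.
    return m.group("name"), size, count * size / secs / 1000.0

if __name__ == "__main__":
    sample = "Doing aes-128 cbc for 3s on 16 size blocks: 7990418 aes-128 cbc's in 3.00s"
    name, size, kbytes = throughput_k(sample)
    print("%s @ %d bytes: %.2fk" % (name, size, kbytes))
```

Run on the first sample line, this gives 42615.56k, which matches the 16-byte aes-128 cbc entry in the table.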
 
Code:
Model: D-Link DIR-885L
Chip:  Broadcom bcm47094 (1.4GHz)
Firmware: dd-wrt 29300M (kongac)
Kernel: 3.10.101

root@DD-WRT:~# openssl speed aes-128-cbc aes-256-cbc bf-cbc
WARNING: can't open config file: /etc/ssl/openssl.cnf
Doing aes-128 cbc for 3s on 16 size blocks: 8498911 aes-128 cbc's in 3.02s
Doing aes-128 cbc for 3s on 64 size blocks: 2283649 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 585990 aes-128 cbc's in 3.01s
Doing aes-128 cbc for 3s on 1024 size blocks: 147371 aes-128 cbc's in 3.01s
Doing aes-128 cbc for 3s on 8192 size blocks: 18444 aes-128 cbc's in 3.01s
Doing aes-256 cbc for 3s on 16 size blocks: 6502208 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 64 size blocks: 1757422 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 256 size blocks: 450878 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 1024 size blocks: 113002 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 8192 size blocks: 14150 aes-256 cbc's in 3.01s
Doing blowfish cbc for 3s on 16 size blocks: 7336504 blowfish cbc's in 3.01s
Doing blowfish cbc for 3s on 64 size blocks: 2027160 blowfish cbc's in 3.01s
Doing blowfish cbc for 3s on 256 size blocks: 522895 blowfish cbc's in 3.01s
Doing blowfish cbc for 3s on 1024 size blocks: 131687 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 8192 size blocks: 16556 blowfish cbc's in 3.02s

OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified

compiler: ccache arm-linux-uclibc-gcc -I/opt/DEV/src/router/zlib -L/opt/DEV/src/router/zlib -I/opt/DEV/src/router/openssl/crypto -fPIC -I. -I.. -I../include  -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Os -pipe -march=armv7-a -mcpu=cortex-a9 -mtune=cortex-a9 -msoft-float -mfloat-abi=soft -fno-caller-saves -fno-plt -DASMAES512 -ffunction-sections -fdata-sections -I/opt/DEV/src/router/zlib -DNDEBUG -DOPENSSL_NO_ERR -DTERMIO  -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
blowfish cbc  38998.03k  43102.41k  44472.13k  44949.16k  44909.52k
aes-128 cbc  45027.34k  48717.85k  49838.35k  50135.52k  50197.09k
aes-256 cbc  34563.23k  37367.11k  38347.10k  38443.21k  38510.56k
 
Just to provide some alternative data points showing the kind of performance improvements that OpenSSL can gain through optimization, here are the numbers I've taken over the years as I was optimizing OpenSSL. They start from the basic 1.0.0 with no optimization beyond -Os, and go up to when I was compiling with -O3 plus various ASM backports I did from the 1.0.2 branch on top of 1.0.0.

Asuswrt-merlin (and now Asuswrt as well) have since moved to 1.0.2, but we both use different optimizations (I believe they still use -Os).
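
Since the -Os vs -O3 question comes up every time two builds are compared, here is a rough Python sketch that pulls the optimization level and the ASM-related defines out of the "compiler:" line that `openssl speed` prints. The function name and the two abbreviated sample lines are only for illustration; it assumes a GCC-style flag list like the ones posted in this thread.

```python
# GCC honours the last -O flag on the command line, so we keep the final one seen.
OPT_FLAGS = {"-O0", "-O1", "-O2", "-O3", "-Os", "-Og", "-Ofast"}

def summarize_compiler_line(line):
    """Return the optimization level and ASM-related defines from a 'compiler:' line."""
    tokens = line.split()
    opt = [t for t in tokens if t in OPT_FLAGS]
    # Any -D define mentioning ASM (e.g. -DAES_ASM) hints at an assembler code path.
    asm = sorted(t[2:] for t in tokens if t.startswith("-D") and "ASM" in t)
    return {"optimization": opt[-1] if opt else None, "asm": asm}

if __name__ == "__main__":
    # Abbreviated, illustrative compiler lines in the style of the ones in this thread.
    size_build = "compiler: ccache arm-linux-uclibc-gcc -fPIC -Os -pipe -DAES_ASM -DBSAES_ASM -DGHASH_ASM"
    speed_build = "compiler: arm-brcm-linux-uclibcgnueabi-gcc -fPIC -O3 -Wall -DAES_ASM -DBSAES_ASM -DGHASH_ASM"
    for label, line in (("-Os build", size_build), ("-O3 build", speed_build)):
        print(label, summarize_compiler_line(line))
```

Two builds with the same ASM defines but different -O levels are the interesting case, since any remaining difference then comes from the C code paths rather than the assembler ones.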

This is a dump of what I keep in Evernote, so excuse the formatting. These were taken on an RT-N66 or RT-AC66, so 600 MHz MIPS.

Code:
OpenSSL optimization:
Before:
SHA1:
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
sha1  1557.77k  4110.98k  8128.21k  10853.72k  12018.60k
AES:
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
aes-128 cbc  8529.25k  9151.03k  9305.85k  9335.01k  9347.56k
aes-192 cbc  7582.44k  8048.25k  8174.88k  8197.76k  8205.56k
aes-256 cbc  6814.37k  7191.08k  7288.54k  7302.95k  7310.41k

After:
SHA1:
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
sha1  2151.79k  6328.24k  14196.56k  20562.67k  23740.52k
AES:
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
aes-128 cbc  9398.56k  9941.17k  10138.71k  10195.09k  10234.54k
aes-192 cbc  8309.07k  8690.73k  8819.37k  8859.30k  8853.87k
aes-256 cbc  7394.64k  7720.69k  7820.29k  7853.94k  7855.64k
With -O3 + basic mips32r2 (374_33 b3)
SHA1:
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
sha1  2272.92k  6520.70k  14523.76k  20749.53k  23866.76k
AES:
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
aes-128 cbc  10029.65k  10720.32k  10932.22k  10999.41k  11075.69k
aes-192 cbc  8771.80k  9198.58k  9463.30k  9446.66k  9502.72k
aes-256 cbc  7781.14k  8156.80k  8280.88k  8301.25k  8331.26k


With more backports from 1.0.2, adding mips32r2 asm (374_33 b4):
SHA1:
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
sha1  2603.08k  7942.18k  19546.62k  30650.83k  36714.31k
AES:
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
aes-128 cbc  10075.88k  10765.01k  10974.69k  11032.23k  11109.55k
aes-192 cbc  8655.11k  9258.02k  9440.60k  9523.20k  9554.60k
aes-256 cbc  7794.37k  8221.80k  8371.20k  8340.98k  8361.30k
RT-AC87U (stock clock), 378_53_alpha3:
SHA1:
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
sha1  8352.36k  23409.21k  47876.97k  65298.77k  73541.07k

AES:
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
aes-128 cbc  31809.18k  36015.03k  37265.15k  37824.09k  37901.10k
aes-192 cbc  26602.22k  29972.51k  31031.22k  31277.55k  31219.71k
aes-256 cbc  23582.21k  26372.97k  27076.58k  27249.70k  27744.13k


OpenVPN throughput benchmark:
iperf -c 10.16.0.1 -M 1400 -N -l 64K -t 30



=== 3.0.0.4.270.24:
AES-128-CBC [152]   0.0-30.0 sec  69.9 MBytes  19.5 Mbits/sec
=== 3.0.0.4.270.25 (with openvpn + openssl + lzo optim):
AES-128-CBC [152]  0.0-30.0 sec  79.5 MBytes  22.2 Mbits/sec
=== 3.0.0.4.374_32:
AES-128-CBC           0.0-30.0 sec  84.8 MBytes  23.7 Mbits/sec
=== 3.0.0.4.374.33_Alpha2 (with mips32r2)
AES-128-CBC         0.0-30.1 sec  93.8 MBytes  26.1 Mbits/sec

=== 3.0.0.4.367.28 ARM (800 MHz):
AES_128-CBC [156]  0.0-30.0 sec  217 MBytes  60.7 Mbits/sec
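
A note on units when comparing the two kinds of numbers above. The iperf lines appear to count MBytes as 2^20 bytes and Mbits as 10^6 bits (the posted figures are consistent with that), while `openssl speed` reports 1000s of bytes per second. A small Python sketch of both conversions, with helper names invented for this example:

```python
def iperf_mbits_per_sec(mbytes, seconds):
    """Convert an iperf transfer (MBytes of 2**20 bytes) into Mbits/sec (10**6 bits)."""
    return mbytes * 2**20 * 8 / seconds / 1e6

def speed_k_to_mbits_per_sec(k_value):
    """Convert an `openssl speed` figure (1000s of bytes/sec) into Mbits/sec."""
    return k_value * 1000 * 8 / 1e6

if __name__ == "__main__":
    # 69.9 MBytes in 30.0 s, from the 3.0.0.4.270.24 iperf line above.
    print("tunnel: %.1f Mbits/sec" % iperf_mbits_per_sec(69.9, 30.0))
    # 9347.56k, the 8192-byte aes-128 cbc figure from the "Before" table above.
    print("cipher: %.1f Mbits/sec" % speed_k_to_mbits_per_sec(9347.56))
```

Putting both on the same scale makes it easier to see how far the tunnel throughput sits from the raw cipher speed on a given device.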

And a second data dump, focusing more on the impact of CPU/RAM clocks. This is 1.0.0 + my 1.0.2 backports. The iperf benchmarks are taken through an OpenVPN tunnel, to get some real-life performance out of the OpenSSL optimizations.

At the end is the difference between 1.0.0+backports and 1.0.2.

Code:
RT-AC68U

iperf -c 192.168.10.130 -M 1400 -N -l 64K -t 30
openssl speed aes-128-cbc

Bootloader upgraded from 1.0.1.1 to 1.0.1.6

OpenSSL 1.0.0j 10 May 2012
built on: Tue Oct 29 19:45:57 EDT 2013
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: arm-brcm-linux-uclibcgnueabi-gcc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -ffunction-sections -fdata-sections -DTERMIO -O3 -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DGHASH_ASM


CPU: 800 MHz
DDR: 533 MHz

[152] local 192.168.1.100 port 1841 connected with 192.168.10.130 port 5001
[ ID] Interval  Transfer  Bandwidth
[152]  0.0-30.0 sec  1.16 GBytes  333 Mbits/sec

The 'numbers' are in 1000s of bytes per second processed.
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
aes-128 cbc  25299.61k  27983.66k  28966.83k  29250.56k  29392.90k


CPU: 1 GHz (through Turbo mode)
DDR: 533 MHz

[152] local 192.168.1.100 port 2086 connected with 192.168.10.130 port 5001
[ ID] Interval  Transfer  Bandwidth
[152]  0.0-30.0 sec  1.30 GBytes  371 Mbits/sec

The 'numbers' are in 1000s of bytes per second processed.
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
aes-128 cbc  31755.66k  35012.59k  36352.85k  36820.51k  36711.59k


CPU: 800 MHz
DDR: 666 MHz

[152] local 192.168.1.100 port 2301 connected with 192.168.10.130 port 5001
[ ID] Interval  Transfer  Bandwidth
[152]  0.0-30.0 sec  1.23 GBytes  351 Mbits/sec

The 'numbers' are in 1000s of bytes per second processed.
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
aes-128 cbc  25468.94k  28007.64k  29075.80k  29366.61k  29455.70k


CPU: 1 GHz (through Turbo mode)
DDR: 666 MHz

<Router failed to boot>


378.53 (1.0.0r)
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1              6657.84k    18480.87k    38267.31k    52379.48k    58664.58k
sha256            7292.01k    16029.24k    27720.52k    33114.93k    36104.87k
aes-128 cbc      25455.22k    28945.91k    29937.44k    30150.66k    30296.70k
aes-192 cbc      21239.83k    23843.20k    24710.49k    25044.85k    25026.56k
aes-256 cbc      18842.44k    21123.35k    21936.79k    21793.85k    21975.93k

378.54 alpha 1 (1.0.2a)
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha1              6572.39k    18081.40k    37041.14k    52046.51k    58714.79k
sha256            7229.63k    15686.31k    27447.41k    33727.06k    35880.96k
aes-128 cbc      25725.52k    28116.52k    29175.44k    29477.84k    29385.06k
aes-192 cbc      21949.43k    23958.81k    24872.11k    25073.32k    25198.59k
aes-256 cbc      19475.85k    21012.33k    21716.05k    21996.28k    21878.10k
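
One way to read the clock experiments above is to divide each run by the 800 MHz / 533 MHz baseline. A short Python sketch using only the 8192-byte aes-128 cbc figures and the iperf bandwidths from the RT-AC68U runs (the dictionary keys and the `speedup` helper are made up for this example):

```python
# The three RT-AC68U runs that booted, copied from the dump above.
runs = [
    {"cpu_mhz": 800, "ddr_mhz": 533, "aes128_8192_k": 29392.90, "iperf_mbits": 333},
    {"cpu_mhz": 1000, "ddr_mhz": 533, "aes128_8192_k": 36711.59, "iperf_mbits": 371},
    {"cpu_mhz": 800, "ddr_mhz": 666, "aes128_8192_k": 29455.70, "iperf_mbits": 351},
]
baseline = runs[0]

def speedup(run, key):
    """Ratio of a run's value to the 800 MHz CPU / 533 MHz DDR baseline."""
    return run[key] / baseline[key]

if __name__ == "__main__":
    for run in runs[1:]:
        print(
            "CPU %d MHz, DDR %d MHz: aes-128 x%.3f, iperf x%.3f"
            % (
                run["cpu_mhz"],
                run["ddr_mhz"],
                speedup(run, "aes128_8192_k"),
                speedup(run, "iperf_mbits"),
            )
        )
```

On these figures the AES number rises by about the same ratio as the CPU clock and barely moves with the DDR clock, while the iperf result moves somewhat with both. That is just arithmetic on three runs, so treat it as a hint rather than a measurement of where the bottleneck is.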


And for a current datapoint, which I've just taken:

Code:
RT-AC88U - 1.4 GHz BCM4709c0

ASUSWRT-Merlin RT-AC88U_380.59 Sat Mar 26 06:28:50 UTC 2016
admin@Stargate88:/tmp/home/root# openssl speed aes-128-cbc aes-256-cbc bf-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 8398602 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 2301471 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 596376 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 1024 size blocks: 150403 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 18847 aes-128 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16 size blocks: 6381194 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1708985 aes-256 cbc's in 2.95s
Doing aes-256 cbc for 3s on 256 size blocks: 443242 aes-256 cbc's in 2.97s
Doing aes-256 cbc for 3s on 1024 size blocks: 111763 aes-256 cbc's in 2.97s
Doing aes-256 cbc for 3s on 8192 size blocks: 13985 aes-256 cbc's in 2.96s
Doing blowfish cbc for 3s on 16 size blocks: 6875500 blowfish cbc's in 2.98s
Doing blowfish cbc for 3s on 64 size blocks: 1989128 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 256 size blocks: 514212 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 1024 size blocks: 130022 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 8192 size blocks: 16303 blowfish cbc's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: arm-brcm-linux-uclibcgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -ffunction-sections -fdata-sections -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     36915.44k    42434.73k    43879.42k    44380.84k    44518.06k
aes-128 cbc      44942.35k    49098.05k    51060.96k    51337.56k    51464.87k
aes-256 cbc      34033.03k    37076.28k    38205.37k    38533.78k    38704.43k
 
Very helpful, thx. There's a reason for asking. I can't say much at the moment, except that the person involved is deep into ARMv7 (32-bit) / ARMv8 (64-bit) assembler, and a total ninja at it. It won't fix everything around OVPN, but with some other efforts, it might help.
 
chip: 4708
RT-AC68U 1Ghz (oc)
380_57.7

Code:
Doing aes-128 cbc for 3s on 16 size blocks: 5994672 aes-128 cbc's in 2.98s
Doing aes-128 cbc for 3s on 64 size blocks: 1646982 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 256 size blocks: 426322 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 107488 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 8192 size blocks: 13485 aes-128 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16 size blocks: 4550840 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1221630 aes-256 cbc's in 2.97s
Doing aes-256 cbc for 3s on 256 size blocks: 317807 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 1024 size blocks: 80072 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 10034 aes-256 cbc's in 3.01s
Doing blowfish cbc for 3s on 16 size blocks: 4923778 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 64 size blocks: 1420668 blowfish cbc's in 2.99s
Doing blowfish cbc for 3s on 256 size blocks: 367648 blowfish cbc's in 3.01s
Doing blowfish cbc for 3s on 1024 size blocks: 92860 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 8192 size blocks: 11623 blowfish cbc's in 2.99s
OpenSSL 1.0.2f  28 Jan 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: arm-brcm-linux-uclibcgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -ffunction-sections -fdata-sections -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type  16 bytes  64 bytes  256 bytes  1024 bytes  8192 bytes
blowfish cbc  26260.15k  30408.95k  31268.40k  31696.21k  31844.69k
aes-128 cbc  32186.16k  35253.13k  36379.48k  36811.94k  36823.04k
aes-256 cbc  24271.15k  26324.69k  27029.43k  27422.65k  27308.48k
 
BCM47XX
AC68U 1.2Ghz
380.57

Code:
ASUSWRT-Merlin RT-AC68U_3.0.0.4 Thu Dec 24 18:54:10 UTC 2015
admin@RT-AC68U:/tmp/home/root# openssl speed aes-128-cbc aes-256-cbc bf-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 7192713 aes-128 cbc's in 2.98s
Doing aes-128 cbc for 3s on 64 size blocks: 1973435 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 510162 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 1024 size blocks: 128336 aes-128 cbc's in 2.98s
Doing aes-128 cbc for 3s on 8192 size blocks: 16163 aes-128 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16 size blocks: 5477549 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1473978 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 381329 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 96109 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 12055 aes-256 cbc's in 3.00s
Doing blowfish cbc for 3s on 16 size blocks: 5903673 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 64 size blocks: 1703082 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 256 size blocks: 440884 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 1024 size blocks: 111357 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 8192 size blocks: 13965 blowfish cbc's in 3.00s

OpenSSL 1.0.2e 3 Dec 2015

built on: reproducible build, date unspecified

options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: arm-brcm-linux-uclibcgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -ffunction-sections -fdata-sections -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM

The 'numbers' are in 1000s of bytes per second processed.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     31486.26k    36332.42k    37622.10k    38009.86k    38133.76k
aes-128 cbc      38618.59k    42099.95k    43679.42k    44099.35k    44135.77k
aes-256 cbc      29213.59k    31444.86k    32540.07k    32805.21k    32918.19k
 
Thanks to everyone for sharing - I have enough info, and I really appreciate you taking the time to collect and post the data.

Might share something interesting in a bit..
 
Sharing something back: once we see 64-bit ARMv8 and its improved crypto support, we should see numbers more like this...

1.7 GHz dual core, 64-bit, OpenSSL 1.0.1, LLVM/Clang on a 3.18 kernel. Things are not as tightly optimized as they could be on that platform, but it's early, and this is running on the core, not a dedicated crypto block. This particular device does DPDK out of the box, so numbers will only get better...

* DPDK isn't just Intel...

DPDK moves a lot of the iptables/PF work out of the kernel and lower into the stack. It won't entirely fix OVPN (TUN driver, core binds, and double packet context switches), but the OVPN team knows this...

Q3-Q4 2016, we'll probably see some very real improvements in the consumer space... toss this in with NGBase-T, 2016 might be the year of the Router...

Code:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
bf-cbc           40044.02k    49271.87k    52055.98k    52893.01k    53122.39k
aes-128-cbc     155438.35k   237207.45k   283317.42k   297793.54k   301959.85k
aes-256-cbc     128682.54k   178669.89k   206472.11k   214820.18k   217224.53k
 

Are your numbers from one core or combined from two cores...
 

SMP processor, 2 cores, but running as a single thread. Note that the AES numbers are great because AES is running on a dedicated crypto block, while the Blowfish is running on the CPU...
 
Impressive performance. 64-bit ARMv8 is something to look forward to.

To put it in perspective, it beats a single-threaded Sandy Bridge @ 2.7GHz (without AES-NI):

Code:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     92518.94k    93979.87k    97426.18k    98786.30k    91840.51k
aes-128 cbc      97558.55k    99906.50k   106787.84k   251094.36k   243914.07k
aes-256 cbc      70345.79k    72406.50k    75350.70k   183504.21k   182045.97k


As a reference point...same single thread SB with AES-NI (aes-128 only apparently):

Code:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     94007.02k    99097.71k    99285.19k   101838.33k   102893.16k
aes-256 cbc      70037.50k    72649.41k    76104.95k   180769.79k   183577.26k
aes-128-cbc     572714.21k   584414.55k   621179.73k   632523.78k   624186.71k
 

I challenged the numbers my friend sent: the aes-128-cbc/aes-256-cbc results are running on the crypto block, not on the core as I originally posted. The blowfish numbers are on the CPU...

It's still impressive, but I'm thinking they're on a private branch of OpenSSL, so at some point they'll have to push the changes back in, or do a private release for their customers.
 

On my MacBook Pro - not sure if Apple has pulled all changes in, but the CPU is IvyBridge (2.3GHz i7-3615QM)...

Code:
OpenSSL 0.9.8zh 14 Jan 2016
built on: Feb  4 2016
options:bn(64,64) md2(int) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(idx) 
compiler: -arch x86_64 -fmessage-length=0 -pipe -Wno-trigraphs -fpascal-strings -fasm-blocks -O3 -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DMD32_REG_T=int -DOPENSSL_NO_IDEA -DOPENSSL_PIC -DOPENSSL_THREADS -DZLIB -mmacosx-version-min=10.6
available timing options: TIMEB USE_TOD HZ=100 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
blowfish cbc     99730.61k   104755.15k   102380.58k   107293.71k   103096.68k
aes-128 cbc     148768.84k   145899.78k   147584.40k   155118.92k   157004.96k
aes-256 cbc     116471.55k   114861.31k   111379.39k   114520.19k   111656.70k
 
Code:
OpenSSL 0.9.8zh 14 Jan 2016

A fair amount of optimizations came in 1.0.1 and 1.0.2. I suspect this old build might not be optimal, even on x86.
 

Yah, this is my main day to day box, so while I can pull the OpenSSL source and build... probably not worth the hassle there. Besides, it's Saturday afternoon...

And who knows what fixes/changes Apple has pulled in - heck, their BSD user land is seemingly ancient, but generally patched up - go figure...
 
On my MacBook Pro - not sure if Apple has pulled all changes in, but the CPU is IvyBridge (2.3GHz i7-3615QM)...

Similar numbers, but slightly higher, for my Sandy Bridge @ 2.7GHz in El Capitan. Apple doesn't seem to care much about their bundled OpenSSL, given that they're staying on 0.9.8. Apparently that serves their intended use cases well enough.

My numbers in #11 are from a VM running Debian Jessie under El Capitan. The aes-128 numbers with AES-NI looked too good to believe, but I'm less suspicious after seeing this Intel whitepaper: https://software.intel.com/sites/default/files/open-ssl-performance-paper.pdf

Look at Figure 4.
 
1.7GHz dual core - 64 bit, OpenSSL 1.0.1, LLVM/Clang on a 3.18 kernel...

3.18 is a good pick, and LLVM is very nice. Whoever this chip vendor is, they look to be on the right foundation to offer the market a competitive surprise.

On the kernel, what matters more is to keep porting subsequent patches (small and big, critical or medium). That keeps the kernel a living thing, not dead wood like the Broadcom-contaminated kernel.

As for the Broadcom dead wood, not only did they not actively port fixes, but the 'lucky few' they did pick were ported carelessly. I could spot some patches that were ported incorrectly. It's no surprise there are mysterious reboots in Asus routers.
 
Sharing something back - When we see 64bit ARMv8 and the improved crypto support, should see numbers more like this...
The Cortex-A53 in a Raspberry Pi 3 should have crypto support according to the specifications, but in reality it is not implemented in every A53 SoC, because that depends on the license obtained from ARM.

Someone tested it and concluded that the Pi 3 doesn't have crypto support.

https://www.reddit.com/r/raspberry_..._runtime_detection_of_cpu_features_pi3_arm53/
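
For anyone who wants to check their own board, a quick Python sketch of one kind of runtime check: on ARM Linux the kernel lists the crypto extensions as `aes`, `pmull`, `sha1` and `sha2` on the `Features` line of /proc/cpuinfo. The function name is invented for this example, and it only reports what the kernel advertises.

```python
# Flags the Linux kernel reports on ARM when the ARMv8 crypto extensions are present.
CRYPTO_FLAGS = ("aes", "pmull", "sha1", "sha2")

def crypto_features(cpuinfo_text):
    """Return the sorted crypto-extension flags found on any 'Features' line."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            found.update(flag for flag in value.split() if flag in CRYPTO_FLAGS)
    return sorted(found)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            flags = crypto_features(fh.read())
        print(flags if flags else "no crypto extension flags reported")
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

An empty result on an A53 board would be in line with the Pi 3 finding above. On x86 the line is called `flags` rather than `Features`, so this sketch reports nothing there.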
 
Similar numbers but slightly higher for my Sandy Bridge @2.7GHz in El Capitan. Apple seems not giving much care for their bundled openssl by staying on 0.9.8. Apparently that serves their intended use cases well enough.

0.9.8 is now EOL, which means no security patches anymore, unless Apple wants to waste resources maintaining it entirely in-house.


My guess is that Apple keeps OpenSSL in the BSDUserLand for compatibility reasons - the BSD space in OS X is rather aged these days compared to the other *BSD's - doesn't mean they're totally out of date within the greater context, but the BSD layer is what it is...
 
