Entware-3x for new HND platform (GT-AC5300 and RT-AC86U) with asuswrt-merlin firmware

Voxel · Feb 13, 2018

Fitz Mutch said:
OpenSSL ranked from best to worst performance on RT-AC86U.
1. Asuswrt-Merlin (armv7)
2. Entware-ng-3x-armv7
3. Entware-ng-3x-armv8

As far as I can see, the Entware versions are compiled to support /dev/crypto, i.e. to use hardware acceleration in addition to assembler acceleration. That is, with these compile options:

Code:

-DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS

If that is really the case, the test should be run with the "-evp" and "-elapsed" options so that /dev/crypto is used:

Code:

openssl speed -evp aes-256-cbc -elapsed

Merlin's version uses assembler optimization. The options -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS slow OpenSSL down if /dev/crypto is not used.
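
A minimal sketch of choosing the benchmark invocation, assuming a POSIX shell such as BusyBox ash. The names has_cryptodev and bench_cmd are illustrative only, not part of OpenSSL or Entware, and the fallback command without "-elapsed" is my assumption for the asm-only case:

```shell
#!/bin/sh
# Sketch only: has_cryptodev and bench_cmd are hypothetical helper names.

# Succeeds if $1 (default /dev/crypto) exists and is a character device.
has_cryptodev() {
    [ -c "${1:-/dev/crypto}" ]
}

# Print the benchmark command line suited to the given device path.
bench_cmd() {
    if has_cryptodev "$1"; then
        # -elapsed measures wall-clock time; offloaded work uses little
        # CPU time, so CPU-time figures would look too good.
        echo "openssl speed -evp aes-256-cbc -elapsed"
    else
        # No device node: measure the software/assembler path.
        echo "openssl speed -evp aes-256-cbc"
    fi
}

bench_cmd
```

Running the printed command by hand keeps the sketch side-effect free; it only decides which form of the test applies.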

Voxel.

Fitz Mutch · Feb 13, 2018

Voxel said:
I know some people are using this version not only for Cortex-A15. Main points: it is compiled with hard float and for neon-vfpv4.

My RT-AC86U cpuinfo has "fp asimd evtstrm aes pmull sha1 sha2 crc32". I want to turn on all the bells and whistles. Have you tried "-mfpu=crypto-neon-fp-armv8" ?
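
The aes, pmull, sha1 and sha2 entries are the ARMv8 Crypto Extension flags. A rough sketch for picking them out of a feature list, assuming a POSIX shell; crypto_flags is an illustrative name, and the input here is simply the string quoted above:

```shell
#!/bin/sh
# Sketch only: crypto_flags is a hypothetical helper name.

# Print which crypto-related flags appear in a space-separated feature list.
# $1: the feature list, e.g. the "Features" line of /proc/cpuinfo
crypto_flags() {
    found=""
    for flag in aes pmull sha1 sha2; do
        # Pad with spaces so only whole words match.
        case " $1 " in
            *" $flag "*) found="${found:+$found }$flag" ;;
        esac
    done
    echo "$found"
}

crypto_flags "fp asimd evtstrm aes pmull sha1 sha2 crc32"
# prints: aes pmull sha1 sha2
```

On a live router the argument could come from something like `grep -m1 '^Features' /proc/cpuinfo | cut -d: -f2`.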

Voxel · Feb 13, 2018

zyxmon said:
You can also test Vortex Entware-3x port that is armv7 -O3 hard float. It was 30% faster in a similar test compared with Entware-ng on IPQ4018 (Asus RT-58AC running lede).

IMO you had my version in mind: Voxel?

Voxel.

Voxel · Feb 13, 2018

Fitz Mutch said:
My RT-AC86U cpuinfo has "fp asimd evtstrm aes pmull sha1 sha2 crc32". I want to turn on all the bells and whistles. Have you tried "-mfpu=crypto-neon-fp-armv8" ?

No, sorry. I prepare a version for Cortex-A15 (32-bit) for users of the NETGEAR R7500/R7800/R9000. I do not have any armv8 gadgets.

Voxel.

Voxel · Feb 13, 2018

Fitz Mutch said:
I want to turn on all the bells and whistles.

And BTW, if it supports /dev/crypto, that is the most promising bell to ring.

Voxel.

zyxmon · Feb 13, 2018

Voxel said:
IMO you had my version in mind: Voxel?

Yes, sure.

Voxel · Feb 13, 2018

BTW, is there really a /dev/crypto on the Asus RT-58AC and/or RT-86U? I have neither of them to check...

Voxel.

RMerlin · Feb 13, 2018

When I enabled Broadcom's crypto engine on the RT-AC86U, it seriously reduced OpenVPN performance due to the added context switches (I assume). That's why I keep it disabled.

zyxmon · Feb 13, 2018

Voxel said:
BTW, is there really a /dev/crypto on the Asus RT-58AC

No, there is no /dev/crypto (openwrt). Some of my NASes have /dev/encryptfs - hardware encrypted file system support?

Voxel · Feb 13, 2018

RMerlin said:
When I enabled Broadcom's crypto engine on the RT-AC86U, it seriously reduced OpenVPN performance due to the added context switches (I assume). That's why I keep it disabled.

That is interesting. I have rather the opposite feedback from users of the R9000 (AL-514, Cortex-A15). For example:

https://www.snbforums.com/threads/custom-firmware-build-for-r9000.40125/#post-335864
(this is with assembler acceleration only)

and

https://www.snbforums.com/threads/custom-firmware-build-for-r9000.40125/page-4#post-338655
(this is when the guy tried the version with asm plus /dev/crypto)

So 61/19 vs 93/21 for AL-514 (OpenVPN).

Voxel.

Voxel · Feb 13, 2018

zyxmon said:
No, there is no /dev/crypto (openwrt). Some of my NASes have /dev/encryptfs - hardware encrypted file system support?

If so, it makes no sense to compile OpenSSL with -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS, IMO. As far as I know, these options really slow it down. I would suggest using pure asm acceleration.

/dev/encryptfs != /dev/crypto of course.
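
A rough sketch of checking for that mismatch, assuming an OpenSSL 1.0.2-style build where the flag string comes from `openssl version -f`. The helper names are illustrative only:

```shell
#!/bin/sh
# Sketch only: built_with_cryptodev and cryptodev_status are hypothetical names.

# Succeeds if the given compiler-flags string enables cryptodev support.
built_with_cryptodev() {
    case " $1 " in
        *" -DHAVE_CRYPTODEV "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Describe a build, given its flags ($1) and a device path ($2).
cryptodev_status() {
    if ! built_with_cryptodev "$1"; then
        echo "asm-only build"
    elif [ -c "$2" ]; then
        echo "cryptodev build, device present"
    else
        # The case described above: the options cost speed for nothing.
        echo "cryptodev build, device missing"
    fi
}

cryptodev_status "-O2 -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DSHA1_ASM" /dev/crypto
```

On a router the first argument would be `"$(openssl version -f)"` instead of the literal string used here.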

Voxel.

RMerlin · Feb 13, 2018

Voxel said:
That is interesting. I have rather the opposite feedback from users of the R9000 (AL-514, Cortex-A15). For example:

https://www.snbforums.com/threads/custom-firmware-build-for-r9000.40125/#post-335864
(this is with assembler acceleration only)

and

https://www.snbforums.com/threads/custom-firmware-build-for-r9000.40125/page-4#post-338655
(this is when the guy tried the version with asm plus /dev/crypto)

So 61/19 vs 93/21 for AL-514 (OpenVPN).

Voxel.

Could be because I'm already optimizing OpenSSL and OpenVPN beyond what Netgear does. I ran iperf tests through an OpenVPN tunnel, and performance dropped. Raw openssl speed tests were also slower on small block sizes - see the benchmarks I posted in the VPN sub-forums.

I suspect IPSEC would be where performance improvements could be gained, but I haven't had time to debug the Strongswan implementation on the RT-AC86U to run tests.

RMerlin · Feb 13, 2018

Voxel said:
If so, it makes no sense to compile OpenSSL with -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS, IMO. As far as I know, these options really slow it down. I would suggest using pure asm acceleration.

/dev/encryptfs != /dev/crypto of course.

Voxel.

What I did in my tests is to compile with OpenSSL external engine support, then used such an engine to access the kernel cryptodev API.

I don't remember the exact build-time change I made, though. I had to manually compile specific pieces and copy them to a running router, as the change would prevent some of the other firmware components from running properly.

Voxel · Feb 13, 2018

RMerlin said:
Could be because I'm already optimizing OpenSSL and OpenVPN beyond what Netgear does.

Sorry, I am not talking about what Netgear does. The guy who tested OpenVPN (my links above) tried my version, with optimized OpenVPN and OpenSSL. Netgear still does not enable any OpenSSL acceleration in their stock firmware, not even assembler acceleration, in spite of my hints passed to the NG developers by NETGEAR Guy.

Voxel.

Voxel · Feb 13, 2018

RMerlin said:
What I did in my tests is to compile with OpenSSL external engine support, then used such an engine to access the kernel cryptodev API.

I don't remember the exact build-time change I made, though. I had to manually compile specific pieces and copy them to a running router, as the change would prevent some of the other firmware components from running properly.

I used the kernel code in /drivers/crypto/al (i.e. the Alpine-specific crypto driver) and cryptodev (http://cryptodev-linux.org/), not OCF.

Voxel.

sfx2000 · Feb 13, 2018

Voxel said:
-O3 -pipe -mcpu=cortex-a15 -mfpu=neon-vfpv4 -mtune=cortex-a15

I've run into some math issues with -O3 - just saying... and if you're defining -mcpu, you don't have to include -mtune - as that is implied by -mcpu

Curious why one would list -pipe - it doesn't actually change the generated code, it just speeds up builds...

-O2 -march=armv7-a -mfpu=neon-vfpv4 -mfloat-abi=hard

Works for A8/A9/A7/A17 chips that I support (not all A9's are supported, due to odd stuff there with licensees - RT-68U for example, has no VFP or NEON...)

A53/A57 - we run in ARMv7-a profiles, like above, as we did some benchmarking, and in most cases, there wasn't enough of a change to merit a total rebuild of userland... and the baggage of supporting two different userlands across archs... not worth the effort.

I suspect it'll be fine with A15 as well as it's similar in many ways to A17.

sfx2000 · Feb 13, 2018

zyxmon said:
I have made similar tests on Realtek RTD1295.

What platform is this on - and what's the clock speeds? Some of the Realtek 1295's can clock way up to around 2GHz or so...

The RTD1295 is generally focused on Android STBs - and Mali is a pain outside of Android - but some hobby/enthusiast contributors have made progress there. Otherwise, it's a generic quad Cortex-A53 - should scale accordingly, it's a bigger/faster little in-order core and in my experience, A53 runs best in ARMv7-A mode for the most part.

The big 32 bit OOO dual cores like A9 and quads like A15/A17 will likely outperform it.

sfx2000 · Feb 13, 2018

sfx2000 said:
Works for A8/A9/A7/A17 chips that I support

Reason why I treat A17 like A7 - big.LITTLE - same goes with A15 as those cores can also be teamed up with A7 - A8 is a bit of a challenge to support but not a priority, but A9 w/VFP and NEON is...

Tuning for A15/A17 gains a little bit - but not enough to matter...

zyxmon · Feb 14, 2018

sfx2000 said:
What platform is this on - and what's the clock speeds? Some of the Realtek 1295's can clock way up to around 2GHz or so...

This was run on QNAP TS-128A - https://www.qnap.com/en/product/ts-128a/specs/hardware
Running QNAP's version of Linux with kernel 4.2.8, CPU clocked at 1.4GHz.

Amlogic S912 (Android + kernel 3.14.29) is much faster (-O2 + aarch64)

Code:

# /opt/bin/openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 6049209 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1560507 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 395548 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 99237 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 12415 aes-256 cbc's in 3.00s
OpenSSL 1.0.2n  7 Dec 2017
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-gnu-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/media/ware4/Entware-3x.2017.12/staging_dir/target-aarch64_cortex-a53_glibc-2.25/opt/include -I/media/ware4/Entware-3x.2017.12/staging_dir/target-aarch64_cortex-a53_glibc-2.25/include -I/media/ware4/Entware-3x.2017.12/staging_dir/toolchain-aarch64_cortex-a53_gcc-6.3.0_glibc-2.25/usr/include -I/media/ware4/Entware-3x.2017.12/staging_dir/toolchain-aarch64_cortex-a53_gcc-6.3.0_glibc-2.25/include -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -O2 -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result  -fpic -I/media/ware4/Entware-3x.2017.12/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc      32262.45k    33290.82k    33753.43k    33872.90k    33901.23k

On the Amlogic, the aarch64 openssl test is ~17% faster than the armv7 variant.
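
The throughput row in the output above follows from the block counts: blocks processed times block size, divided by elapsed seconds, expressed in thousands of bytes. A minimal sketch of that arithmetic, assuming awk is available; throughput_k is an illustrative helper name:

```shell
#!/bin/sh
# Sketch only: throughput_k is a hypothetical helper name.

# throughput_k BLOCKS BLOCK_SIZE SECONDS
# Prints throughput in thousands of bytes per second, as openssl speed does.
throughput_k() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2fk\n", n * size / secs / 1000 }'
}

throughput_k 6049209 16 3.00    # prints 32262.45k (16-byte blocks)
throughput_k 12415 8192 3.00    # prints 33901.23k (8192-byte blocks)
```

Both results match the first and last columns of the aes-256 cbc row.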

Voxel · Feb 14, 2018

sfx2000 said:
-O2 -march=armv7-a -mfpu=neon-vfpv4 -mfloat-abi=hard
Works for A8/A9/A7/A17 chips that I support (not all A9's are supported, due to odd stuff there with licensees - RT-68U for example, has no VFP or NEON...)

Well, you know my position. You call it "over optimization". I do not use -march=armv7-a, preferring -mcpu, which differs from platform to platform. Anyway, I respect the universal, portable multi-platform solution you use. I had to take a similar approach in my past work ("over optimization" vs portability).

Voxel.
