Entware-3x for new HND platform (GT-AC5300 and RT-AC86U) with asuswrt-merlin firmware

Discussion in 'Asuswrt-Merlin' started by zyxmon, Feb 2, 2018.

  1. Voxel

    Voxel Very Senior Member

    Joined:
    Dec 9, 2014
    Messages:
    794
    As far as I can see, the Entware versions are compiled to support /dev/crypto, i.e. using not only assembler acceleration but also hardware acceleration. That is, with these compilation options:
    Code:
    -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
    
    If that is really the case, the test should be run with the "-evp" and "-elapsed" options to use /dev/crypto:
    Code:
    openssl speed -evp aes-256-cbc -elapsed
    
    Merlin's version uses assembler optimization. The options -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS slow OpenSSL down if /dev/crypto is not used.

    Voxel.
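
A minimal sketch of the check described above, assuming the cryptodev node lives at /dev/crypto. The function name `pick_speed_cmd` and its optional path argument are made up for illustration. The `-elapsed` option makes `openssl speed` divide by wall-clock time rather than CPU time, which matters when the work is handed off to a device.

```shell
# Print the benchmark command that fits this box.
# $1 (optional): path of the cryptodev node, defaults to /dev/crypto.
pick_speed_cmd() {
    dev="${1:-/dev/crypto}"
    if [ -c "$dev" ]; then
        # Character device present: measure wall-clock time through EVP.
        echo "openssl speed -evp aes-256-cbc -elapsed"
    else
        # No cryptodev node: plain EVP run.
        echo "openssl speed -evp aes-256-cbc"
    fi
}

pick_speed_cmd
```

The function only prints the command, so it can be inspected before anything is run on the router.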
     
  2. Please support SNBForums! Just click on this link before you buy something from Amazon and we'll get a small commission on anything you buy. Thanks!
  3. Fitz Mutch

    Fitz Mutch Senior Member

    Joined:
    May 27, 2016
    Messages:
    454
    Location:
    Portsmouth
    My RT-AC86U cpuinfo has "fp asimd evtstrm aes pmull sha1 sha2 crc32". I want to turn on all the bells and whistles. Have you tried "-mfpu=crypto-neon-fp-armv8" ?
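
For anyone who wants to check their own box, here is a rough sketch that looks for the ARMv8 crypto flags in a Features string like the one quoted above. The function name `has_crypto_ext` is hypothetical, and the string is passed in by hand rather than read from /proc/cpuinfo. As a side note, `-mfpu=crypto-neon-fp-armv8` is an option of 32-bit ARM GCC; an aarch64 toolchain would normally express the same thing as `-march=armv8-a+crypto`.

```shell
# Report whether all four ARMv8 crypto flags appear in a Features string.
# $1: space-separated feature list, as shown in /proc/cpuinfo.
has_crypto_ext() {
    for f in aes pmull sha1 sha2; do
        case " $1 " in
            *" $f "*) ;;                           # flag found, keep going
            *) echo "missing: $f"; return 1 ;;     # stop at the first gap
        esac
    done
    echo "all present"
}

has_crypto_ext "fp asimd evtstrm aes pmull sha1 sha2 crc32"
```

With the feature list from this post, the call prints "all present".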
     
  4. Voxel

    Voxel Very Senior Member

    Joined:
    Dec 9, 2014
    Messages:
    794
    I think you had my version in mind (Voxel's)?

    Voxel.
     
  5. Voxel

    Voxel Very Senior Member

    Joined:
    Dec 9, 2014
    Messages:
    794
    No, sorry. I prepare a version for Cortex-A15 (32-bit) for users of the NETGEAR R7500/R7800/R9000. I do not have any armv8 gadgets.

    Voxel.
     
  6. Voxel

    Voxel Very Senior Member

    Joined:
    Dec 9, 2014
    Messages:
    794
    And BTW, if it supports /dev/crypto, that is the most promising bell to ring :).

    Voxel.
     
  7. zyxmon

    zyxmon Regular Contributor

    Joined:
    Feb 9, 2015
    Messages:
    166
    Yes, sure.
     
  8. Voxel

    Voxel Very Senior Member

    Joined:
    Dec 9, 2014
    Messages:
    794
    BTW, is there really a /dev/crypto on the Asus RT-58AC and/or RT-86U? I have neither of them to check...

    Voxel.
     
  9. RMerlin

    RMerlin Super Moderator

    Joined:
    Apr 14, 2012
    Messages:
    27,393
    Location:
    Canada
    When I enabled Broadcom's crypto engine on the RT-AC86U, it seriously reduced OpenVPN performance due to the added context switches (I assume). That's why I keep it disabled.
     
  10. zyxmon

    zyxmon Regular Contributor

    Joined:
    Feb 9, 2015
    Messages:
    166
    No, there is no /dev/crypto (openwrt). Some of my NASes have /dev/encryptfs - hardware encrypted file system support?
     
  11. Voxel

    Voxel Very Senior Member

    Joined:
    Dec 9, 2014
    Messages:
    794
    Interesting. I have rather the opposite feedback from users of the R9000 (AL-514, Cortex-A15). For example:

    https://www.snbforums.com/threads/custom-firmware-build-for-r9000.40125/#post-335864
    (this is with assembler acceleration only)

    and

    https://www.snbforums.com/threads/custom-firmware-build-for-r9000.40125/page-4#post-338655
    (this is when the guy tried the version with asm plus /dev/crypto)

    So 61/19 vs 93/21 for AL-514 (OpenVPN).

    Voxel.
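
To put those two pairs side by side, here is a small sketch that works out the relative gain. It assumes the figures are listed in the same order as the links (asm only first, asm plus /dev/crypto second). The units are not stated in this thread, and the helper name `pct_gain` is made up for illustration.

```shell
# Percentage gain going from an old figure ($1) to a new one ($2).
pct_gain() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

pct_gain 61 93   # first figure of each pair  -> 52.5
pct_gain 19 21   # second figure of each pair -> 10.5
```

So under that assumption the first figure improves by roughly half and the second by about a tenth.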
     
  12. Voxel

    Voxel Very Senior Member

    Joined:
    Dec 9, 2014
    Messages:
    794
    If so, it makes no sense to compile OpenSSL with -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS, IMO. As far as I know, these options really do slow it down. I would suggest using pure asm acceleration.

    /dev/encryptfs != /dev/crypto of course.

    Voxel.
     
  13. RMerlin

    RMerlin Super Moderator

    Joined:
    Apr 14, 2012
    Messages:
    27,393
    Location:
    Canada
    Could be because I'm already optimizing OpenSSL and OpenVPN beyond what Netgear does. I ran iperf tests through an OpenVPN tunnel, and performance dropped. Raw openssl speed tests were also slower on small block sizes - see the benchmarks I posted in the VPN sub-forums.

    I suspect IPSEC would be where performance improvements could be gained, but I haven't had time to debug the Strongswan implementation on the RT-AC86U to run tests.
     
  14. RMerlin

    RMerlin Super Moderator

    Joined:
    Apr 14, 2012
    Messages:
    27,393
    Location:
    Canada
    What I did in my tests is to compile with OpenSSL external engine support, then used such an engine to access the kernel cryptodev API.

    I don't remember the exact build-time change I made, though. I had to manually compile specific pieces and copy them to a running router, as the change would prevent some of the other firmware components from running properly.
     
  15. Voxel

    Voxel Very Senior Member

    Joined:
    Dec 9, 2014
    Messages:
    794
    Sorry, I am not talking about what Netgear does. The guy who tested OpenVPN (my links above) tried my version, with optimized OpenVPN and OpenSSL. Netgear still does not enable any OpenSSL acceleration in their stock firmware, not even assembler acceleration, in spite of my hints passed to the NG developers by NETGEAR Guy.

    Voxel.
     
  16. Voxel

    Voxel Very Senior Member

    Joined:
    Dec 9, 2014
    Messages:
    794
    I've used the kernel's code in /drivers/crypto/al (i.e. the Alpine-specific crypto driver) and cryptodev (http://cryptodev-linux.org/), not OCF.

    Voxel.
     
  17. sfx2000

    sfx2000 Part of the Furniture

    Joined:
    Aug 11, 2011
    Messages:
    12,868
    Location:
    San Diego, CA
    I've run into some math issues with -O3 - just saying... And if you're defining -mcpu, you don't have to include -mtune, as that is implied by -mcpu.

    Curious that one would use -pipe - it doesn't actually change the generated code, it just speeds up builds...

    -O2 -march=armv7-a -mfpu=neon-vfpv4 -mfloat-abi=hard

    Works for the A8/A9/A7/A17 chips that I support (not all A9s are supported, due to odd stuff there with licensees - the RT-68U, for example, has no VFP or NEON)...

    A53/A57 - we run in ARMv7-a profiles, like above, as we did some benchmarking and, in most cases, there wasn't enough of a change to merit a total rebuild of userland... and the baggage of supporting two different userlands across archs... not worth the effort.

    I suspect it'll be fine with A15 as well as it's similar in many ways to A17.
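
As a rough illustration of the -mcpu / -mtune point above, the sketch below drops a -mtune flag that merely repeats the -mcpu value. The function name `drop_redundant_mtune` and the `cortex-a15` value are illustrative only; they are not anyone's actual build flags from this thread.

```shell
# Remove -mtune=X from a flag list when -mcpu=X is already given.
drop_redundant_mtune() {
    cpu=""
    for f in "$@"; do
        case "$f" in
            -mcpu=*) cpu="${f#-mcpu=}" ;;   # remember the CPU name
        esac
    done
    out=""
    for f in "$@"; do
        # Skip only the -mtune that matches the -mcpu value exactly.
        [ -n "$cpu" ] && [ "$f" = "-mtune=$cpu" ] && continue
        out="${out:+$out }$f"
    done
    echo "$out"
}

drop_redundant_mtune -O2 -mcpu=cortex-a15 -mtune=cortex-a15 -mfpu=neon-vfpv4
```

A -mtune that names a different core is left alone, since in that case it is a deliberate override rather than a duplicate.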
     
  18. sfx2000

    sfx2000 Part of the Furniture

    Joined:
    Aug 11, 2011
    Messages:
    12,868
    Location:
    San Diego, CA
    What platform is this on - and what's the clock speeds? Some of the Realtek 1295's can clock way up to around 2GHz or so...

    The RTD1295 is generally focused on Android STB's - and mali is a pain outside of Android - but some hobby/enthusiast contributors have made progress there. Otherwise, it's a generic Quad Cortex-A53 - should scale accordingly, it's a bigger/faster little in-order core and in my experience, a53 runs best in ARMv7A mode for the most part.

    The big 32-bit OOO dual cores like A9 and quads like A15/A17 will likely outperform it.
     
  19. sfx2000

    sfx2000 Part of the Furniture

    Joined:
    Aug 11, 2011
    Messages:
    12,868
    Location:
    San Diego, CA
    Reason why I treat A17 like A7 - big.LITTLE - same goes with A15 as those cores can also be teamed up with A7 - A8 is a bit of a challenge to support but not a priority, but A9 w/VFP and NEON is...

    Tuning for A15/A17 gains a little bit - but not enough to matter...
     
  20. zyxmon

    zyxmon Regular Contributor

    Joined:
    Feb 9, 2015
    Messages:
    166
    This was run on QNAP TS-128A - https://www.qnap.com/en/product/ts-128a/specs/hardware
    Running some QNAP version of Linux with kernel 4.2.8, CPU clocked at 1.4GHz.

    Amlogic S912 (Android + kernel 3.14.29) is much faster (-O2 + aarch64)
    Code:
    # /opt/bin/openssl speed aes-256-cbc
    Doing aes-256 cbc for 3s on 16 size blocks: 6049209 aes-256 cbc's in 3.00s
    Doing aes-256 cbc for 3s on 64 size blocks: 1560507 aes-256 cbc's in 3.00s
    Doing aes-256 cbc for 3s on 256 size blocks: 395548 aes-256 cbc's in 3.00s
    Doing aes-256 cbc for 3s on 1024 size blocks: 99237 aes-256 cbc's in 3.00s
    Doing aes-256 cbc for 3s on 8192 size blocks: 12415 aes-256 cbc's in 3.00s
    OpenSSL 1.0.2n  7 Dec 2017
    built on: reproducible build, date unspecified
    options:bn(64,64) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
    compiler: aarch64-openwrt-linux-gnu-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/media/ware4/Entware-3x.2017.12/staging_dir/target-aarch64_cortex-a53_glibc-2.25/opt/include -I/media/ware4/Entware-3x.2017.12/staging_dir/target-aarch64_cortex-a53_glibc-2.25/include -I/media/ware4/Entware-3x.2017.12/staging_dir/toolchain-aarch64_cortex-a53_gcc-6.3.0_glibc-2.25/usr/include -I/media/ware4/Entware-3x.2017.12/staging_dir/toolchain-aarch64_cortex-a53_gcc-6.3.0_glibc-2.25/include -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -O2 -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result  -fpic -I/media/ware4/Entware-3x.2017.12/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-256 cbc      32262.45k    33290.82k    33753.43k    33872.90k    33901.23k
    
    On the Amlogic, the aarch64 openssl test is ~17% faster than the armv7 variant.
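
For reference, the summary row in the output above follows directly from the "Doing ..." lines: operations times block size, divided by seconds, in thousands of bytes. A small sketch of that arithmetic (the helper name `throughput_k` is made up for illustration):

```shell
# Throughput in 1000s of bytes per second, as printed by `openssl speed`.
# $1: number of operations, $2: block size in bytes, $3: elapsed seconds
throughput_k() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", n * size / secs / 1000 }'
}

throughput_k 6049209 16 3     # -> 32262.45
throughput_k 12415 8192 3     # -> 33901.23
```

Both results match the 16-byte and 8192-byte columns of the aes-256 cbc row.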
     
  21. Voxel

    Voxel Very Senior Member

    Joined:
    Dec 9, 2014
    Messages:
    794
    Well, you know my position :). You call it "over optimization". I do not use -march=armv7-a, preferring -mcpu, which differs per platform. Anyway, I respect the universal, portable multi-platform solution you use. I had to take a similar approach in my past work ("over optimization" vs portability).

    Voxel.
     