I made an interesting discovery recently while investigating OpenSSL 3.0 performance on a BCM4916. Ultimately, the performance difference turned out to be due simply to a change in how "openssl speed" runs its test since 3.0.16; replacing the speed-testing code with the 3.0.15 version brought the performance back on par with 1.1.1. Along the way, though, I found that openssl's performance boost seems to come from leveraging the NEON extensions, not the AES extensions. The AES extensions would require the code to be compiled as 64-bit to be accessible. That means that, in theory, they could still be leveraged by the kernel (which is 64-bit on this platform, while userspace such as openssl is 32-bit).
This comes from openssl 3.0, which can report the CPU capabilities it detects. On the BCM4916, the reported cap value is 0x3d, which indicates:
bit 0: Neon
bit 2: PMULL (used by GCM)
bit 3: SHA1
bit 4: SHA256
bit 5: CPUID
Bit 1 would be set if it were using the AES extensions. The extensions do show up in the cpuinfo output; they just aren't usable in 32-bit mode (ARMv7) and require aarch64.
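For anyone who wants to decode a cap value themselves, here is a minimal Python sketch. It uses the bit meanings exactly as listed above, which is an assumption worth double-checking against OpenSSL's arm_arch.h header for your version. `POST_BIT_NAMES` and `decode_cap` are names made up for illustration, not anything from OpenSSL.

```python
# Bit meanings exactly as listed in this post (an assumption, not copied
# from OpenSSL's headers). Bit 1 is the AES bit according to the post.
POST_BIT_NAMES = {
    0: "NEON",
    1: "AES",
    2: "PMULL",
    3: "SHA1",
    4: "SHA256",
    5: "CPUID",
}


def decode_cap(value, names=POST_BIT_NAMES):
    """Return the names of the bits set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]


if __name__ == "__main__":
    cap = 0x3D  # value reported on the BCM4916
    print(f"{cap:#x} = {cap:#010b}")
    for name in decode_cap(cap):
        print(" ", name)
```

With this mapping, 0x3d has every listed bit set except bit 1, which is why AES is missing from the decoded list.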
Now I wonder what would happen if we compiled OpenVPN in 64-bit mode, statically linked with the necessary libraries. Would the bloat be offset by any real performance gain? Hm...
