SSE 4.1, AVX, and AVX2, if you have them, will all accelerate ChaCha20 with our implementation.
With AVX-512 or even some recent Atoms, you can run VAES for AES-GCM, and that rips vs. plain AES-NI.
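If you want to see which of these your CPU actually advertises, here is a rough sketch. It assumes a Linux host, where the kernel lists feature flags in /proc/cpuinfo under names like sse4_1, avx2, and vaes. The helper names (cpu_flags, report) are made up for illustration and have nothing to do with our implementation; on other systems you would need a different source for the flags.

```python
# Flag names as Linux spells them in /proc/cpuinfo (assumption: Linux host).
FEATURES = ("sse4_1", "avx", "avx2", "avx512f", "aes", "vaes")


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


def report(cpuinfo_text):
    """Map each feature of interest to True (advertised) or False."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in FEATURES}
```

On a Linux box you would feed it the real file, e.g. report(open("/proc/cpuinfo").read()). A flag being present only means the CPU supports the instructions; whether a given crypto stack uses them is a separate question.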
Only problem with your assertion: in addition to AES-GCM, DCO can run ChaCha20/Poly1305, and yes, OpenVPN w/DCO is still faster than WireGuard running ChaCha20/Poly1305. This is what I was referring to when I wrote: "(even keeping the transform the same.)"
There is no good reason for this, after...