Userspace isn't slow, some kernel interfaces are!
Blog post from Tailscale
Tailscale significantly improved the throughput of wireguard-go, the userspace WireGuard implementation that the Tailscale client uses, which in turn speeds up the Tailscale client on Linux. The changes, which Tailscale plans to upstream to WireGuard, rely on TCP segmentation offload (TSO) and generic receive offload (GRO) on the TUN device, and on the sendmmsg() and recvmmsg() system calls, which send and receive multiple UDP packets per call. Together they yielded a 2.2x improvement in wireguard-go's throughput and up to a 33% increase in Tailscale's throughput on Linux. The gains come from reducing per-packet overhead: the offloads and batched system calls act much like a larger MTU without changing it on the wire, so many packets cross the kernel boundary and traverse the network stack as a single unit. All of this uses existing kernel interfaces, which supports the post's argument that userspace can match or even exceed in-kernel performance when it uses the right ones.
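To make the offload idea concrete, here is a minimal Python sketch of the two transformations, written as a toy model and not taken from wireguard-go or the Linux kernel. The names Segment, coalesce, and split are hypothetical, the flow key and the 65535-byte cap are assumptions for illustration, and real GRO applies more checks (flags, timestamps, checksums) than this does.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flow: tuple     # hypothetical flow key: (src, dst, sport, dport)
    seq: int        # sequence number of the first payload byte
    payload: bytes


def coalesce(segments, max_size=65535):
    """GRO-style: merge back-to-back, in-order segments of the same flow."""
    out = []
    for seg in segments:
        last = out[-1] if out else None
        if (last is not None
                and last.flow == seg.flow
                and last.seq + len(last.payload) == seg.seq
                and len(last.payload) + len(seg.payload) <= max_size):
            # Contiguous with the previous segment: extend it in place.
            out[-1] = Segment(last.flow, last.seq, last.payload + seg.payload)
        else:
            out.append(seg)
    return out


def split(segment, mss):
    """TSO-style: cut one large segment back into pieces of at most mss bytes."""
    return [
        Segment(segment.flow, segment.seq + off, segment.payload[off:off + mss])
        for off in range(0, len(segment.payload), mss)
    ]


flow = ("10.0.0.1", "10.0.0.2", 40000, 443)
wire = [Segment(flow, i * 1000, bytes(1000)) for i in range(8)]
merged = coalesce(wire)
print(len(wire), "->", len(merged))  # prints "8 -> 1"
```

In this model the eight wire-sized segments become one unit, so any fixed per-unit cost (a system call, a trip through the stack) is paid once instead of eight times, and split() recovers the original segments when they have to go back on the wire.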