lib/crypto: x86/poly1305: Fix performance regression on short messages
Restore the len >= 288 condition on using the AVX implementation, which
was incidentally removed by commit 318c53ae02 ("crypto: x86/poly1305 -
Add block-only interface"). This check took into account the overhead
in key power computation, kernel-mode "FPU", and tail handling
associated with the AVX code. Indeed, restoring this check slightly
improves performance for len < 256 as measured using poly1305_kunit on
an "AMD Ryzen AI 9 365" (Zen 5) CPU:

    Length      Before       After
    ======  ==========  ==========
         1     30 MB/s     36 MB/s
        16    516 MB/s    598 MB/s
        64   1700 MB/s   1882 MB/s
       127   2265 MB/s   2651 MB/s
       128   2457 MB/s   2827 MB/s
       200   2702 MB/s   3238 MB/s
       256   3841 MB/s   3768 MB/s
       511   4580 MB/s   4585 MB/s
       512   5430 MB/s   5398 MB/s
      1024   7268 MB/s   7305 MB/s
      3173   8999 MB/s   8948 MB/s
      4096   9942 MB/s   9921 MB/s
     16384  10557 MB/s  10545 MB/s

While the optimal threshold for this CPU might be slightly lower than
288 (see the len == 256 case), other CPUs would need to be tested too,
and these sorts of benchmarks can underestimate the true cost of
kernel-mode "FPU". Therefore, for now just restore the 288 threshold.

Fixes: 318c53ae02 ("crypto: x86/poly1305 - Add block-only interface")
Cc: stable@vger.kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250706231100.176113-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
@@ -98,7 +98,15 @@ void poly1305_blocks_arch(struct poly1305_block_state *state, const u8 *inp,
 	BUILD_BUG_ON(SZ_4K < POLY1305_BLOCK_SIZE ||
 		     SZ_4K % POLY1305_BLOCK_SIZE);
 
+	/*
+	 * The AVX implementations have significant setup overhead (e.g. key
+	 * power computation, kernel FPU enabling) which makes them slower for
+	 * short messages. Fall back to the scalar implementation for messages
+	 * shorter than 288 bytes, unless the AVX-specific key setup has already
+	 * been performed (indicated by ctx->is_base2_26).
+	 */
 	if (!static_branch_likely(&poly1305_use_avx) ||
+	    (len < POLY1305_BLOCK_SIZE * 18 && !ctx->is_base2_26) ||
 	    unlikely(!irq_fpu_usable())) {
 		convert_to_base2_64(ctx);
 		poly1305_blocks_x86_64(ctx, inp, len, padbit);
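
For readers who want to try the restored condition outside the kernel, the following is a minimal userspace sketch of the dispatch decision in the hunk above. It is not the kernel code: struct dispatch_inputs, AVX_MIN_LEN and use_scalar_path() are hypothetical names introduced here, and the three booleans only model static_branch_likely(&poly1305_use_avx), irq_fpu_usable() and ctx->is_base2_26. POLY1305_BLOCK_SIZE is assumed to be 16, which is what makes 18 blocks equal the 288-byte threshold named in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: Poly1305 processes 16-byte blocks. */
#define POLY1305_BLOCK_SIZE 16

/* Hypothetical name for the threshold: 18 blocks * 16 bytes = 288 bytes. */
#define AVX_MIN_LEN (POLY1305_BLOCK_SIZE * 18)

static_assert(AVX_MIN_LEN == 288, "threshold should match the commit message");

/* Hypothetical stand-in for the run-time state the real code consults. */
struct dispatch_inputs {
	bool have_avx;    /* models static_branch_likely(&poly1305_use_avx) */
	bool fpu_usable;  /* models irq_fpu_usable() */
	bool is_base2_26; /* models ctx->is_base2_26 */
};

/*
 * Returns true when the scalar path would be chosen: AVX is unavailable,
 * the message is shorter than 288 bytes and the AVX-specific key setup has
 * not already been done, or the FPU cannot be used in the current context.
 */
static bool use_scalar_path(const struct dispatch_inputs *in, size_t len)
{
	return !in->have_avx ||
	       (len < AVX_MIN_LEN && !in->is_base2_26) ||
	       !in->fpu_usable;
}
```

With have_avx and fpu_usable set and is_base2_26 clear, a 287-byte input selects the scalar path and a 288-byte input does not. Once is_base2_26 is set, short inputs stay on the AVX path as well, matching the "unless the AVX-specific key setup has already been performed" clause in the code comment.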