TWx Linux Repository
Eric Biggers 8c4fc9ce40 crypto: x86/aes-ctr - rewrite AESNI+AVX optimized CTR and add VAES support
Delete aes_ctrby8_avx-x86_64.S and add a new assembly file,
aes-ctr-avx-x86_64.S, which follows a similar approach to
aes-xts-avx-x86_64.S: it uses a "template" to provide AESNI+AVX,
VAES+AVX2, VAES+AVX10/256, and VAES+AVX10/512 code, instead of just
AESNI+AVX.  Wire it up to the crypto API accordingly.
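
To illustrate what choosing among the four template instantiations could
look like, here is a minimal sketch in C.  The feature bits, enum names,
and pick_ctr_impl() are hypothetical and are not taken from the kernel
source; the real wiring may weigh additional factors when deciding which
variant to prefer on a given CPU.

```c
#include <assert.h>

/* Hypothetical feature bits, for illustration only. */
enum cpu_feature {
	FEAT_AESNI     = 1u << 0,
	FEAT_AVX       = 1u << 1,
	FEAT_AVX2      = 1u << 2,
	FEAT_VAES      = 1u << 3,
	FEAT_AVX10_256 = 1u << 4,
	FEAT_AVX10_512 = 1u << 5,
	FEAT_ALL       = (1u << 6) - 1,
};

/* Hypothetical identifiers for the four variants named above. */
enum ctr_impl {
	IMPL_FALLBACK,		/* e.g. non-AVX or generic code */
	IMPL_AESNI_AVX,
	IMPL_VAES_AVX2,
	IMPL_VAES_AVX10_256,
	IMPL_VAES_AVX10_512,
};

static int has_all(unsigned int feats, unsigned int need)
{
	return (feats & need) == need;
}

/* Pick the widest variant whose required features are all present. */
enum ctr_impl pick_ctr_impl(unsigned int feats)
{
	assert((feats & ~(unsigned int)FEAT_ALL) == 0);

	if (has_all(feats, FEAT_VAES | FEAT_AVX10_512))
		return IMPL_VAES_AVX10_512;
	if (has_all(feats, FEAT_VAES | FEAT_AVX10_256))
		return IMPL_VAES_AVX10_256;
	if (has_all(feats, FEAT_VAES | FEAT_AVX2))
		return IMPL_VAES_AVX2;
	if (has_all(feats, FEAT_AESNI | FEAT_AVX))
		return IMPL_AESNI_AVX;
	return IMPL_FALLBACK;
}
```

The checks run from widest to narrowest, so a CPU that reports several
feature sets gets the widest variant it supports in this sketch.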

This greatly improves the performance of AES-CTR and AES-XCTR on
VAES-capable CPUs.  The best case is AMD Zen 5, where throughput on long
messages increases by over 230%.  Performance on non-VAES-capable CPUs
remains about the same, and the non-AVX AES-CTR code (aesni_ctr_enc) is
kept as-is for now.  There are slight regressions (less than 10%) at
some short message lengths on some CPUs; these are difficult to avoid,
given how heavily the previous code was unrolled by message length, and
they are not particularly important.
Detailed performance results are given in the tables below.

Both CTR and XCTR support is retained.  The main loop remains
8-vector-wide, which differs from the 4-vector-wide main loops that are
used in the XTS and GCM code.  A wider loop is appropriate for CTR and
XCTR since they have fewer other instructions (such as vpclmulqdq) to
interleave with the AES instructions.
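
For readers unfamiliar with the two modes, the following C sketch shows
how the cipher input blocks could be generated; it is not the assembly
from this commit.  It assumes the usual conventions: CTR treats the
16-byte counter block as a big-endian integer, and XCTR (as used by
HCTR2) XORs the IV with a little-endian block index that starts at 1.
The function names are hypothetical.  With 128-bit vectors, one
8-vector-wide iteration would consume eight such blocks; wider vectors
hold more blocks each.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK_SIZE 16

/* CTR: add one to the block, treated as a 128-bit big-endian integer. */
void ctr_increment_be(uint8_t ctr[AES_BLOCK_SIZE])
{
	for (int i = AES_BLOCK_SIZE - 1; i >= 0; i--) {
		if (++ctr[i] != 0)	/* stop once there is no carry */
			break;
	}
}

/* XCTR: cipher input for block number 'index' is IV XOR le64(index). */
void xctr_block(const uint8_t iv[AES_BLOCK_SIZE], uint64_t index,
		uint8_t out[AES_BLOCK_SIZE])
{
	assert(index >= 1);	/* assumed: block numbering starts at 1 */

	memcpy(out, iv, AES_BLOCK_SIZE);
	for (int i = 0; i < 8; i++)
		out[i] ^= (uint8_t)(index >> (8 * i));
}

/*
 * Produce the eight CTR input blocks for one iteration with 128-bit
 * vectors, advancing 'ctr' past them.
 */
void ctr_fill8(uint8_t ctr[AES_BLOCK_SIZE],
	       uint8_t blocks[8][AES_BLOCK_SIZE])
{
	for (int j = 0; j < 8; j++) {
		memcpy(blocks[j], ctr, AES_BLOCK_SIZE);
		ctr_increment_be(ctr);
	}
}
```

Each block would then be encrypted with AES and XORed with the data.
Since generating these inputs takes only a few instructions, little
other work competes with the AES rounds in the loop.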

As was the case for AES-GCM, the new assembly code also has a much
smaller binary size, since it eliminates the excessive unrolling by data
length and key length present in the old code.  Specifically, the new
assembly file compiles to about 9 KB of text vs. 28 KB for the old file,
even though it includes 4x as many implementations.

The tables show the percentage improvement in single-threaded
throughput for repeated encryption of the given message length; an
increase from 6000 MB/s to 12000 MB/s would be listed as 100%.  The
results were collected by directly measuring the Linux crypto API
performance using a custom kernel module.  The tested CPUs were all
server processors from Google Compute Engine, except for Zen 5, which
was a Ryzen 9 9950X desktop processor.
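
The percentage convention described above can be written as a small C
helper.  The function name is hypothetical, and how the reported values
were rounded is not stated, so the sketch returns the unrounded figure.

```c
#include <assert.h>

/*
 * Relative throughput change in percent: going from 6000 MB/s to
 * 12000 MB/s yields 100.0, and a slowdown yields a negative value.
 */
double throughput_improvement_pct(double old_mbps, double new_mbps)
{
	assert(old_mbps > 0.0);

	return (new_mbps - old_mbps) / old_mbps * 100.0;
}
```

Under this convention, a table entry of 232% means the new code ran at
roughly 3.3x the old throughput, not 2.3x.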

Table 1: AES-256-CTR throughput improvement,
         CPU microarchitecture vs. message length in bytes:

                     | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
---------------------+-------+-------+-------+-------+-------+-------+
AMD Zen 5            |  232% |  203% |  212% |  143% |   71% |   95% |
Intel Emerald Rapids |  116% |  116% |  117% |   91% |   78% |   79% |
Intel Ice Lake       |  109% |  103% |  107% |   81% |   54% |   56% |
AMD Zen 4            |  109% |   91% |  100% |   70% |   43% |   59% |
AMD Zen 3            |   92% |   78% |   87% |   57% |   32% |   43% |
AMD Zen 2            |    9% |    8% |   14% |   12% |    8% |   21% |
Intel Skylake        |    7% |    7% |    8% |    5% |    3% |    8% |

                     |   300 |   200 |    64 |    63 |    16 |
---------------------+-------+-------+-------+-------+-------+
AMD Zen 5            |   57% |   39% |   -9% |    7% |   -7% |
Intel Emerald Rapids |   37% |   42% |   -0% |   13% |   -8% |
Intel Ice Lake       |   39% |   30% |   -1% |   14% |   -9% |
AMD Zen 4            |   42% |   38% |   -0% |   18% |   -3% |
AMD Zen 3            |   38% |   35% |    6% |   31% |    5% |
AMD Zen 2            |   24% |   23% |    5% |   30% |    3% |
Intel Skylake        |    9% |    1% |   -4% |   10% |   -7% |

Table 2: AES-256-XCTR throughput improvement,
         CPU microarchitecture vs. message length in bytes:

                     | 16384 |  4096 |  4095 |  1420 |   512 |   500 |
---------------------+-------+-------+-------+-------+-------+-------+
AMD Zen 5            |  240% |  201% |  216% |  151% |   75% |  108% |
Intel Emerald Rapids |  100% |   99% |  102% |   91% |   94% |  104% |
Intel Ice Lake       |   93% |   89% |   92% |   74% |   50% |   64% |
AMD Zen 4            |   86% |   75% |   83% |   60% |   41% |   52% |
AMD Zen 3            |   73% |   63% |   69% |   45% |   21% |   33% |
AMD Zen 2            |   -2% |   -2% |    2% |    3% |   -1% |   11% |
Intel Skylake        |   -1% |   -1% |    1% |    2% |   -1% |    9% |

                     |   300 |   200 |    64 |    63 |    16 |
---------------------+-------+-------+-------+-------+-------+
AMD Zen 5            |   78% |   56% |   -4% |   38% |   -2% |
Intel Emerald Rapids |   61% |   55% |    4% |   32% |   -5% |
Intel Ice Lake       |   57% |   42% |    3% |   44% |   -4% |
AMD Zen 4            |   35% |   28% |   -1% |   17% |   -3% |
AMD Zen 3            |   26% |   23% |   -3% |   11% |   -6% |
AMD Zen 2            |   13% |   24% |   -1% |   14% |   -3% |
Intel Skylake        |   16% |    8% |   -4% |   35% |   -3% |

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-02-22 15:56:03 +08:00
arch crypto: x86/aes-ctr - rewrite AESNI+AVX optimized CTR and add VAES support 2025-02-22 15:56:03 +08:00
block block-6.14-20250131 2025-01-31 11:49:30 -08:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto crypto: ahash - use str_yes_no() helper in crypto_ahash_show() 2025-02-22 15:56:03 +08:00
Documentation dt-bindings: rng: add binding for Rockchip RK3588 RNG 2025-02-22 15:56:02 +08:00
drivers crypto: inside-secure - Eliminate duplication in top-level Makefile 2025-02-22 15:56:02 +08:00
fs assorted stuff for this merge window 2025-02-01 15:07:56 -08:00
include dt-bindings: reset: Add SCMI reset IDs for RK3588 2025-02-22 15:56:02 +08:00
init Kbuild updates for v6.14 2025-01-31 12:07:07 -08:00
io_uring io_uring-6.14-20250131 2025-01-31 11:29:23 -08:00
ipc treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
kernel 21 hotfixes. 8 are cc:stable and the remainder address post-6.13 issues. 2025-02-01 09:49:20 -08:00
lib lib: 842: Improve error handling in sw842_compress() 2025-02-09 18:08:11 +08:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm assorted stuff for this merge window 2025-02-01 15:07:56 -08:00
net assorted stuff for this merge window 2025-02-01 15:07:56 -08:00
rust Kbuild updates for v6.14 2025-01-31 12:07:07 -08:00
samples AT_EXECVE_CHECK update for v6.14-rc1 (fix1) 2025-01-31 17:12:31 -08:00
scripts 21 hotfixes. 8 are cc:stable and the remainder address post-6.13 issues. 2025-02-01 09:49:20 -08:00
security treewide: const qualify ctl_tables where applicable 2025-01-28 13:48:37 +01:00
sound sound fixes for 6.14-rc1 2025-01-31 09:17:02 -08:00
tools Turbostat 2025.02.02 updates since 2024.11.30 2025-02-02 10:49:13 -08:00
usr kbuild: Drop support for include/asm-<arch> in headers_check.pl 2024-12-21 11:43:17 +09:00
virt Merge branch 'kvm-mirror-page-tables' into HEAD 2025-01-20 07:15:58 -05:00
.clang-format clang-format: Update with v6.11-rc1's for_each macro list 2024-08-02 13:20:31 +02:00
.clippy.toml rust: give Clippy the minimum supported Rust version 2025-01-10 00:17:25 +01:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore rust: use host dylib naming convention to support macOS 2025-01-10 01:01:24 +01:00
.mailmap 21 hotfixes. 8 are cc:stable and the remainder address post-6.13 issues. 2025-02-01 09:49:20 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS Mainly individually changelogged singleton patches. The patch series in 2025-01-26 17:50:53 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS MAINTAINERS: add Nicolas Frattaroli to rockchip-rng maintainers 2025-02-22 15:56:02 +08:00
Makefile Linux 6.14-rc1 2025-02-02 15:39:26 -08:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems that may result from upgrading your kernel.