[nca] Use better tight loop allocation schemes (none at all) for AES decrypt/encrypt and force MbedTLS to use AES x86_64 instructions (#2750)

Uses stack instead of allocating stuff haphazardly (16 bytes and 512 bytes respectively) - removes malloc() pollution and all that nasty stuff from tight loops
Original work by Ribbit but edited by me.
Will NOT bring a massive speedup since the main bottleneck is mbedtls itself, but may bring nice oddities to STARTUP TIMES nonetheless.
AES instructions being forced wont affect CPUs without them since there is always a runtime check for them.

Signed-off-by: lizzie lizzie@eden-emu.dev
Co-authored-by: Ribbit <ribbit@placeholder.com>
Reviewed-on: https://git.eden-emu.dev/eden-emu/eden/pulls/2750
Reviewed-by: CamilleLaVey <camillelavey99@gmail.com>
Co-authored-by: lizzie <lizzie@eden-emu.dev>
Co-committed-by: lizzie <lizzie@eden-emu.dev>
This commit is contained in:
lizzie 2025-10-17 05:08:51 +02:00 committed by crueter
parent 551f244dfd
commit 440ee4916d
No known key found for this signature in database
GPG key ID: 425ACD2D4830EBC6
9 changed files with 200 additions and 130 deletions

View file

@ -0,0 +1,13 @@
diff --git a/library/aesni.h b/library/aesni.h
index 754c984c79..59e27afd3e 100644
--- a/library/aesni.h
+++ b/library/aesni.h
@@ -35,7 +35,7 @@
/* GCC-like compilers: currently, we only support intrinsics if the requisite
* target flag is enabled when building the library (e.g. `gcc -mpclmul -msse2`
* or `clang -maes -mpclmul`). */
-#if (defined(__GNUC__) || defined(__clang__)) && defined(__AES__) && defined(__PCLMUL__)
+#if defined(__GNUC__) || defined(__clang__)
#define MBEDTLS_AESNI_HAVE_INTRINSICS
#endif
/* For 32-bit, we only support intrinsics */

View file

@ -0,0 +1,22 @@
diff --git a/library/aesni.c b/library/aesni.c
index 2857068..3e104ab 100644
--- a/library/aesni.c
+++ b/library/aesni.c
@@ -31,16 +31,14 @@
#include <immintrin.h>
#endif
-#if defined(MBEDTLS_ARCH_IS_X86)
#if defined(MBEDTLS_COMPILER_IS_GCC)
#pragma GCC push_options
#pragma GCC target ("pclmul,sse2,aes")
#define MBEDTLS_POP_TARGET_PRAGMA
-#elif defined(__clang__) && (__clang_major__ >= 5)
+#elif defined(__clang__)
#pragma clang attribute push (__attribute__((target("pclmul,sse2,aes"))), apply_to=function)
#define MBEDTLS_POP_TARGET_PRAGMA
#endif
-#endif
#if !defined(MBEDTLS_AES_USE_HARDWARE_ONLY)
/*