Message-ID: <CAL2CeByu46f48j8O9=237fvhpPUCsKjOZkFuuHU3xrr3wVF5Eg@mail.gmail.com>
Date: Fri, 21 Nov 2025 00:04:55 -0500
From: Luke Yang <luyang@...hat.com>
To: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>
Cc: Jirka Hladky <jhladky@...hat.com>, linux-arm-kernel@...ts.infradead.org, 
	linux-kernel@...r.kernel.org, Joe Mario <jmario@...hat.com>
Subject: [PATCH] arm64: clear_user: align __arch_clear_user() to 64B for
 I-cache efficiency

On aarch64 kernels, a recent change (specifically the irqbypass patch,
https://lore.kernel.org/all/20250516230734.2564775-6-seanjc@google.com/)
shifted __arch_clear_user() so that its tight zeroing loop now straddles
an I-cache line boundary. This causes a measurable performance regression
when reading from /dev/zero.

Add `.p2align 6` (64-byte alignment) so the loop no longer straddles an
I-cache line boundary, restoring the previous IPC and throughput.
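For reference, the function's runtime alignment can be checked directly on a
live system. The sketch below is not part of the patch: it scans
/proc/kallsyms for __arch_clear_user and prints its start address modulo 64,
assuming symbol addresses are visible (run as root or with
kernel.kptr_restrict=0; otherwise addresses read back as 0).

// gcc -O2 -Wall -Wextra -o check_align check_align.c
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/kallsyms", "r");
    if (!f) { perror("fopen /proc/kallsyms"); return 1; }

    char line[512];
    while (fgets(line, sizeof(line), f)) {
        unsigned long long addr;
        char type, name[256];

        /* kallsyms lines look like: "<addr> <type> <name> [module]" */
        if (sscanf(line, "%llx %c %255s", &addr, &type, name) == 3 &&
            strcmp(name, "__arch_clear_user") == 0) {
            printf("__arch_clear_user @ 0x%llx (addr %% 64 = %llu)\n",
                   addr, addr % 64);
            break;
        }
    }
    fclose(f);
    return 0;
}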

Tested on bare-metal aarch64 systems:

  Good kernel: pread_z100k ~ 6.9 s
  Bad kernel:  pread_z100k ~ 9.0 s
  With patch:  pread_z100k ~ 6.9 s
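For scale, each run moves SIZE * COUNT = 100 KiB * 1,000,000 ≈ 102.4 GB
through clear_user(), so 9.0 s vs 6.9 s corresponds to roughly 11.4 GB/s vs
14.8 GB/s, about a 23% throughput loss.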

Reproducer:

// gcc -O2 -Wall -Wextra -o pread_z100k pread_z100k.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <sys/time.h>

#define SIZE (100 * 1024)
#define COUNT 1000000

static double now_sec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0) { perror("open /dev/zero"); return 1; }

    char *buf = malloc(SIZE);
    if (!buf) { perror("malloc"); close(fd); return 1; }

    /* Each pread() from /dev/zero zeroes the user buffer via clear_user(). */
    double t1 = now_sec();
    for (int i = 0; i < COUNT; i++) {
        ssize_t r = pread(fd, buf, SIZE, 0);
        if (r != SIZE) { perror("pread"); break; }
    }
    double t2 = now_sec();

    printf("%.6f\n", t2 - t1);

    close(fd);
    free(buf);
    return 0;
}

Signed-off-by: Luke Yang <luyang@...hat.com>
Signed-off-by: Jirka Hladky <jhladky@...hat.com>
---
 arch/arm64/lib/clear_user.S | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S
index de9a303b6..91eee4a7c 100644
--- a/arch/arm64/lib/clear_user.S
+++ b/arch/arm64/lib/clear_user.S
@@ -17,6 +17,11 @@
  * Alignment fixed up by hardware.
  */

+/*
+ * Ensure __arch_clear_user() always starts on a clean I-cache boundary.
+ */
+    .p2align 6    // 2^6 = 64-byte alignment
+
 SYM_FUNC_START(__arch_clear_user)
     add    x2, x0, x1

-- 
2.51.1

