Message-ID: <CAGudoHFTikYFRoJu2mRhNFv6GHPP4LNEDetMdsqkzAg1nTJfRA@mail.gmail.com>
Date: Fri, 28 Nov 2025 11:11:46 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, oe-lkp@...ts.linux.dev, lkp@...el.com, 
	linux-kernel@...r.kernel.org, Borislav Petkov <bp@...en8.de>, 
	Sean Christopherson <seanjc@...gle.com>, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [linus:master] [x86] 284922f4c5: stress-ng.sockfd.ops_per_sec
 6.1% improvement

On Fri, Nov 28, 2025 at 7:30 AM kernel test robot <oliver.sang@...el.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a 6.1% improvement of stress-ng.sockfd.ops_per_sec on:
>
>
> commit: 284922f4c563aa3a8558a00f2a05722133237fe8 ("x86: uaccess: don't use runtime-const rewriting in modules")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
>
> testcase: stress-ng
> config: x86_64-rhel-9.4
> compiler: gcc-14
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> parameters:
>
>         nr_threads: 100%
>         testtime: 60s
>         test: sockfd
>         cpufreq_governor: performance
>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20251128/202511281306.51105b46-lkp@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-spr-r02/sockfd/stress-ng/60s
>
> commit:
>   17d85f33a8 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma")
>   284922f4c5 ("x86: uaccess: don't use runtime-const rewriting in modules")
>
> 17d85f33a83b84e7 284922f4c563aa3a8558a00f2a0
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>   55674763            +6.1%   59075135        stress-ng.sockfd.ops
>     927326            +6.1%     983845        stress-ng.sockfd.ops_per_sec
>       3555 ±  3%     +10.6%       3932 ±  3%  perf-c2c.DRAM.remote
>       4834 ±  3%     +12.0%       5415 ±  3%  perf-c2c.HITM.local
>       2714 ±  2%     +12.5%       3054 ±  3%  perf-c2c.HITM.remote
>       0.51            +3.9%       0.53        perf-stat.i.MPKI
>   34903541            +5.2%   36715161        perf-stat.i.cache-misses
>  1.072e+08            +5.8%  1.133e+08        perf-stat.i.cache-references
>      18971            -5.5%      17932        perf-stat.i.cycles-between-cache-misses
>       0.46 ± 30%     +13.6%       0.52        perf-stat.overall.MPKI
>   31330827 ± 30%     +14.9%   36004895        perf-stat.ps.cache-misses
>   96530576 ± 30%     +15.3%  1.113e+08        perf-stat.ps.cache-references
>      48.32            -0.2       48.16        perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_del_edges.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
>      48.23            -0.2       48.07        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.unix_del_edges.unix_stream_read_generic.unix_stream_recvmsg
>      48.34            -0.2       48.18        perf-profile.calltrace.cycles-pp.unix_del_edges.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.____sys_recvmsg
>       0.56 ±  4%      +0.1        0.65 ±  9%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
>       0.62 ±  3%      +0.1        0.71 ±  8%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.stress_sockfd
>       0.56 ±  3%      +0.1        0.65 ±  8%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      48.34            -0.2       48.18        perf-profile.children.cycles-pp.unix_del_edges
>       0.15 ±  3%      +0.0        0.17 ±  2%  perf-profile.children.cycles-pp.__scm_recv_common
>       0.08 ±  7%      +0.0        0.10 ±  7%  perf-profile.children.cycles-pp.lockref_put_return
>       0.09 ±  5%      +0.0        0.11 ±  6%  perf-profile.children.cycles-pp.__fput
>       0.35 ±  5%      +0.1        0.43 ± 12%  perf-profile.children.cycles-pp.do_open
>       0.63 ±  3%      +0.1        0.72 ±  8%  perf-profile.children.cycles-pp.do_sys_openat2
>       0.56 ±  3%      +0.1        0.65 ±  8%  perf-profile.children.cycles-pp.do_filp_open
>

While this may look suspicious, since the change is supposed to be a
nop for the core kernel, it in fact is not a nop, as it adds:
+/* Used for modules: built-in code uses runtime constants */
+unsigned long USER_PTR_MAX;
+EXPORT_SYMBOL(USER_PTR_MAX);

This should probably be __ro_after_init.
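
For illustration, a minimal sketch of that suggestion (mine, not taken
from the commit; exact placement in the tree is assumed):

/* Sketch: __ro_after_init moves the variable into a section that is
 * made read-only once init completes, so it cannot end up sharing a
 * writable cacheline with frequently-modified data. */
unsigned long USER_PTR_MAX __ro_after_init;
EXPORT_SYMBOL(USER_PTR_MAX);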

The test at hand is heavily bottlenecked on the global lock in the
garbage collector, which carries no alignment annotation whatsoever.

On my kernel I see this (nm vmlinux | sort -nk 1):
ffffffff846c0a20 b bsd_socket_locks
ffffffff846c0e20 b bsd_socket_buckets
ffffffff846c1620 b unix_nr_socks
ffffffff846c1628 b gc_in_progress
ffffffff846c1630 b unix_graph_cyclic_sccs
ffffffff846c1638 b unix_gc_lock <--- THE LOCK
ffffffff846c1640 b unix_vertex_unvisited_index
ffffffff846c1648 b unix_graph_state
ffffffff846c1660 b unix_stream_bpf_prot
ffffffff846c1820 b unix_stream_prot_lock
ffffffff846c1840 b unix_dgram_bpf_prot
ffffffff846c1a00 b unix_dgram_prot_lock

Note how bsd_socket_buckets looks suspicious in its own right, but
ignoring that bit, I'm guessing the commit pushed things around and
changed some of the cacheline bouncing.
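
As a quick illustration of mine (not part of the report): assuming
64-byte cachelines, two symbols share a line iff their addresses agree
above the low 6 bits, so per the nm dump above unix_gc_lock shares its
line with gc_in_progress and unix_graph_cyclic_sccs:

#include <stdio.h>

/* Addresses copied from the nm dump above; 64-byte lines assumed. */
static unsigned long cacheline_of(unsigned long addr)
{
	return addr >> 6;
}

int main(void)
{
	unsigned long gc_in_progress = 0xffffffff846c1628UL;
	unsigned long unix_gc_lock   = 0xffffffff846c1638UL;
	unsigned long unvisited_idx  = 0xffffffff846c1640UL;

	/* prints 1: both sit in the line starting at ...1600 */
	printf("%d\n", cacheline_of(gc_in_progress) == cacheline_of(unix_gc_lock));
	/* prints 0: ...1640 begins the next line */
	printf("%d\n", cacheline_of(unix_gc_lock) == cacheline_of(unvisited_idx));
	return 0;
}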

While a full fix is beyond the scope of this patch(tm), perhaps the
annotation below will stabilize it against random breakage. Can you
guys bench it?

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 78323d43e63e..25f65817faab 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -199,7 +199,7 @@ static void unix_free_vertices(struct scm_fp_list *fpl)
        }
 }

-static DEFINE_SPINLOCK(unix_gc_lock);
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(unix_gc_lock);

 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
 {
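
For reference, on SMP kernels __cacheline_aligned_in_smp boils down to
roughly the following (paraphrasing include/linux/cache.h;
SMP_CACHE_BYTES is the architecture's cacheline size). It both aligns
the lock and moves it out of .bss into a dedicated section, so no other
hot variable can piggyback on its line:

#define __cacheline_aligned						\
	__attribute__((__aligned__(SMP_CACHE_BYTES),			\
		       __section__(".data..cacheline_aligned")))

#ifdef CONFIG_SMP
#define __cacheline_aligned_in_smp	__cacheline_aligned
#else
#define __cacheline_aligned_in_smp
#endif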
