Date:   Tue, 3 Oct 2023 23:51:59 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Julia Lawall <julia.lawall@...ia.fr>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Mel Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org
Subject: Re: EEVDF and NUMA balancing

On Tue, Oct 03, 2023 at 10:25:08PM +0200, Julia Lawall wrote:
> Is it expected that the commit e8f331bcc270 should have an impact on the
> frequency of NUMA balancing?

Definitely not expected. The only effect of that commit was supposed to
be the runqueue order of tasks. I'll go stare at it in the morning --
definitely too late for critical thinking atm.

Thanks!

> The NAS benchmark ua.C.x (NPB3.4-OMP,
> https://github.com/mbdevpl/nas-parallel-benchmarks.git) on a 4-socket
> Intel Xeon 6130 suffers from some NUMA moves that leave some sockets with
> too few threads and other sockets with too many threads.  Prior to the
> commit e8f331bcc270, this was corrected by subsequent load balancing,
> leading to run times of 20-40 seconds (around 20 seconds can be achieved
> if one just turns NUMA balancing off).  After commit e8f331bcc270, the
> running time can go up to 150 seconds.  In the worst case, I have seen a
> core remain idle for 75 seconds.  It seems that the load balancer at the
> NUMA domain level is not able to do anything, because when a core on the
> overloaded socket has multiple threads, they are tasks that were NUMA
> balanced to the socket, and thus should not leave.  So the "busiest" core
> chosen by find_busiest_queue doesn't actually contain any stealable
> threads.  Maybe it could be worth stealing from a core that has only one
> task in this case, in hopes that the tasks that are tied to a socket will
> spread out better across it if more space is available?
> 
> An example run is attached.  The cores are renumbered according to the
> sockets, so there is an overload on socket 1 and an underload on socket
> 2.
> 
> julia


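For illustration only, here is a standalone toy model (plain C, not kernel
code) of the fallback Julia describes above: if the nominally busiest core
only holds tasks that NUMA balancing has tied to the socket, pick a core
running a single task instead, in the hope that the pinned tasks can then
spread out.  All structures and names below are hypothetical and greatly
simplified; in the kernel the real decision lives in find_busiest_queue()
and the surrounding load-balancing code.

#include <stdio.h>
#include <stdbool.h>

struct core {
	int nr_running;		/* tasks queued on this core */
	int nr_numa_pinned;	/* tasks NUMA-balanced here, treated as unmovable */
};

static bool has_stealable(const struct core *c)
{
	return c->nr_running - c->nr_numa_pinned > 0;
}

/*
 * Pick a source core for load balancing: take the most loaded core if it
 * has anything stealable, otherwise fall back to a core running exactly
 * one task (the fallback proposed above).
 */
static int pick_source(const struct core *cores, int n)
{
	int busiest = 0, single = -1;

	for (int i = 1; i < n; i++)
		if (cores[i].nr_running > cores[busiest].nr_running)
			busiest = i;

	if (has_stealable(&cores[busiest]))
		return busiest;			/* classic behaviour */

	for (int i = 0; i < n; i++)
		if (cores[i].nr_running == 1)
			single = i;
	return single;				/* may be -1 if nothing fits */
}

int main(void)
{
	/*
	 * Overloaded socket: the doubly-loaded cores carry only
	 * NUMA-pinned tasks, so the classic "busiest" choice has nothing
	 * stealable; only the single-task core can give up work.
	 */
	struct core socket1[] = {
		{ .nr_running = 2, .nr_numa_pinned = 2 },
		{ .nr_running = 2, .nr_numa_pinned = 2 },
		{ .nr_running = 1, .nr_numa_pinned = 0 },
	};
	int src = pick_source(socket1, 3);

	printf("source core: %d (%s)\n", src,
	       src >= 0 && has_stealable(&socket1[src]) ?
	       "stealable task" : "single-task fallback");
	return 0;
}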