Message-ID: <bc016aaa-fed0-4974-8f9d-5bf671920dc7@oracle.com>
Date: Wed, 30 Apr 2025 03:41:58 -0700
From: Libo Chen <libo.chen@...cle.com>
To: K Prateek Nayak <kprateek.nayak@....com>,
        Jean-Baptiste Roquefere <jb.roquefere@...me.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "mingo@...nel.org" <mingo@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Cc: Borislav Petkov <bp@...en8.de>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>,
        "Gautham R. Shenoy" <gautham.shenoy@....com>,
        Swapnil Sapkal <swapnil.sapkal@....com>,
        Valentin Schneider <vschneid@...hat.com>,
        "regressions@...ts.linux.dev" <regressions@...ts.linux.dev>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        Konrad Wilk <konrad.wilk@...cle.com>
Subject: Re: IPC drop down on AMD epyc 7702P



On 4/30/25 02:13, K Prateek Nayak wrote:
> (+ more scheduler folks)
> 
> tl;dr
> 
> JB has a workload that hates aggressive migration on the 2nd Generation
> EPYC platform that has a small LLC domain (4C/8T) and very noticeable
> C2C latency.
> 
> Based on JB's observation so far, reverting commit 16b0a7a1a0af
> ("sched/fair: Ensure tasks spreading in LLC during LB") and commit
> c5b0a7eefc70 ("sched/fair: Remove sysctl_sched_migration_cost
> condition") helps the workload. Both of those commits allow aggressive
> migrations for work conservation, but they also increase cache
> misses, which slows the workload quite a bit.
> 
> "relax_domain_level" helps but cannot be set at runtime and I couldn't
> think of any stable / debug interfaces that JB hasn't tried out
> already that can help this workload.
> 
> There is a patch towards the end to set "relax_domain_level" at
> runtime, but given that cpusets did away with this when transitioning to
> cgroup-v2, I don't know what the sentiments are around its usage.
> Any input / feedback is greatly appreciated.
> 


Hi Prateek,

Oh no, not "relax_domain_level" again. It can lead to load imbalance
in a variety of ways. We were so glad this one went away with cgroup v2;
it tends to be abused by users as an "easy" fix for urgent perf
issues instead of addressing their root causes.


Thanks,
Libo



