Message-ID: <Z6oysfyRKM_eUHlj@jlelli-thinkpadt14gen4.remote.csb>
Date: Mon, 10 Feb 2025 18:09:05 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: Christian Loehle <christian.loehle@....com>
Cc: Jon Hunter <jonathanh@...dia.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Thierry Reding <treding@...dia.com>,
	Waiman Long <longman@...hat.com>, Tejun Heo <tj@...nel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Michal Koutny <mkoutny@...e.com>, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	Phil Auld <pauld@...hat.com>, Qais Yousef <qyousef@...alina.io>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	"Joel Fernandes (Google)" <joel@...lfernandes.org>,
	Suleiman Souhlal <suleiman@...gle.com>,
	Aashish Sharma <shraash@...gle.com>,
	Shin Kawamura <kawasin@...gle.com>,
	Vineeth Remanan Pillai <vineeth@...byteword.org>,
	linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
	"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier
 for hotplug

Hi Christian,

Thanks for taking a look as well.

On 07/02/25 15:55, Christian Loehle wrote:
> On 2/7/25 14:04, Jon Hunter wrote:
> > 
> > 
> > On 07/02/2025 13:38, Dietmar Eggemann wrote:
> >> On 07/02/2025 11:38, Jon Hunter wrote:
> >>>
> >>> On 06/02/2025 09:29, Juri Lelli wrote:
> >>>> On 05/02/25 16:56, Jon Hunter wrote:
> >>>>
> >>>> ...
> >>>>
> >>>>> Thanks! That did make it easier :-)
> >>>>>
> >>>>> Here is what I see ...
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Still different from what I can repro over here, so, unfortunately, I
> >>>> had to add additional debug printks. Pushed to the same branch/repo.
> >>>>
> >>>> Could I ask for another run with it? Please also share the complete
> >>>> dmesg from boot, as I would need to check debug output when CPUs are
> >>>> first onlined.
> >>
> >> So you have a system with 2 big and 4 LITTLE CPUs (Denver0 Denver1 A57_0
> >> A57_1 A57_2 A57_3) in one MC sched domain and (Denver1 and A57_0) are
> >> isol CPUs?
> > 
> > I believe that 1-2 are the Denvers (even though they are listed as 0-1 in device-tree).
> 
> Interesting, I have yet to reproduce this with equal capacities in isolcpus.
> Maybe I didn't try hard enough yet.
> 
> > 
> >> This should be easy to set up for me on my Juno-r0 [A53 A57 A57 A53 A53 A53]
> > 
> > Yes I think it is similar to this.
> > 
> > Thanks!
> > Jon
> > 
> 
> I could reproduce that on a different LLLLbb with isolcpus=3,4 (Lb) and
> the offlining order:
> echo 0 > /sys/devices/system/cpu/cpu5/online
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu3/online
> echo 0 > /sys/devices/system/cpu/cpu2/online
> echo 0 > /sys/devices/system/cpu/cpu4/online
> 
> while the following offlining order succeeds:
> echo 0 > /sys/devices/system/cpu/cpu5/online
> echo 0 > /sys/devices/system/cpu/cpu4/online
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 0 > /sys/devices/system/cpu/cpu2/online
> echo 0 > /sys/devices/system/cpu/cpu3/online
> (Both orders offline an isolcpu last, and both keep CPU0 online.)
> 
> The issue only triggers with sugov DL threads (I guess that's obvious, but
> just to mention it).

It wasn't obvious to me at first :). So thanks for confirming.
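
For anyone following along, a quick way to double check that the
schedutil workers really are DL tasks is something like the below
(worker names and output depend on the platform, so it's only a sketch):

# schedutil workers are kernel threads named "sugov:<cpu>"
for pid in $(pgrep sugov); do
	chrt -p "$pid"      # expect SCHED_DEADLINE for schedutil workers
	taskset -cp "$pid"  # affinity follows the cpufreq policy's CPUs
done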

> I'll investigate some more later but wanted to share for now.

So, the actual problem is that I am not yet sure what we should do with
the sugov workers' bandwidth wrt root domain accounting. W/o isolation
it's all good, as it gets accounted for correctly on the dynamic domains
sugov tasks can run on. But with isolation, and sugov affected_cpus that
cross isolation domains (e.g., one big and one LITTLE CPU), we can get
into trouble, not knowing whether the sugov contribution should fall on
the DEF or the DYN root domain.

Hummm, need to think more about it.
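
Just to make the ambiguity concrete, a rough check like the below
(standard sysfs paths, the parsing is only illustrative) shows when a
sugov worker's affinity spans both isolated CPUs (attached to the DEF
root domain) and housekeeping CPUs (DYN root domains):

isolated=$(cat /sys/devices/system/cpu/isolated)  # e.g. "3-4" with isolcpus=3,4
echo "isolated CPUs: ${isolated:-none}"
for pid in $(pgrep sugov); do
	# if this list contains both isolated and non-isolated CPUs, the
	# worker's DL bandwidth has no single obvious root domain to live in
	taskset -cp "$pid"
done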

Thanks,
Juri

