Open Source and information security mailing list archives
 
Message-ID: <3acad4a1a07ccbde615ea19eb13a96f37d4a3a2f.camel@redhat.com>
Date: Fri, 23 May 2025 13:15:44 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>, 
 Waiman Long <longman@...hat.com>, Anna-Maria Behnsen
 <anna-maria@...utronix.de>
Subject: Re: [PATCH v5 5/6] cgroup/cpuset: Fail if isolated and nohz_full
 don't leave any housekeeping

On Tue, 2025-05-20 at 16:28 +0200, Frederic Weisbecker wrote:
> 
> Apparently you can't trigger the same with isolcpus=0-6, for some
> reason.
> 
> One last thing, nohz_full makes sure that we never offline the
> timekeeper (see tick_nohz_cpu_down()). The timekeeper also never
> shuts down its tick and therefore never go idle, from tmigr
> perspective, this way when a nohz_full CPU shuts down its tick, it
> makes sure that its global timers are handled by the timekeeper in
> last resort, because it's the last global migrator, always alive.
> 
> But if the timekeeper is HK_TYPE_DOMAIN, or isolated by cpuset, it
> will go out of the tmigr hierarchy, breaking the guarantee to have a
> live global migrator for nohz_full.
> 
> That one is a bit more tricky to solve. The easiest is to forbid the
> timekeeper from ever being made unavailable. It is also possible to
> migrate the timekeeping duty to another common housekeeper.
> 
> We probably need to do the latter...
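A minimal sketch of the second option in the quoted message (migrating the timekeeping duty to another common housekeeper), assuming CPU masks are modelled as plain 64-bit bitmaps. `mask64_t` and `pick_timekeeper()` are hypothetical names, not kernel API, and choosing the lowest eligible CPU is an arbitrary assumption made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy mask type: bit n set => CPU n is in the mask. */
typedef uint64_t mask64_t;

/*
 * Hypothetical helper: choose a CPU that could take over timekeeping,
 * i.e. one that is online, not nohz_full and not domain-isolated
 * (a housekeeper for both purposes). The lowest such CPU is picked.
 * Returns -1 when no common housekeeper exists.
 */
static int pick_timekeeper(mask64_t online, mask64_t nohz_full,
			   mask64_t domain_isolated)
{
	mask64_t candidates = online & ~nohz_full & ~domain_isolated;

	for (int cpu = 0; cpu < 64; cpu++)
		if ((candidates >> cpu) & 1u)
			return cpu;
	return -1;
}
```

In this model, a result of -1 corresponds to the situation named in the patch subject, where isolation and nohz_full together leave no housekeeping CPU.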

I've been thinking about this again: is it really worth the extra complexity?

The tick CPU is already set to the boot CPU, and if the user requests
it to be nohz_full, the request is not accepted.
In my understanding, this is typically CPU0, which is somewhat special
and is advised to stay as a housekeeping CPU. As far as I understand,
when nohz_full is enabled, the tick CPU cannot change.
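A toy model of the boot-time rule described in the paragraph above, assuming CPU masks are plain 64-bit bitmaps. `mask64_t`, `mask_test()` and `effective_nohz_full()` are hypothetical names, not kernel API; the real handling lives in the kernel's tick code, and this only mirrors the behaviour as the message describes it: the tick CPU is the boot CPU, and it is dropped from any nohz_full request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy mask type: bit n set => CPU n is in the mask. */
typedef uint64_t mask64_t;

static bool mask_test(mask64_t m, unsigned cpu)
{
	assert(cpu < 64);
	return (m >> cpu) & 1u;
}

/*
 * Hypothetical model: the effective nohz_full mask after boot.
 * The boot CPU acts as the tick (timekeeping) CPU and keeps its tick,
 * so it is removed from whatever nohz_full range was requested.
 */
static mask64_t effective_nohz_full(mask64_t requested, unsigned boot_cpu)
{
	assert(boot_cpu < 64);
	return requested & ~((mask64_t)1 << boot_cpu);
}
```

For example, requesting nohz_full on CPUs 0-6 with boot CPU 0 leaves CPUs 1-6 as nohz_full in this model.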

That said, I'd reconsider forcibly keeping the tick CPU in the
hierarchy when nohz_full is active, whether or not we isolate it (i.e.
what you mentioned as the /easy/ way).
We wouldn't prevent domain isolation (as the user requested), but we
would allow a bit more noise on that one CPU, keeping things simple
while avoiding dangerous corner cases.
If that's still a problem for the user, they are probably better off
either selecting a different mask or setting nohz_full consistently
(I'm still wondering how common this scenario is).
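A toy sketch of the /easy/ rule proposed above, assuming masks are modelled as 64-bit bitmaps: a domain-isolated CPU normally leaves the timer migration (tmigr) hierarchy, except that the tick CPU stays in whenever nohz_full is active. `mask64_t`, `in_mask()` and `stays_in_tmigr()` are hypothetical names and do not reflect the actual cpuset or tmigr code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy mask type: bit n set => CPU n is in the mask. */
typedef uint64_t mask64_t;

static bool in_mask(mask64_t m, unsigned cpu)
{
	assert(cpu < 64);
	return (m >> cpu) & 1u;
}

/*
 * Hypothetical policy: should @cpu remain in the tmigr hierarchy?
 * With nohz_full active, the tick CPU is always kept in, so nohz_full
 * CPUs retain a live global migrator even if cpuset isolates it; the
 * cost is a bit of extra noise on that one CPU. Every other CPU leaves
 * the hierarchy when it is domain-isolated.
 */
static bool stays_in_tmigr(unsigned cpu, mask64_t domain_isolated,
			   bool nohz_full_active, unsigned tick_cpu)
{
	if (nohz_full_active && cpu == tick_cpu)
		return true;
	return !in_mask(domain_isolated, cpu);
}
```

For example, with nohz_full active, tick CPU 0 and CPUs 0-3 isolated by cpuset, CPU 0 stays in the hierarchy in this model while CPUs 1-3 leave it; without nohz_full, CPU 0 leaves as well.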

Am I missing something here?

Thanks,
Gabriele

