Message-ID: <4fdc6582c828fbcd8c6ad202ed7ab560134d1fc3.camel@redhat.com>
Date: Thu, 10 Apr 2025 15:56:02 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Frederic Weisbecker <frederic@...nel.org>, Thomas Gleixner
	 <tglx@...utronix.de>
Cc: linux-kernel@...r.kernel.org, Waiman Long <longman@...hat.com>
Subject: Re: [PATCH] timers: Exclude isolated cpus from timer migration



On Thu, 2025-04-10 at 15:27 +0200, Frederic Weisbecker wrote:
> On Thu, Apr 10, 2025 at 03:15:49PM +0200, Thomas Gleixner wrote:
> > On Thu, Apr 10 2025 at 15:03, Frederic Weisbecker wrote:
> > > On Thu, Apr 10, 2025 at 12:38:25PM +0200, Gabriele Monaco wrote:
> > > Speaking of, those are two different issues here:
> > > 
> > > * nohz_full CPUs are handled just like idle CPUs. Once the tick is
> > >   stopped, the global timers are handled by other CPUs (housekeeping).
> > >   There is always one housekeeping CPU that never goes idle.
> > >   One subtle thing though: if the nohz_full CPU fires a tick, because
> > >   there is a local timer to be handled for example, it will also
> > >   possibly handle some global timers along the way. If it happens to
> > >   be a problem, it should be easy to resolve.
> > > 
> > > * Domain isolated CPUs are treated just like other CPUs. But there is
> > >   not always a housekeeping CPU around. And no guarantee that there is
> > >   always a non-idle CPU to take care of global timers.
> > 
> > That's insanity.
> 
> It works, but it doesn't make much sense arguably.

I wonder if we should really worry about this scenario though.
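
Just to make the two cases above concrete, the distinction roughly boils
down to the following check (sketch only, the helper name is made up for
illustration):

    #include <linux/tick.h>
    #include <linux/sched/isolation.h>

    /*
     * Sketch: a CPU is "timer isolated" either because it is nohz_full
     * (tick stopped whenever possible) or because it is excluded from
     * the scheduler domains via isolcpus= / cpuset isolated partitions.
     */
    static inline bool cpu_timer_isolated(int cpu)
    {
            return tick_nohz_full_cpu(cpu) ||
                   !housekeeping_cpu(cpu, HK_TYPE_DOMAIN);
    }

Only the first class is currently treated like idle by the timer
migration hierarchy; the second still takes part in global timer handling
as usual.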

> 
> > 
> > > > Thinking about it now, since global timers /can/ start on isolated
> > > > cores, that makes them quite different from offline ones, and
> > > > considering them the same is probably just not the right thing to
> > > > do.
> > > > 
> > > > I'm going to have a deeper thought about this whole approach;
> > > > perhaps something simpler, just preventing migration in that one
> > > > direction, would suffice.
> > > 
> > > I think we can use your solution, which involves isolating the CPU
> > > from the tmigr hierarchy, and also always queue global timers to
> > > non-isolated targets.
> > 
> > Why do we have to inflict extra complexity into the timer enqueue path
> > instead of preventing the migration to, but not the migration from
> > isolated CPUs?
> 
> But how do we handle global timers that have been initialized and queued
> from isolated CPUs?

I still need to sketch the solution a bit more, but the rough idea is:
1. isolated CPUs don't pull remote timers
2. isolated CPUs ignore their global timers and let others pull them,
   perhaps with some more logic to avoid them expiring
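
Very roughly, 1. could be little more than an early bail-out before a CPU
walks the hierarchy to pull remote timers, something along these lines
(sketch only, reusing the made-up cpu_timer_isolated() helper from above;
the actual hook point in timer_migration.c still needs to be figured out):

    #include <linux/smp.h>

    /*
     * Sketch for 1.: isolated CPUs never pull global timers queued on
     * other CPUs; only housekeeping CPUs walk the tmigr hierarchy for
     * remote expiry. Callers in the timer paths run with preemption
     * disabled, so smp_processor_id() is fine here.
     */
    static bool tmigr_may_pull_remote(void)
    {
            return !cpu_timer_isolated(smp_processor_id());
    }

For 2., the isolated CPU would presumably stop expiring its queued global
timers locally and leave them to be pulled, and the "more logic" above
would be about making sure a housekeeping CPU actually picks them up in
time.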


Wouldn't that be sufficient?

Also, I would definitely do 1. for any kind of isolation, but I'm not
sure about 2.
Strictly speaking, domain-isolated cores don't claim to be free of kernel
noise, even noise they initiate themselves (whereas nohz_full ones do
claim that).
What would be the expectation there?
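
Put differently, the open question is whether the check behind 2. should
be the nohz_full test alone or also cover domain isolation; just to
illustrate the two readings (hypothetical helper name):

    #include <linux/tick.h>
    #include <linux/sched/isolation.h>

    /* Hypothetical: which CPUs should not expire global timers locally? */
    static bool cpu_expects_no_global_timers(int cpu)
    {
            /* Strict reading: only nohz_full CPUs promise no kernel noise. */
            return tick_nohz_full_cpu(cpu);
            /*
             * Wider reading: also domain-isolated CPUs, i.e. additionally
             * check !housekeeping_cpu(cpu, HK_TYPE_DOMAIN).
             */
    }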

Thanks,
Gabriele

