Message-ID: <9512432189461cad59402c78c386ec3132b1e668.camel@redhat.com>
Date: Wed, 16 Apr 2025 08:23:41 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Waiman Long <llong@...hat.com>, linux-kernel@...r.kernel.org, Frederic
 Weisbecker <frederic@...nel.org>, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 3/3] timers: Exclude isolated cpus from timer migration



On Tue, 2025-04-15 at 22:24 -0400, Waiman Long wrote:
> On 4/15/25 11:49 AM, Gabriele Monaco wrote:
> > 
> > On Tue, 2025-04-15 at 11:30 -0400, Waiman Long wrote:
> > > On 4/15/25 6:25 AM, Gabriele Monaco wrote:
> > > > The timer migration mechanism allows active CPUs to pull timers
> > > > from idle ones to improve the overall idle time. This is however
> > > > undesired when CPU intensive workloads run on isolated cores, as
> > > > the algorithm would move the timers from housekeeping to isolated
> > > > cores, negatively affecting the isolation.
> > > > 
> > > > This effect was noticed on a 128-core machine running oslat on the
> > > > isolated cores (1-31,33-63,65-95,97-127). The tool monopolises
> > > > CPUs, and the CPU with the lowest count in a timer migration
> > > > hierarchy (here 1 and 65) appears as always active and continuously
> > > > pulls global timers from the housekeeping CPUs. This ends up moving
> > > > driver work (e.g. delayed work) to isolated CPUs and causes latency
> > > > spikes:
> > > > 
> > > > before the change:
> > > > 
> > > >    # oslat -c 1-31,33-63,65-95,97-127 -D 62s
> > > >    ...
> > > >     Maximum:     1203 10 3 4 ... 5 (us)
> > > > 
> > > > after the change:
> > > > 
> > > >    # oslat -c 1-31,33-63,65-95,97-127 -D 62s
> > > >    ...
> > > >     Maximum:      10 4 3 4 3 ... 5 (us)
> > > > 
> > > > Exclude isolated cores from the timer migration algorithm, extend
> > > > the concept of unavailable cores, currently used for offline ones,
> > > > to isolated ones:
> > > > * A core is unavailable if isolated or offline;
> > > > * A core is available if isolated and offline;
> > > I think you mean "A core is available if NOT isolated and NOT
> > > offline".
> > > Right?
> > Yes, of course. My bad. Thanks for spotting.
> > 
> > > > A core is considered unavailable as idle if:
> What do you mean by "unavailable as idle"? An idle CPU is different
> from an unavailable CPU, I think.

Here I mean unavailable for tmigr; see below for why I got that term.
If you find it misleading, I could look for a better term to represent
the concept.

> > > > * is in the isolcpus list
> > > > * is in the nohz_full list
> > > > * is in an isolated cpuset
> > > > 
> > > > Due to how the timer migration algorithm works, any CPU that is
> > > > part of the hierarchy can have its global timers pulled by remote
> > > > CPUs and has to pull remote timers; skipping only the pulling of
> > > > remote timers would break the logic.
> > > > For this reason, we prevent isolated CPUs from pulling remote
> > > > global timers, but also the other way around: any global timer
> > > > started on an isolated CPU will run there. This does not break the
> > > > concept of isolation (global timers don't come from outside the
> > > > CPU) and, if considered inappropriate, can usually be mitigated
> > > > with other isolation techniques (e.g. IRQ pinning).
> 
> BTW, I am not that familiar with the timer migration code. Does
> marking an isolated CPU as unavailable (previously offline) make the
> above behavior happen?
> 
> Now unavailable CPUs include the isolated CPUs. We may need to look at
> some of the online (now available) flag checks within the timer
> migration code to make sure that they are still doing the right thing.
> 

The original tmigr code excludes offline CPUs from the hierarchy; those
clearly won't have timers, and everything works.
This series changes that concept to unavailable (I took the name from
tmigr_is_not_available, which used to check if a CPU was online and
initialised for tmigr) to also include isolated CPUs.

A CPU is then unavailable because it's offline, isolated or both, which
effectively prevents it from joining the hierarchy.
One noticeable difference is that isolated CPUs can still have global
timers; the fact that they don't participate in the hierarchy isolates
those as well (they won't migrate). This is what I kind of missed in v1,
but acknowledging that, everything seems to work.
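
To make that concrete, the rule boils down to something like the sketch
below (the helper name is made up for illustration and this is not the
code in the series; cpu_online() and cpu_is_isolated() are the existing
kernel helpers):

/*
 * Illustration only, not the actual patch: a CPU takes part in the
 * tmigr hierarchy only if it is online and not isolated (isolcpus=,
 * nohz_full= or an isolated cpuset). Global timers armed on an
 * unavailable CPU simply expire locally instead of being migrated.
 */
static bool tmigr_cpu_participates(unsigned int cpu)
{
	return cpu_online(cpu) && !cpu_is_isolated(cpu);
}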

That's as far as I could see; of course, reviewers might find some
corner cases I didn't consider.

> 
> > > > diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> > > > index 1fae38fbac8c2..6fe6ca798e98d 100644
> > > > --- a/kernel/time/timer_migration.c
> > > > +++ b/kernel/time/timer_migration.c
> > > > @@ -10,6 +10,7 @@
> > > >  #include <linux/spinlock.h>
> > > >  #include <linux/timerqueue.h>
> > > >  #include <trace/events/ipi.h>
> > > > +#include <linux/sched/isolation.h>
> > > >  
> > > >  #include "timer_migration.h"
> > > >  #include "tick-internal.h"
> > > > @@ -1445,7 +1446,7 @@ static long tmigr_trigger_active(void *unused)
> > > >  
> > > >  static int tmigr_cpu_unavailable(unsigned int cpu)
> > > >  {
> > > > -	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
> > > > +	struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
> > > >  	int migrator;
> > > >  	u64 firstexp;
> > > >  
> > > > @@ -1472,15 +1473,18 @@ static int tmigr_cpu_unavailable(unsigned int cpu)
> > > >  
> > > >  static int tmigr_cpu_available(unsigned int cpu)
> > > >  {
> > > > -	struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
> > > > +	struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
> > > >  
> > > >  	/* Check whether CPU data was successfully initialized */
> > > >  	if (WARN_ON_ONCE(!tmc->tmgroup))
> > > >  		return -EINVAL;
> > > >  
> > > > +	/* Isolated CPUs don't participate in timer migration */
> > > > +	if (cpu_is_isolated(cpu))
> > > > +		return 0;
> > > There are two main sets of isolated CPUs used by cpu_is_isolated():
> > > boot-time isolated CPUs via the "isolcpus" and "nohz_full" boot
> > > command line options, and runtime isolated CPUs via cpuset isolated
> > > partitions. The check for runtime isolated CPUs is redundant here as
> > > those CPUs won't be passed to tmigr_cpu_available().
> > Since tmigr_cpu_available is shared between isolated and offline
> > CPUs,
> > I added this check also to make sure bringing an isolated CPU back
> > online won't make it available for tmigr.
> Good point, so the check is indeed needed.
> 
> > 
> > > So this call is effectively removing the boot-time isolated CPUs
> > > from the available cpumask, especially during the boot-up process.
> > > Maybe you can add some comment about this behavioral change.
> > > 
> > Do you mean I should make clear that the check in
> > tmigr_cpu_available
> > is especially meaningful at boot time (i.e. when CPUs are first
> > brought
> > online)?
> 
> The current tmigr code doesn't look at boot-time isolated CPUs. The
> cpu_is_isolated() check excludes those boot-time isolated CPUs from
> the mask. I think this should be noted.
> 

Right, will add a note about that.
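
Something along these lines, perhaps, next to the new check in
tmigr_cpu_available() (exact wording still to be decided):

	/*
	 * Isolated CPUs don't take part in timer migration. This also
	 * keeps boot-time isolated CPUs (isolcpus=, nohz_full=) out of
	 * the available mask when they are first brought online.
	 */
	if (cpu_is_isolated(cpu))
		return 0;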

Thanks again,
Gabriele

