lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <32a536f5688105df515e6ad9fd12fbcdbd781afb.camel@redhat.com>
Date:   Mon, 03 May 2021 16:57:24 -0500
From:   Scott Wood <swood@...hat.com>
To:     Mike Galbraith <efault@....de>,
        Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mel Gorman <mgorman@...e.de>,
        Valentin Schneider <valentin.schneider@....com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-rt-users <linux-rt-users@...r.kernel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 0/3] newidle_balance() PREEMPT_RT latency mitigations

On Mon, 2021-05-03 at 20:52 +0200, Mike Galbraith wrote:
> On Mon, 2021-05-03 at 11:33 -0500, Scott Wood wrote:
> > On Sun, 2021-05-02 at 05:25 +0200, Mike Galbraith wrote:
> > > If NEWIDLE balancing migrates one task, how does that manage to
> > > consume
> > > a full *millisecond*, and why would that only be a problem for RT?
> > > 
> > > 	-Mike
> > > 
> > > (rt tasks don't play !rt balancer here, if CPU goes idle, tough titty)
> > 
> > Determining which task to pull is apparently taking that long (again,
> > this is on a 128-cpu system).  RT is singled out because that is the
> > config that makes significant tradeoffs to keep latencies down (I
> > expect this would be far from the only possible 1ms+ latency on a
> > non-RT kernel), and there was concern about the overhead of a double
> > context switch when pulling a task to a newidle cpu.
> 
> What I think has be going on is that you're running a synchronized RT
> load, many CPUs go idle as a thundering herd, and meet at focal point
> busiest.  What I was alluding to was that preventing such size scale
> pile-ups would be way better than poking holes in it for RT to try to
> sneak through.  If pile-up it is, while not particularly likely, the
> same should happen with normal tasks, wasting cycles generating heat.
> 
> The main issue I see with these patches is that the resulting number is
> still so gawd awful as to mean "nope, not rt ready", making the whole
> exercise look a bit like a noop.

It doesn't look like rteval asks cyclictest to synchronize, but
regardless, how is this "poking holes"?  Making sure interrupts are
enabled during potentially long-running activities is pretty fundamental
to PREEMPT_RT.  What specifically is your suggestion?

And yes, 317 us is still not a very good number for PREEMPT_RT, but
progress is progress.  It's hard to address the moderate latency spikes
if they're obscured by large latency spikes.  One also needs to have
realistic expectations when it comes to RT on large systems, particularly
when not isolating the latency-sensitive CPUs.

-Scott


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ