Message-ID: <20200806114545.GA2674@hirez.programming.kicks-ass.net>
Date:   Thu, 6 Aug 2020 13:45:45 +0200
From:   peterz@...radead.org
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Valentin Schneider <valentin.schneider@....com>,
        Vladimir Oltean <olteanv@...il.com>,
        Kurt Kanzenbach <kurt.kanzenbach@...utronix.de>,
        Alison Wang <alison.wang@....com>, catalin.marinas@....com,
        will@...nel.org, paulmck@...nel.org, mw@...ihalf.com,
        leoyang.li@....com, vladimir.oltean@....com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Anna-Maria Gleixner <anna-maria@...utronix.de>
Subject: Re: [RFC PATCH] arm64: defconfig: Disable fine-grained task level
 IRQ time accounting

On Thu, Aug 06, 2020 at 11:41:06AM +0200, Thomas Gleixner wrote:
> peterz@...radead.org writes:
> > On Wed, Aug 05, 2020 at 02:56:49PM +0100, Valentin Schneider wrote:
> >
> >> I've been tempted to say the test case is a bit bogus, but am not familiar
> >> enough with the RT throttling details to stand that ground. That said, from
> >> both looking at the execution and the stress-ng source code, it seems to
> >> unconditionally spawn 32 FIFO-50 tasks (there's even an option to make
> >> these FIFO-99!!!), which is quite a crowd on monoCPU systems.
> >
> > Oh, so it's a case of: we do stupid without tuning and the system falls
> > over. I can live with that.
> 
> It's not a question of whether you can live with that behaviour for a
> particular silly test case.
> 
> The same happens with a single RT runaway task with enough interrupt
> load on a UP machine. Just validated that. 

Of course.

> And that has nothing to do
> with a silly test case. Sporadic runaways due to a bug in a once per
> week code path simply can happen and having the safety net working
> depending on a config option selected or not is just wrong.

The safety thing is concerned with RT tasks. It doesn't pretend to help
with runaway IRQs, never has, never will.

The further extreme is an interrupt storm; those have always taken a
machine down.

Accounting unrelated IRQ time to RT tasks is equally wrong; the task's
execution is unrelated to the IRQs. The config option at least offers
insight into where time goes -- and it's a config option because doing
time accounting on interrupts adds overhead :/

This really is a no-win all round.

The only 'sensible' option here is threaded IRQs, where the IRQ line
gets disabled until the handler thread has run; that also helps with IRQ
storms.
