linux-kernel - Re: [PATCH 00/37] softirq: Per vector masking v3

Message-ID: <20190301034536.GA19200@lenoir>
Date:   Fri, 1 Mar 2019 04:45:37 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        "David S . Miller" <davem@...emloft.net>,
        Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Pavan Kondeti <pkondeti@...eaurora.org>,
        Ingo Molnar <mingo@...nel.org>,
        Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH 00/37] softirq: Per vector masking v3

On Thu, Feb 28, 2019 at 09:33:15AM -0800, Linus Torvalds wrote:
> On Thu, Feb 28, 2019 at 9:12 AM Frederic Weisbecker <frederic@...nel.org> wrote:
> >
> > So this set should hopefully address all reviews from the v2, and
> > fix all reports from the extremely useful (as always) Kbuild testing
> > bot. It also completes support for all archs.
> 
> The one thing I'd still like to see is some actual performance
> (latency?) numbers.
> 
> Maybe they were hiding somewhere in the pile and my quick scan missed
> them. But the main argument for this was that we've had the occasional
> latency issues with softirqs blocking (eg the USB v4l frame dropping
> etc), and I did that SOFTIRQ_NOW_MASK because it helped one particular
> case.
> 
> And you don't seem to have removed that hack, and I'd really like to
> see that that thing isn't needed any more.
> 
> Because otherwise the whole series seems a bit pointless, don't you
> think? If it doesn't fix that fundamental issue, then what's the point
> of all this churn..

Numbers are indeed missing. In fact this patchset mostly just brings in the
infrastructure. We have yet to pinpoint the most latency-inducing
softirq-disabled sites and make them disable only the vectors that
are involved in a given lock.

And last but not least, this patchset allows us to soft-interrupt
code that disabled other vectors, but it doesn't yet allow us to
soft-interrupt a vector itself. Not much is needed to allow that
from the softirq core code. But we can't do that blindly. For example
TIMER_SOFTIRQ, HRTIMER_SOFTIRQ, TASKLET_SOFTIRQ and NET_RX_SOFTIRQ
can't interrupt each other because some locks can be taken from all
of them (the socket lock for example). Having that many vectors
involved in a single lock is probably rare, but still...

The only solution I see to make vectors interruptible is to proceed
the same way as we do for softirq-disabled sections: case by case,
on a per-handler basis. Hopefully we can operate per subsystem
and we don't need to start from drivers.

So the idea is the following: if lock A can be taken from both TIMER_SOFTIRQ
and BLOCK_SOFTIRQ, we do this from the timer handler, for example:

    __do_softirq() {
        // all vectors disabled
        run_timers {
            random_timer_callback() {
                bh = local_bh_enable_mask(~(TIMER_SOFTIRQ | BLOCK_SOFTIRQ));
                spin_lock(&A);
                do_some_work();
                spin_unlock(&A);
                local_bh_disable_mask(bh);
            }
        }
    }

Sounds tedious but that's the only way I can imagine to make that correct.
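
To make the intended behaviour concrete, here is a small userspace model of the
pattern above. It is only a sketch and not kernel code: every model_* function,
the *_VEC names and the mask semantics (enable returns the previous state,
disable restores it) are assumptions made for illustration, not the API from
the patchset.

```c
#include <assert.h>

/* Hypothetical vector numbers, loosely mirroring a subset of the softirq enum. */
enum { HI_VEC, TIMER_VEC, NET_TX_VEC, NET_RX_VEC, BLOCK_VEC, NR_VECS };

#define VEC_BIT(v)  (1u << (v))
#define ALL_VECS    ((1u << NR_VECS) - 1)

static unsigned int enabled_mask = ALL_VECS; /* vectors currently allowed to run */
static unsigned int pending_mask;            /* raised but not yet served */
static unsigned int served_mask;             /* record of which vectors ran */

/* Mark a vector as pending (models raising a softirq). */
static void model_raise(int vec)
{
    pending_mask |= VEC_BIT(vec);
}

/* Serve every pending vector that is currently enabled. */
static void model_run_pending(void)
{
    unsigned int runnable = pending_mask & enabled_mask;

    pending_mask &= ~runnable;
    served_mask |= runnable;
}

/* Assumed semantics: enable the vectors in 'mask', return the previous state. */
static unsigned int model_bh_enable_mask(unsigned int mask)
{
    unsigned int old = enabled_mask;

    enabled_mask |= mask & ALL_VECS;
    model_run_pending();
    return old;
}

/* Assumed semantics: restore the state saved by model_bh_enable_mask(). */
static void model_bh_disable_mask(unsigned int saved)
{
    assert((saved & ~ALL_VECS) == 0);
    enabled_mask = saved;
}

/* Models random_timer_callback() running with all vectors disabled. */
static void model_timer_callback(void)
{
    unsigned int bh;

    bh = model_bh_enable_mask(~(VEC_BIT(TIMER_VEC) | VEC_BIT(BLOCK_VEC)));
    /* Lock A would be held here; TIMER and BLOCK stay masked. */
    model_raise(NET_RX_VEC);
    model_raise(BLOCK_VEC);
    model_run_pending();
    model_bh_disable_mask(bh);
}
```

In this model, a NET_RX vector raised while the callback holds the lock gets
served right away, whereas a BLOCK vector stays pending until the handler
returns, which is the property needed to keep lock A safe.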

Another way could be for locks to piggyback the vectors they are involved in
on initialization:

DEFINE_SPINLOCK_SOFTIRQ(A, TIMER_SOFTIRQ | BLOCK_SOFTIRQ);

Then callsites can just use:

    bh = spin_lock_softirq(&A);
    ....
    spin_unlock_softirq(&A, bh);

Then the lock function always arranges to disable only TIMER_SOFTIRQ | BLOCK_SOFTIRQ
if not nesting, whether we are in a vector or not. The only drawback is that the
relevant spinlock_t has to carry those init flags.
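
As a rough illustration of the lock-carried mask, here is another userspace
sketch. The struct, the macro and the model_* helpers are all hypothetical
names, and it only models the nesting part (a nested lock disables just the
vectors that are not already disabled). The in-vector case, where the other
vectors would need to be re-enabled, is left out.

```c
#include <assert.h>

/* Hypothetical vector numbers for this model only. */
enum { LK_TIMER, LK_NET_RX, LK_BLOCK, LK_NR_VECS };

#define LK_BIT(v) (1u << (v))

static unsigned int lk_disabled; /* vectors currently masked on this "CPU" */

/* A lock that remembers which vectors can take it. */
struct softirq_lock {
    int locked;
    unsigned int vecs;
};

#define DEFINE_MODEL_LOCK_SOFTIRQ(name, mask) \
    struct softirq_lock name = { 0, (mask) }

/* Disable only the lock's vectors that are not already disabled; return those. */
static unsigned int model_lock_softirq(struct softirq_lock *l)
{
    unsigned int newly = l->vecs & ~lk_disabled;

    lk_disabled |= newly;
    assert(!l->locked); /* single-threaded model: no contention expected */
    l->locked = 1;
    return newly;
}

/* Release the lock and re-enable exactly what the matching lock call disabled. */
static void model_unlock_softirq(struct softirq_lock *l, unsigned int bh)
{
    assert(l->locked);
    l->locked = 0;
    lk_disabled &= ~bh;
}
```

With two locks sharing a vector, the inner unlock must not re-enable the shared
vector while the outer lock is still held. Returning only the newly disabled
bits is one simple way to get that in this model.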

> 
> See commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous"),
> which also has a couple of people listed who could hopefully re-test
> the v4l latency thing with whatever USB capture dongle it was that
> showed the issue.

So in this case for example, I'll need to check the callbacks involved
and make them disable only the vectors that need to be disabled.

I should try to reproduce the issue myself.

Thanks.