linux-kernel - Re: [PATCH 2/3] softirq: avoid spurious stalls due to need

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZAXVF0tPKLErAkpT@lothringen>
Date:   Mon, 6 Mar 2023 12:57:11 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Jakub Kicinski <kuba@...nel.org>, peterz@...radead.org,
        jstultz@...gle.com, edumazet@...gle.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH 2/3] softirq: avoid spurious stalls due to need_resched()

On Sun, Mar 05, 2023 at 09:43:23PM +0100, Thomas Gleixner wrote:
> That said, I have no brilliant solution for that off the top of my head,
> but I'm not comfortable with applying more adhoc solutions which are
> contrary to the efforts of e.g. the audio folks.
> 
> I have some vague ideas how to approach that, but I'm traveling all of
> next week, so I neither will be reading much email, nor will I have time
> to think deeply about softirqs. I'll resume when I'm back.

IIUC: the problem is that some (rare?) softirq vector callbacks rely on the
fact they can not be interrupted by other local vectors and they rely on
that to protect against concurrent per-cpu state access, right?

And there is no automatic way to detect those cases otherwise we would have
fixed them all with spinlocks already.

So I fear the only (in-)sane idea I could think of is to do it the same way
we did with the BKL. Some sort of pushdown: vector callbacks known for having
no such subtle interaction can re-enable softirqs.

For example known safe timers (either because they have no such interactions
or because they handle them correctly via spinlocks) can carry a
TIMER_SOFTIRQ_SAFE flag to tell about that. And RCU callbacks something alike.

Of course this is going to be a tremendous amount of work but it has the
advantage of being iterative and it will pay in the long run. Also I'm confident
that the hottest places will be handled quickly. And most of them are likely to
be in core networking code.

Because I fear no hack will ever fix that otherwise, and we have tried a lot.

Thanks.