Message-ID: <6cmltsaeqjcekglutpnp4lxyqnjng5m4w73ae4btwrivuxvptf@zmhahab7643i>
Date: Wed, 5 Jul 2023 12:59:28 -0300
From: Wander Lairson Costa <wander@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Paolo Abeni <pabeni@...hat.com>, linux-kernel@...r.kernel.org,
linux-rt-users@...r.kernel.org, juri.lelli@...hat.com
Subject: Re: Splat in kernel RT while processing incoming network packets
On Tue, Jul 04, 2023 at 04:47:49PM +0200, Sebastian Andrzej Siewior wrote:
> On 2023-07-04 12:29:33 [+0200], Paolo Abeni wrote:
> > Just to hopefully clarify the networking side of it, napi instances !=
> > network backlog (used by RPS). The network backlog (RPS) is available
> > for all the network devices, including the loopback and all the virtual
> > ones.
>
> Yes.
>
> > The napi instances (and the threaded mode) are available only on
> > network device drivers implementing the napi model. The loopback driver
> > does not implement the napi model, and neither do most virtual devices
> > and even some H/W NICs (mostly low-end ones).
>
> Yes.
>
> > The network backlog can't run in threaded mode: there is no API/sysctl
> > nor infrastructure for that. Threaded backlog processing could be
> > implemented, even if it would not be completely trivial, and it sounds
> > a bit weird to me.
>
> Yes, I mean that this needs to be done.
>
> >
> > Just for the records, I mentioned the following in the bz:
> >
> > It looks like flush_smp_call_function_queue() has only 2 callers,
> > migration, and do_idle().
> >
> > What about moving softirq processing from
> > flush_smp_call_function_queue() into cpu_stopper_thread(), outside the
> > unpreemptable critical section?
>
> This doesn't solve anything. You schedule the softirq from hardirq and
> from that moment on you are in "anonymous context"; we solve this by
> processing it in ksoftirqd.
> For !RT you process it while leaving the hardirq. For RT, we can't.
> Processing it in the context of the currently running process (say idle,
> as in the reported backtrace, or another running user task) would mean
> processing network-related work that originated elsewhere at someone
> else's expense. Assume you have a high-prio RT task running, not related
> to networking at all, and suddenly you throw a bunch of skbs at it.
>
> Therefore it is preferred to process them within the interrupt thread in
> which the softirq was raised, i.e. within its origin.
>
> The other problem with ksoftirqd processing is that everything is added
> to a global state and then left for ksoftirqd to process. The global
> state is considered by every local_bh_enable() instance, so a random
> interrupt thread could process it, or even a random task doing a syscall
> involving spin_lock_bh().
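
To make sure I follow the global-state point, here is a small userspace
model in plain C (not kernel code; the names and the simplified
single-context logic are mine, for illustration only) of why any caller
of the local_bh_enable() tail can end up doing someone else's deferred
work:

```c
/* Userspace model (not kernel code) of a single shared pending mask:
 * whoever runs the local_bh_enable() tail picks up *all* deferred
 * work, regardless of which context raised it. Names are illustrative. */
#include <stdio.h>

#define NET_RX_SOFTIRQ  (1u << 3)
#define TASKLET_SOFTIRQ (1u << 6)

static unsigned int softirq_pending; /* global, shared by every context */

static void raise_softirq(unsigned int nr)
{
	softirq_pending |= nr;
}

/* Model of the tail of local_bh_enable(): the caller processes the
 * whole pending mask, including work raised by unrelated contexts. */
static unsigned int local_bh_enable_model(const char *who)
{
	unsigned int handled = softirq_pending;

	softirq_pending = 0;
	if (handled)
		printf("%s handled mask 0x%x\n", who, handled);
	return handled;
}
```

So a task that merely takes and releases a BH-disabling lock in a
syscall ends up handling both the network and the tasklet work in this
model, which matches my reading of the problem you describe.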
>
> The NAPI-threads are nice in a way that they don't clobber the global
> state.
> For RPS we would need either per-CPU threads or to serve this in
> ksoftirqd/X. The additional per-CPU thread only makes sense if it runs
> at a higher priority. However, without that priority it would be no
> different from ksoftirqd, unless it does only the backlog's work.
>
> puh. I'm undecided here. We might want to throw it into ksoftirqd and
> remove the warning. But then this will be processed with other softirqs
> (like USB, due to tasklets) and at some point might be picked up by
> another interrupt thread.
>
Maybe, under RT, some softirqs should run in the context of the "target"
process. For NET_RX, for example, the softirqs would run in the context
of the packet's recipient process. Each task_struct would have a list of
pending softirqs, which would be checked at a few points, such as on
scheduling, when the process enters the kernel, on softirq raise, etc.
The default target process would be ksoftirqd. Does this idea make sense?
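
To sketch what I mean, here is a userspace model in plain C (not kernel
code; every name here, raise_softirq_for(), process_own_softirqs(),
struct task_model, is hypothetical and just for discussion). The point
is that pending work is per-task, so an unrelated RT task never sees it:

```c
/* Hypothetical model of per-task softirq state: the raiser targets the
 * recipient task when one is known, falling back to ksoftirqd. All
 * identifiers are made up for illustration; nothing here is kernel API. */
#include <stddef.h>

#define NET_RX_SOFTIRQ (1u << 3)

struct task_model {
	const char *comm;
	unsigned int softirq_pending; /* per-task, not global */
};

static struct task_model ksoftirqd = { "ksoftirqd/0", 0 };

/* Raise a softirq "for" a task: the packet's recipient if known,
 * otherwise the default target, ksoftirqd. */
static void raise_softirq_for(struct task_model *target, unsigned int nr)
{
	if (!target)
		target = &ksoftirqd;
	target->softirq_pending |= nr;
}

/* Checked at the points mentioned above (schedule-in, kernel entry,
 * softirq raise). Returns the mask this task, and only this task,
 * had to process. */
static unsigned int process_own_softirqs(struct task_model *t)
{
	unsigned int handled = t->softirq_pending;

	t->softirq_pending = 0;
	return handled;
}
```

The obvious open questions are how to find the recipient task from
hardirq context and what to do when it is not runnable, which is why
ksoftirqd remains the fallback in this sketch.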
> > Cheers,
> >
> > Paolo
>
> Sebastian
>