Message-ID: <x6d2ae4ss4wvvuib2hmop6ztysjsbyno7gbjkyek5xng2kmgyd@yfmnfognlj5n>
Date: Mon, 3 Jul 2023 18:15:58 -0300
From: Wander Lairson Costa <wander@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Paolo Abeni <pabeni@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org,
juri.lelli@...hat.com
Subject: Re: Splat in kernel RT while processing incoming network packets
On Mon, Jul 03, 2023 at 04:29:08PM +0200, Sebastian Andrzej Siewior wrote:
> On 2023-07-03 09:47:26 [-0300], Wander Lairson Costa wrote:
> > Dear all,
> Hi,
>
> > I am writing to report a splat issue we encountered while running the
> > Real-Time (RT) kernel in conjunction with Network RPS (Receive Packet
> > Steering).
> >
> > During some testing of the RT kernel version 6.4.0 with Network RPS enabled,
> > we observed a splat occurring in the SoftIRQ subsystem. The splat message is as
> > follows:
> >
> > [ 37.168920] ------------[ cut here ]------------
> > [ 37.168925] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:291 do_softirq_post_smp_call_flush+0x2d/0x60
> …
> > [ 37.169060] ---[ end trace 0000000000000000 ]---
> >
> > It comes from [1].
> >
> > The issue lies in the mechanism RPS uses to defer network packet processing to
> > other CPUs. It sends an IPI to the target CPU. The registered callback is
> > rps_trigger_softirq(), which raises a softirq, leading to the following
> > scenario:
> >
> >  CPU0                                CPU1
> >  netif_rx()
> >    enqueue_to_backlog(cpu=1)
> >      net_rps_send_ipi()
> >                                      flush_smp_call_function_queue()
> >                                        was_pending = local_softirq_pending()
> >                                        __flush_smp_call_function_queue()
> >                                          rps_trigger_softirq()
> >                                            __raise_softirq_irqoff()
> >                                        do_softirq_post_smp_call_flush()
> >
> > That has the undesired side effect of raising a softirq from the SMP
> > function call flush, leading to the aforementioned splat.
>
> correct.
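For reference, the two ends of that path look roughly like this (paraphrasing
from net/core/dev.c and kernel/softirq.c in the tree I am running, so the
details may be slightly off):

  /* net/core/dev.c: csd callback, runs while the remote CPU flushes its
   * SMP-call-function queue; it only raises NET_RX_SOFTIRQ for the
   * per-CPU backlog NAPI instance. */
  static void rps_trigger_softirq(void *data)
  {
          struct softnet_data *sd = data;

          ____napi_schedule(sd, &sd->backlog);
          sd->received_rps++;
  }

  /* kernel/softirq.c (PREEMPT_RT): flush_smp_call_function_queue()
   * snapshots local_softirq_pending() before flushing; any softirq newly
   * raised by a flushed callback makes the masks differ and trips the
   * WARN_ON_ONCE() that produces the splat above. */
  void do_softirq_post_smp_call_flush(unsigned int was_pending)
  {
          if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
                  invoke_softirq();
  }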
>
> > The kernel version is kernel-ark [1], os-build-rt branch. It is essentially the
> > upstream kernel with the PREEMPT_RT patches, and with RHEL configs. I can
> > provide the .config.
>
> It is fine, I see it.
>
> > The only solution I have imagined so far is to modify RPS to process packets
> > in a kernel thread on RT. But I wonder how that would be different from
> > processing them in ksoftirqd.
> >
> > Any inputs on the issue?
>
> Not sure how to proceed. One thing you could do is a hack similar to
> net-Avoid-the-IPI-to-free-the.patch which does it for defer_csd.
At first sight it seems straightforward to implement.
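If I understand the suggestion correctly, the piece that would have to change
is where the RPS csd is set up and fired; roughly (again paraphrasing
net/core/dev.c, so treat it as a sketch):

  /* net_dev_init(): one csd per CPU, carrying the IPI payload */
  INIT_CSD(&sd->csd, rps_trigger_softirq, sd);
  sd->cpu = i;

  /* net_rps_send_ipi(): walk the rps_ipi list that enqueue_to_backlog()
   * built and kick each remote CPU via an async SMP function call */
  while (remsd) {
          struct softnet_data *next = remsd->rps_ipi_next;

          if (cpu_online(remsd->cpu))
                  smp_call_function_single_async(remsd->cpu, &remsd->csd);
          remsd = next;
  }

So, similar to what the defer_csd patch does, the callback (or whatever
replaces the IPI) would have to get the backlog scheduled without raising a
softirq from inside the SMP-call-function flush.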
> On the other hand we could drop net-Avoid-the-IPI-to-free-the.patch and
> remove the warning because we now have commit
> d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"")
But I am more in favor of a solution that removes code than one that
adds more :)
>
> Prior to that, raising a softirq from hardirq would wake ksoftirqd, which in
> turn would collect all pending softirqs. As a consequence all following
> softirqs (networking, …) would run as SCHED_OTHER and compete with
> SCHED_OTHER tasks for resources. Not good, because the networking work is
> no longer processed within the networking interrupt thread. Also, this is not
> a DDoS kind of situation where one would want to delay processing.
>
> With that change, this isn't the case anymore. Only an "unrelated" IRQ
> thread could pick up the networking work, which is less than ideal. That
> is because the softirq is set in the global pending mask, ksoftirqd is marked
> for a wakeup and could be delayed because other tasks are busy. Then the disk
> interrupt (for instance) could pick it up as part of its threaded
> interrupt handler.
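Just to make sure I follow why an unrelated IRQ thread ends up doing the
work: on RT the forced-threaded handler runs between local_bh_disable() and
local_bh_enable(), and the enable side processes whatever is in the global
pending mask at that point. Roughly (kernel/irq/manage.c, abbreviated):

  static irqreturn_t irq_forced_thread_fn(struct irq_desc *desc,
                                          struct irqaction *action)
  {
          irqreturn_t ret;

          local_bh_disable();
          ...
          ret = action->thread_fn(action->irq, action->dev_id);
          ...
          /* On PREEMPT_RT this runs any pending softirqs in this thread's
           * context, including a NET_RX_SOFTIRQ raised on behalf of the
           * backlog, hence the disk IRQ thread example. */
          local_bh_enable();
          return ret;
  }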
>
> Now that I think about it, we could make the backlog pseudo device a
> thread. NAPI threading enables one thread per device, but here we would need
> one thread per CPU. So it would remain kind of special. But we would avoid
> clobbering the global state and delaying everything to ksoftirqd. Processing
> it in ksoftirqd might not be ideal from a performance point of view.
Before sending this to the ML, I talked to Paolo about using threaded
NAPI. He explained that it is implemented per interface. For example, in
this specific case, the splat happened on the loopback interface, which
doesn't implement NAPI. I am cc'ing him, so he can correct me if I am
saying something wrong.
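And what makes the backlog "kind of special" is that it is a per-CPU pseudo
NAPI instance that is not attached to any net_device, so the per-device
threaded NAPI machinery (dev_set_threaded(), the /sys/class/net/<dev>/threaded
knob) never reaches it. Roughly, from net_dev_init():

  for_each_possible_cpu(i) {
          struct softnet_data *sd = &per_cpu(softnet_data, i);
          ...
          init_gro_hash(&sd->backlog);
          sd->backlog.poll = process_backlog;
          sd->backlog.weight = weight_p;
          /* never added to a dev->napi_list, so per-device threading
           * cannot turn it into a kthread; a per-CPU thread would need
           * new plumbing, as you say. */
  }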
>
> > [1] https://elixir.bootlin.com/linux/latest/source/kernel/softirq.c#L306
> >
> > Cheers,
> > Wander
>
> Sebastian
>