netdev - Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI.

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <2eb9af65d098bb54ed54178d7269e7197d6de5a0.camel@gmail.com>
Date: Thu, 21 Sep 2023 12:41:33 +0200
From: Ferenc Fejes <primalgamer@...il.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Paolo Abeni
	 <pabeni@...hat.com>
Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org, "David S. Miller"
	 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski
	 <kuba@...nel.org>, Peter Zijlstra <peterz@...radead.org>, Thomas Gleixner
	 <tglx@...utronix.de>, Wander Lairson Costa <wander@...hat.com>
Subject: Re: [RFC PATCH net-next 1/2] net: Use SMP threads for backlog NAPI.

Hi!

On Wed, 2023-09-20 at 17:57 +0200, Sebastian Andrzej Siewior wrote:
> On 2023-08-23 15:35:41 [+0200], Paolo Abeni wrote:
> > On Mon, 2023-08-14 at 11:35 +0200, Sebastian Andrzej Siewior wrote:
> > > @@ -4781,7 +4733,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
> > >  		 * We can use non atomic operation since we own the queue lock
> > >  		 */
> > >  		if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
> > > -			napi_schedule_rps(sd);
> > > +			__napi_schedule_irqoff(&sd->backlog);
> > >  		goto enqueue;
> > >  	}
> > >  	reason = SKB_DROP_REASON_CPU_BACKLOG;
> > 
> > I *think* that the above could be quite dangerous when cpu ==
> > smp_processor_id() - that is, with plain veth usage.
> > 
> > Currently, each packet runs into the rx path just after
> > enqueue_to_backlog()/tx completes.
> > 
> > With this patch there will be a burst effect: the backlog thread
> > will run only after a few (several) packets have been enqueued,
> > whenever the process scheduler decides to run it. Note that the
> > current CPU is already hosting a running process, the tx thread.
> > 
> > The above can cause packet drops (due to limited buffering) or very
> > high latency (due to long bursts), even in non-overload situations,
> > and it would be quite hard to debug.
> > 
> > I think the above needs to be opt-in, but I guess that even RT
> > deployments doing some packet forwarding will not be happy with
> > this on.
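To make the burst effect concrete, here is a toy simulation, not kernel code, assuming a deliberately tiny backlog limit (the hypothetical `BACKLOG_CAP`) and one packet sent per tick. The hypothetical `simulate_backlog()` compares draining the queue right after every send with draining it only every `period` ticks, as a deferred backlog thread might.

```c
#include <assert.h>

/* Toy model (not kernel code): one CPU sends packets into its own bounded
 * backlog queue. BACKLOG_CAP is a hypothetical, deliberately small limit. */
#define BACKLOG_CAP 8

struct burst_result {
    int drops;      /* packets rejected because the queue was full */
    int max_wait;   /* longest time, in ticks, a packet sat in the queue */
};

/* Send n_pkts packets, one per tick. The receive side drains the whole
 * queue only every `period` ticks, modelling a backlog thread that runs
 * whenever the scheduler picks it. period == 1 models the current
 * behaviour, where each packet is processed right after it is sent. */
static struct burst_result simulate_backlog(int n_pkts, int period)
{
    int enq_tick[BACKLOG_CAP];
    int len = 0;
    struct burst_result r = {0, 0};

    assert(period > 0);
    for (int tick = 0; tick < n_pkts; tick++) {
        if (len < BACKLOG_CAP)
            enq_tick[len++] = tick;   /* enqueue */
        else
            r.drops++;                /* queue full: drop */

        if ((tick + 1) % period == 0) {
            /* the backlog "thread" finally runs and drains everything */
            for (int i = 0; i < len; i++) {
                int wait = tick - enq_tick[i];
                if (wait > r.max_wait)
                    r.max_wait = wait;
            }
            len = 0;
        }
    }
    return r;
}
```

With 100 packets, `simulate_backlog(100, 1)` reports no drops and zero waiting, while `simulate_backlog(100, 20)` drops 60 packets and lets the oldest queued packet wait 19 ticks. Real queue limits and scheduling delays differ; only the shape of the effect carries over.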
> 
> I've been looking at this again and have been thinking about what you
> said here. I think part of the problem is that we lack a policy/
> mechanism for detecting when a DoS is happening and deciding what to
> do about it.
> 
> Before commit d15121be74856 ("Revert "softirq: Let ksoftirqd do its
> job""), when a lot of network packets were processed, processing was
> moved to ksoftirqd and continued based on how the scheduler scheduled
> the SCHED_OTHER ksoftirqd task. This avoided lock-ups of the system,
> and the CPU could do something else in between. An interrupt would not
> continue the outstanding softirq backlog but would wait for ksoftirqd.
> So it basically avoided the networking overload, throttling the
> throughput if needed.
> 
> This isn't the case after that commit. Now, the CPU can be stuck
> processing network packets if the packets come in fast enough. Even
> if ksoftirqd is woken up, the next interrupt (say, the timer) will
> continue with at least one round.
> By using NAPI threads it is possible to give control back to the
> scheduler, which can throttle the NAPI processing in favour of other
> threads that ask for CPU time. As you pointed out, waking the thread
> does not guarantee that it will immediately do the NAPI work. It can
> be delayed based on the current load on the system.
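A simplified model of the two behaviours described above may help. It is a sketch, not the kernel's softirq code: `struct cpu_model`, `irq_exit_softirq()` and `RESTART_BUDGET` are hypothetical names, and the budget only loosely stands in for the kernel's softirq restart limit. With `defer_when_woken` set, an interrupt leaves the remaining work to ksoftirqd once it has been woken (the behaviour before the revert); without it, every interrupt runs at least one more round while work is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model (not kernel code) of softirq handling at interrupt exit.
 * RESTART_BUDGET is a hypothetical stand-in for the kernel's restart limit. */
#define RESTART_BUDGET 10

struct cpu_model {
    int pending_rounds;     /* rounds of NAPI work still outstanding */
    bool ksoftirqd_woken;   /* work has been handed over to ksoftirqd */
};

/* Process softirq work on interrupt exit and return the number of rounds
 * run in interrupt context. If defer_when_woken is true, an interrupt
 * leaves the backlog to ksoftirqd once ksoftirqd has been woken. */
static int irq_exit_softirq(struct cpu_model *c, bool defer_when_woken)
{
    int rounds = 0;

    assert(c->pending_rounds >= 0);
    if (defer_when_woken && c->ksoftirqd_woken)
        return 0;                   /* let the scheduler decide when */

    while (c->pending_rounds > 0 && rounds < RESTART_BUDGET) {
        c->pending_rounds--;        /* one round of packet processing */
        rounds++;
    }
    if (c->pending_rounds > 0)
        c->ksoftirqd_woken = true;  /* budget exhausted: offload */
    return rounds;
}
```

Starting with 25 pending rounds, the first interrupt runs 10 rounds and wakes ksoftirqd in both modes; a second interrupt then runs 0 rounds in the deferring mode but another 10 in the non-deferring one.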
> 
> This could be influenced by assigning the NAPI thread a SCHED_FIFO
> priority. Depending on the priority, it could be ensured that the
> thread starts right away, or "later" if something else is more
> important. However, this opens the DoS window again: the scheduler
> will keep the NAPI thread on the CPU for as long as it asks for it,
> with no throttling.
> 
> If we could somehow define a DoS condition once we are overwhelmed
> with packets, then we could act on it and throttle the processing.
> This in turn would allow a SCHED_FIFO priority without the fear of a
> lockup if the system is flooded with packets.

Can this be avoided if we reuse gro_flush_timeout as the maximum time
the NAPI thread can be scheduled?
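One way to read the gro_flush_timeout suggestion is as a per-slice time budget for the NAPI thread. The toy model below is not kernel code and uses the hypothetical `struct napi_model` and `napi_thread_slice()`: it runs poll rounds against a simulated clock and stops once the budget is used up, leaving it to the scheduler to pick the next slice.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (not kernel code) of a threaded-NAPI loop that gives the CPU
 * back once a time budget is used up. Reusing gro_flush_timeout as that
 * budget is the idea under discussion; here it is just a parameter. */
struct napi_model {
    int pending_rounds;     /* rounds of packet processing still queued */
    uint64_t now_ns;        /* simulated clock */
};

/* Run poll rounds until the work is done or budget_ns has elapsed.
 * Returns the number of rounds completed in this slice; the caller
 * (the scheduler, in this model) decides when the next slice runs. */
static int napi_thread_slice(struct napi_model *n, uint64_t budget_ns,
                             uint64_t round_cost_ns)
{
    uint64_t start = n->now_ns;
    int rounds = 0;

    assert(round_cost_ns > 0);
    /* the budget is checked before each round, not during it */
    while (n->pending_rounds > 0 && n->now_ns - start < budget_ns) {
        n->pending_rounds--;
        n->now_ns += round_cost_ns;  /* each round takes simulated time */
        rounds++;
    }
    return rounds;
}
```

Because the budget is checked before each round, a slice can overshoot by up to one round: with 100 pending rounds, a 20000 ns budget and 3000 ns per round, a slice completes 7 rounds and ends at 21000 ns. As a purely hypothetical policy, a caller could treat several consecutive slices that end with work still pending as the DoS condition Sebastian mentions, and only then lower the thread's priority.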

> 
> > Cheers,
> > 
> > Paolo
> 
> Sebastian

Ferenc