[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c5c8218e587941bc91a2cc49883235d0@AcuMS.aculab.com>
Date: Wed, 28 Sep 2022 16:19:51 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Qais Yousef' <qais.yousef@....com>
CC: John Stultz <jstultz@...gle.com>,
LKML <linux-kernel@...r.kernel.org>,
John Dias <joaodias@...gle.com>,
Connor O'Brien <connoro@...gle.com>,
"Rick Yiu" <rickyiu@...gle.com>, John Kacur <jkacur@...hat.com>,
Chris Redpath <chris.redpath@....com>,
Abhijeet Dharmapurikar <adharmap@...cinc.com>,
"Peter Zijlstra" <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
"kernel-team@...roid.com" <kernel-team@...roid.com>
Subject: RE: [RFC][PATCH v3 0/3] Softirq -rt Optimizations
From: Qais Yousef
> Sent: 28 September 2022 16:56
>
> On 09/28/22 13:51, David Laight wrote:
> > From: Qais Yousef
> > > Sent: 28 September 2022 14:01
> > >
> > > Hi John
> > >
> > > On 09/21/22 01:25, John Stultz wrote:
> > > > Hey all,
> > > >
> > > > This series is a set of patches that optimize scheduler decisions around
> > > > realtime tasks and softirqs. This series is a rebased and reworked set
> > > > of changes that have been shipping on Android devices for a number of
> > > > years, originally created to resolve audio glitches seen on devices
> > > > caused by softirqs for network or storage drivers.
> > > >
> > > > Long running softirqs cause issues because they aren’t currently taken
> > > > into account when a realtime task is woken up, but they will delay
> > > > realtime tasks from running if the realtime tasks are placed on a cpu
> > > > currently running a softirq.
> > >
> > > Thanks a lot for sending this series. I've raised this problem in various
> > > venues in the past, but it seems it is hard to do something better than what
> > > you propose here.
> > >
> > > Borrowing some behaviours from PREEMPT_RT (like threadedirqs) won't cut it
> > > outside PREEMPT_RT AFAIU.
> > >
> > > Peter did suggest an alternative at one point in the past to be more aggressive
> > > in limiting softirqs [1] but I never managed to find the time to verify it
> > > - especially its impact on network throughput as this seems to be the tricky
> > > trade-of (and tricky thing to verify for me at least). I'm not sure if BLOCK
> > > softirqs are as sensitive.
> >
> > I've had issues with the opposite problem.
> > Long running RT tasks stopping the softint code running.
> >
> > If an RT task is running, the softint will run in the context of the
> > RT task - so has priority over it.
> > If the RT task isn't running the softint stops the RT task being scheduled.
> > This is really just the same.
> >
> > If the softint defers back to thread context it won't be scheduled
> > until any RT task finishes. This is the opposite priority.
>
> If we can get a subset of threadedirqs (call it threadedsoftirqs) from
> PREEMPT_RT where softirqs can be converted into RT kthreads, that'll alleviate
> both sides of the problem IMO. But last I checked with Thomas this won't be
> possible. But things might have changed since then..
Part of the problem is that can significantly increase latency.
Some softirq calls will be latency sensitive.
> > IIRC there is another strange case where the RT thread has been woken
> > but isn't yet running - can't remember the exact details.
> >
> > I can (mostly) handle the RT task being delayed (there are a lot of RT
> > threads sharing the work) but it is paramount that the ethernet receive
> > code actually runs - I can't afford to drop packets (they contain audio
> > the RT threads are processing).
> >
> > In my case threaded NAPI (mostly) fixes it - provided the NAPI thread are RT.
>
> There's a netdev_budget and netdev_bugdet_usecs params in procfs that control
> how long the NET_RX spends in the softirq. Maybe you need to tweak those too.
> In your case, you probably want to increase the budget.
Maybe, but the problem is that the softint code is far too willing
to drop to kthread context.
Eric made a change to reduce that (to avoid losing ethernet packets)
but the original test got added back - there are now two tests, but
the original one dominates. Eric's bug fix got reverted (with extra
tests that make the code slower).
I did test with that changed, but still got some lost packets.
Trying to receive 500000 UDP packets/sec is quite hard!
They are also split across 10k unconnected sockets.
> Note that in Android the BLOCK layer seems to cause similar problems which
> don't have these NET facilities. So NET is only one side of the problem.
Isn't the block layer softints stopping other code?
I'd really got the other problem.
Although I do have a 10ms timer wakeup that really needs not to be delayed.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Powered by blists - more mailing lists