[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zy0quVx01Q02fRQw@LQ3V64L9R2>
Date: Thu, 7 Nov 2024 13:01:45 -0800
From: Joe Damato <jdamato@...tly.com>
To: Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
	corbet@....net, hdanton@...a.com, bagasdotme@...il.com,
	pabeni@...hat.com, namangulati@...gle.com, edumazet@...gle.com,
	amritha.nambiar@...el.com, sridhar.samudrala@...el.com,
	sdf@...ichev.me, peter@...eblog.net, m2shafiei@...terloo.ca,
	bjorn@...osinc.com, hch@...radead.org, willy@...radead.org,
	willemdebruijn.kernel@...il.com, skhawaja@...gle.com,
	Martin Karsten <mkarsten@...terloo.ca>,
	"David S. Miller" <davem@...emloft.net>,
	Simon Horman <horms@...nel.org>, David Ahern <dsahern@...nel.org>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	Lorenzo Bianconi <lorenzo@...nel.org>,
	Alexander Lobakin <aleksander.lobakin@...el.com>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next v6 2/7] net: Suspend softirq when
 prefer_busy_poll is set
On Wed, Nov 06, 2024 at 07:24:32PM -0800, Joe Damato wrote:
> On Wed, Nov 06, 2024 at 03:31:00PM -0800, Jakub Kicinski wrote:
> > On Wed, 6 Nov 2024 08:52:00 -0800 Joe Damato wrote:
> > > On Tue, Nov 05, 2024 at 09:03:38PM -0800, Jakub Kicinski wrote:
> > > > On Mon,  4 Nov 2024 21:55:26 +0000 Joe Damato wrote:  
[...]
> > 0 epoll
> > 1   # ..does its magic..
> > 2   __napi_busy_loop() 
> > 3     # finds a waking packet
> > 4     busy_poll_stop()
> > 5       # arms the timer for long suspend
> > 6   # epoll sees events
> > 7     ep_suspend_napi_irqs()
> > 8       napi_suspend_irqs()
> > 9         # arms for long timer again
> > 
> > The timer we arm here only has to survive from line 5 to line 9,
> > because we will override the timeout on line 9.
> 
> Yes, you are right. Thanks for highlighting and catching this.
> 
> > > The overall point to make is that: the suspend timer is used to
> > > prevent misbehaving userland applications from taking too long. It's
> > > essentially a backstop and, as long as the app is making forward
> > > progress, allows the app to continue running its busy poll loop
> > > undisturbed (via napi_complete_done preventing the driver from
> > > enabling IRQs).
> > > 
> > > Does that make sense?
> > 
> > My mental model put in yet another way is that only epoll knows if it
> > has events, and therefore whether the timeout should be short or long.
> > So the suspend timer should only be applied by epoll.
> 
> Here's what we are thinking, can you let me know if you agree with
> this?
> 
>   - We can drop patch 2 entirely
>   - Update the documentation about IRQ suspension as needed now
>     that patch 2 has been dropped
>   - Leave the rest of the series as is
>   - Re-run our tests to gather sufficient data for the test cases
>     outlined in the cover letter to ensure that the performance
>     numbers hold over several iterations
> 
> Does that seem reasonable for the v7 to you?
> 
> I am asking because generating the amount of data over the number of
> scenarios we are testing takes a long time and I want to make sure
> we are as aligned as we can be before I kick off another run :)
I just noticed this was marked "changes requested". I re-ran the
tests overnight and have the data to confirm results are the same
even after dropping patch 2, which simplifies the code and removes
the double arming of the timer.
I wasn't sure if you were asking for other changes other than
dropping patch 2, but since I have the data I'm going to proceed as
specified in my previous email above:
  - Drop patch 2
  - Update cover letter with new data
  - Send that as v7
Unless you'd like me to hold off for some reason? Or if there was
some other feedback I need to address that I am missing?
Powered by blists - more mailing lists
 
