linux-kernel - Re: [PATCH net] igb: fix netpoll exit with traffic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <1682283.zdkM9JWR0q@spock>
Date:   Fri, 09 Jul 2021 16:05:18 +0200
From:   Oleksandr Natalenko <oleksandr@...alenko.name>
To:     Jesse Brandeburg <jesse.brandeburg@...el.com>
Cc:     intel-wired-lan <intel-wired-lan@...ts.osuosl.org>,
        "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
        Alexander Duyck <alexanderduyck@...com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] igb: fix netpoll exit with traffic

On sobota 8. května 2021 12:26:36 CEST Oleksandr Natalenko wrote:
> Hello.
> 
> On Thu, May 06, 2021 at 07:30:01PM -0700, Jesse Brandeburg wrote:
> > Oleksandr brought a bug report where netpoll causes trace messages in
> > the log on igb.
> > 
> > [22038.710800] ------------[ cut here ]------------
> > [22038.710801] igb_poll+0x0/0x1440 [igb] exceeded budget in poll
> > [22038.710802] WARNING: CPU: 12 PID: 40362 at net/core/netpoll.c:155
> > netpoll_poll_dev+0x18a/0x1a0
> > 
> > After some discussion and debug from the list, it was deemed that the
> > right thing to do is initialize the clean_complete variable to false
> > when the "netpoll mode" of passing a zero budget is used.
> > 
> > This logic should be sane and not risky because the only time budget
> > should be zero on entry is netpoll.  Change includes a small refactor
> > of local variable assignments to clean up the look.
> > 
> > Fixes: 16eb8815c235 ("igb: Refactor clean_rx_irq to reduce overhead and
> > improve performance") Reported-by: Oleksandr Natalenko
> > <oleksandr@...alenko.name>
> > Suggested-by: Alexander Duyck <alexander.duyck@...il.com>
> > Signed-off-by: Jesse Brandeburg <jesse.brandeburg@...el.com>
> > Reviewed-by: Alexander Duyck <alexanderduyck@...com>
> > ---
> > 
> > Compile tested ONLY, but functionally it should be exactly the same for
> > all cases except when budget is zero on entry, which will hopefully fix
> > the bug.
> > 
> > Sending this through intel-wired-lan with Alex's ACK.
> > ---
> > 
> >  drivers/net/ethernet/intel/igb/igb_main.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> > b/drivers/net/ethernet/intel/igb/igb_main.c index
> > 0cd37ad81b4e..b0a9bed14071 100644
> > --- a/drivers/net/ethernet/intel/igb/igb_main.c
> > +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> > @@ -7991,12 +7991,16 @@ static void igb_ring_irq_enable(struct
> > igb_q_vector *q_vector)> 
> >   **/
> >  
> >  static int igb_poll(struct napi_struct *napi, int budget)
> >  {
> > 
> > -	struct igb_q_vector *q_vector = container_of(napi,
> > -						     struct igb_q_vector,
> > -						     napi);
> > -	bool clean_complete = true;
> > +	struct igb_q_vector *q_vector;
> > +	bool clean_complete;
> > 
> >  	int work_done = 0;
> > 
> > +	/* if budget is zero, we have a special case for netconsole, so
> > +	 * make sure to set clean_complete to false in that case.
> > +	 */
> > +	clean_complete = !!budget;
> > +
> > +	q_vector = container_of(napi, struct igb_q_vector, napi);
> > 
> >  #ifdef CONFIG_IGB_DCA
> >  
> >  	if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
> >  	
> >  		igb_update_dca(q_vector);
> 
> This didn't fix the issue neither for me, nor for another person from
> the kernel bugzilla [1].
> 
> The same printout still happens:
> 
> ```
> May 07 20:26:55 spock kernel: igb_poll+0x0/0x1440 [igb] exceeded budget in
> poll May 07 20:26:55 spock kernel: WARNING: CPU: 13 PID: 12285 at
> net/core/netpoll.c:154 netpoll_poll_dev+0x18a/0x1a0 …
> May 07 20:26:55 spock kernel: Call Trace:
> May 07 20:26:55 spock kernel:  netpoll_send_skb+0x1a0/0x260
> May 07 20:26:55 spock kernel:  write_msg+0xe5/0x100 [netconsole]
> May 07 20:26:55 spock kernel:  console_unlock+0x42f/0x720
> May 07 20:26:55 spock kernel:  suspend_devices_and_enter+0x2ac/0x7f0
> May 07 20:26:55 spock kernel:  pm_suspend.cold+0x321/0x36c
> May 07 20:26:55 spock kernel:  state_store+0xa6/0x140
> May 07 20:26:55 spock kernel:  kernfs_fop_write_iter+0x124/0x1b0
> May 07 20:26:55 spock kernel:  new_sync_write+0x16a/0x200
> May 07 20:26:55 spock kernel:  vfs_write+0x223/0x2e0
> May 07 20:26:55 spock kernel:  __x64_sys_write+0x6d/0xf0
> ```
> 
> Probably, your patch is still fine, but another idea is desperately
> needed :).
> 
> Thanks.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=212573

Gentle ping. IIUC, the patch was not picked up anyway, but maybe there's 
another idea?

Thanks.

-- 
Oleksandr Natalenko (post-factum)