lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20090324.143622.186562202.davem@davemloft.net>
Date:	Tue, 24 Mar 2009 14:36:22 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	herbert@...dor.apana.org.au
Cc:	mingo@...e.hu, r.schwebel@...gutronix.de,
	torvalds@...ux-foundation.org, blaschka@...ux.vnet.ibm.com,
	tglx@...utronix.de, a.p.zijlstra@...llo.nl,
	linux-kernel@...r.kernel.org, kernel@...gutronix.de
Subject: Re: Revert "gro: Fix legacy path napi_complete crash",

From: Herbert Xu <herbert@...dor.apana.org.au>
Date: Tue, 24 Mar 2009 23:09:28 +0800

> On Tue, Mar 24, 2009 at 03:39:42PM +0100, Ingo Molnar wrote:
> >
> > Subject: [PATCH] net: Fix netpoll lockup in legacy receive path
> 
> Actually, this patch is still racy.  If some interrupt comes in
> and we suddenly get the maximum amount of backlog we can still
> hang when we call __napi_complete incorrectly.  It's unlikely
> but we certainly shouldn't allow that.  Here's a better version.
> 
> net: Fix netpoll lockup in legacy receive path

Hmmm...

> @@ -2588,9 +2588,10 @@ static int process_backlog(struct napi_struct *napi, int quota)
>  		local_irq_disable();
>  		skb = __skb_dequeue(&queue->input_pkt_queue);
>  		if (!skb) {
> +			list_del(&napi->poll_list);
> +			clear_bit(NAPI_STATE_SCHED, &napi->state);
>  			local_irq_enable();
> -			napi_complete(napi);
> -			goto out;
> +			break;
>  		}
>  		local_irq_enable();

I think the problem is that we need to do the GRO flush before the
list delete and clearing the NAPI_STATE_SCHED bit.

You can't disown the NAPI context until you've squared away the GRO
state, I think.

Ingo's case stresses TCP a lot so I think he's hitting these GRO
cases a lot as well as hitting the backlog maximum.

So this mis-ordering of completion operations could explain why
he still sees problems.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ