Message-ID: <20091112085739.1137f690@nehalam>
Date:	Thu, 12 Nov 2009 08:57:39 -0800
From:	Stephen Hemminger <shemminger@...tta.com>
To:	Changli Gao <xiaosuo@...il.com>
Cc:	"David S. Miller" <davem@...emloft.net>,
	Patrick McHardy <kaber@...sh.net>, netdev@...r.kernel.org
Subject: Re: [RACE] net: in process_backlog
On Thu, 12 Nov 2009 16:50:53 +0800
Changli Gao <xiaosuo@...il.com> wrote:
> Dear Stephen:
> 
> I don't think this change
> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=6e583ce5242f32e925dcb198f7123256d0798370
> is correct.
> 
>                         local_irq_enable();
>                         break;
>                 }
> -
>                 local_irq_enable();
> 
> -               dev = skb->dev;
> -
> On an MP system, flush_backlog() will be called here, and after that
> skb->dev will be invalid; if we access it, something unexpected may
> happen.
> 
>                 netif_receive_skb(skb);
> -
> -               dev_put(dev);
>         } while (++work < quota && jiffies == start_time);
> 
>         return work;
There are a couple of issues here, but not the one you thought
you saw.
Receive processing is always done in softirq context. The backlog queues
are per-CPU. When a device is deleted, an IPI is sent to all CPUs to
scan their backlog queues. What should protect the skb is the fact that
the network device destruction process waits for an RCU grace period,
so skb->dev points to valid data.
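
To make the per-CPU flush concrete, here is a rough userspace model of what
each CPU does when it gets the IPI: walk its own backlog and drop every
packet that still references the dying device. This is only a sketch, not
the kernel code; the model_* names, the fixed CPU count, and the plain
linked list are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 2 /* assumption: tiny fixed CPU count for the model */

struct model_dev { int ifindex; };

struct model_skb {
	struct model_dev *dev;   /* device the packet arrived on */
	struct model_skb *next;
};

/* One backlog list per simulated CPU. */
static struct model_skb *backlog[MODEL_NR_CPUS];

/* Queue a packet for dev on the given CPU (list order is irrelevant here). */
static int backlog_enqueue(int cpu, struct model_dev *dev)
{
	struct model_skb *skb;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	skb = malloc(sizeof(*skb));
	if (!skb)
		return -1;
	skb->dev = dev;
	skb->next = backlog[cpu];
	backlog[cpu] = skb;
	return 0;
}

/* Per-CPU flush: unlink and free every packet that points at dev.
 * Returns the number of packets dropped. */
static int flush_backlog_model(int cpu, struct model_dev *dev)
{
	struct model_skb **pp;
	int freed = 0;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	pp = &backlog[cpu];
	while (*pp) {
		struct model_skb *skb = *pp;

		if (skb->dev == dev) {
			*pp = skb->next;   /* unlink before freeing */
			free(skb);
			freed++;
		} else {
			pp = &skb->next;
		}
	}
	return freed;
}

/* Count the packets still queued on one CPU. */
static int backlog_len(int cpu)
{
	const struct model_skb *skb;
	int n = 0;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	for (skb = backlog[cpu]; skb; skb = skb->next)
		n++;
	return n;
}
```

In the model, the flush has to be run once for every CPU index, which
corresponds to the IPI reaching all CPUs; packets for other devices stay
queued.
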
BUT flush_backlog is run too late in the device destruction process.
It should be moved out of netdev_run_todo, to right after dev_shutdown().
Also, adding a check for skb->dev->reg_state in netif_receive_skb, so that
packets for a device that is no longer registered get dropped, would be wise.
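
The reg_state check could look roughly like the following. This is a
userspace sketch of the idea only, not a patch: the enum, the rx_* structs,
and the return codes are stand-ins defined locally, and treating every
state other than "registered" as a reason to drop is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the device registration state; values are local to this sketch. */
enum model_reg_state {
	MODEL_NETREG_REGISTERED,
	MODEL_NETREG_UNREGISTERING,
	MODEL_NETREG_UNREGISTERED,
};

struct rx_dev {
	enum model_reg_state reg_state;
	unsigned long rx_dropped;   /* packets dropped by the check below */
};

struct rx_skb {
	struct rx_dev *dev;
};

enum { MODEL_RX_SUCCESS = 0, MODEL_RX_DROP = 1 };

/* Receive entry point of the model: refuse packets whose device is
 * not (or no longer) registered, and count them as dropped. */
static int model_receive_skb(struct rx_skb *skb)
{
	assert(skb != NULL && skb->dev != NULL);

	if (skb->dev->reg_state != MODEL_NETREG_REGISTERED) {
		skb->dev->rx_dropped++;
		return MODEL_RX_DROP;
	}

	/* Protocol delivery would happen here in a real receive path. */
	return MODEL_RX_SUCCESS;
}
```

The check only narrows the window: a packet that passes it can still race
with an unregister that starts immediately afterwards, which is why the
earlier flush is still needed.
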