netdev - Re: [PATCH] gianfar: Fix crashes on RX path (Was Re: [Bugme-new] [Bug 19692] New: linux-2.6.36-rc5 crash with gianfar ethernet at full line rate traffic)


Message-ID: <1287727917.9059.117.camel@edumazet-laptop>
Date:	Fri, 22 Oct 2010 08:11:57 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Jarek Poplawski <jarkao2@...il.com>
Cc:	David Miller <davem@...emloft.net>, emin ak <eminak71@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	netdev@...r.kernel.org, bugzilla-daemon@...zilla.kernel.org,
	bugme-daemon@...zilla.kernel.org,
	Anton Vorontsov <avorontsov@...sta.com>,
	Andy Fleming <afleming@...escale.com>
Subject: Re: [PATCH] gianfar: Fix crashes on RX path (Was Re: [Bugme-new]
 [Bug 19692] New: linux-2.6.36-rc5 crash with gianfar ethernet at full line
 rate traffic)

On Tuesday, 19 October 2010 at 10:06 +0000, Jarek Poplawski wrote:
> On Tue, Oct 19, 2010 at 09:44:33AM +0300, emin ak wrote:
> > Hi Jarek;
> > After 5 days and more than 20 billion packets passed without a crash, it
> > seems that this patch is working for me, at least for crash type 2.
> > (Type 1 only occurred once and I have not been able to reproduce it
> > since, but I am still trying. I think this patch also lowers the
> > risk for type 1.)
> 
> It would be interesting to have a look if it's exactly type 1, because
> skb_over_panic can happen for different reasons, e.g. like here:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=63b88b9041ceef8217f34de71a2e96f0c3f0fd3b
> 
> > Before adding a new bug entry for skb_over_panic, I think I
> > must find a reliable way to make this type of crash reproducible,
> > otherwise I don't know how to test whether it is solved or not.
> 
> Maybe for now let's try to get and see this type 1 again? Since the
> recycle path is suspicious a bit to me, probably limiting memory or
> slowing tx (maybe different mtus on eth0 and 1) under heavy multi cpu
> load might help.
> 
> > Lastly, thanks a lot for your valuable help in overcoming this problem.
> > Is there anything else I can do for testing / committing this
> > patch to mainline?
> 
> Here it is for David to handle the rest.
> 
> Thanks a lot for such intense testing,
> Jarek P.
> --------------------------->
> 
> The rx_recycle queue is global per device but can be accessed by many
> napi handlers at the same time, so it needs full skb_queue primitives
> (with locking). Otherwise, various crashes caused by broken skbs are
> possible.
> 
> This patch resolves, at least partly, bugzilla bug 19692. (Because of
> some doubts that there could be still something around which is hard
> to reproduce my proposal is to leave this bug opened for a month.)
> 
> Fixes commit: 0fd56bb5be6455d0d42241e65aed057244665e5e
> 
> Reported-by: emin ak <eminak71@...il.com>
> Tested-by: emin ak <eminak71@...il.com>
> Signed-off-by: Jarek Poplawski <jarkao2@...il.com>
> CC: Andy Fleming <afleming@...escale.com>
> ---
> diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> index 4f7c3f3..db47b55 100644
> --- a/drivers/net/gianfar.c
> +++ b/drivers/net/gianfar.c
> @@ -2515,7 +2515,7 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
>  				skb_recycle_check(skb, priv->rx_buffer_size +
>  					RXBUF_ALIGNMENT)) {
>  			gfar_align_skb(skb);
> -			__skb_queue_head(&priv->rx_recycle, skb);
> +			skb_queue_head(&priv->rx_recycle, skb);
>  		} else
>  			dev_kfree_skb_any(skb);
>  
> @@ -2598,7 +2598,7 @@ struct sk_buff * gfar_new_skb(struct net_device *dev)
>  	struct gfar_private *priv = netdev_priv(dev);
>  	struct sk_buff *skb = NULL;
>  
> -	skb = __skb_dequeue(&priv->rx_recycle);
> +	skb = skb_dequeue(&priv->rx_recycle);
>  	if (!skb)
>  		skb = gfar_alloc_skb(dev);
>  
> @@ -2754,7 +2754,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
>  			if (unlikely(!newskb))
>  				newskb = skb;
>  			else if (skb)
> -				__skb_queue_head(&priv->rx_recycle, skb);
> +				skb_queue_head(&priv->rx_recycle, skb);
>  		} else {
>  			/* Increment the number of packets */
>  			rx_queue->stats.rx_packets++;

Are you sure it's needed at all?

Gianfar claims to be multiqueue, but only one CPU can run gfar_poll()
and call gfar_clean_tx_ring() / gfar_clean_rx_ring().

If that were not the case, there would be more bugs than just the
rx_recycle one.

vi +2822 drivers/net/gianfar.c

                for_each_set_bit(i, &gfargrp->rx_bit_map, priv->num_rx_queues) {
                        if (test_bit(i, &serviced_queues))
                                continue;
                        rx_queue = priv->rx_queue[i];
                        tx_queue = priv->tx_queue[rx_queue->qindex];

                        tx_cleaned += gfar_clean_tx_ring(tx_queue);
                        rx_cleaned_per_queue = gfar_clean_rx_ring(rx_queue,
                                                        budget_per_queue);
                        rx_cleaned += rx_cleaned_per_queue;
                        if(rx_cleaned_per_queue < budget_per_queue) {
                                left_over_budget = left_over_budget +
                                        (budget_per_queue - rx_cleaned_per_queue);
                                set_bit(i, &serviced_queues);
                                num_queues--;
                        }
                }
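
For anyone following the thread, the difference between the __skb_*
and skb_* primitives that the patch swaps can be sketched in userspace
C. This is only an illustration under stated assumptions: struct buf,
struct buf_queue and every function name below are hypothetical
stand-ins, not the kernel's sk_buff API, and a C11 atomic_flag plays
the role of the spinlock that the locked skb_queue helpers take on the
queue head. The unlocked variants are only safe if the caller already
guarantees that a single CPU touches the queue, which is the question
raised above.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb; not the kernel structure. */
struct buf {
	struct buf *next;
	int id;
};

/* Hypothetical stand-in for a queue head with its own lock. */
struct buf_queue {
	struct buf *head;
	size_t qlen;
	atomic_flag lock;	/* plays the role of the queue spinlock */
};

static void queue_init(struct buf_queue *q)
{
	q->head = NULL;
	q->qlen = 0;
	atomic_flag_clear(&q->lock);
}

/* Unlocked push: caller must guarantee exclusive access to q. */
static void queue_head_unlocked(struct buf_queue *q, struct buf *b)
{
	b->next = q->head;
	q->head = b;
	q->qlen++;
}

/* Unlocked pop: caller must guarantee exclusive access to q. */
static struct buf *dequeue_unlocked(struct buf_queue *q)
{
	struct buf *b = q->head;

	if (b) {
		q->head = b->next;
		b->next = NULL;
		q->qlen--;
	}
	return b;
}

static void queue_lock(struct buf_queue *q)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&q->lock,
						 memory_order_acquire))
		;
}

static void queue_unlock(struct buf_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Locked push: safe against concurrent callers on other CPUs. */
static void queue_head_locked(struct buf_queue *q, struct buf *b)
{
	queue_lock(q);
	queue_head_unlocked(q, b);
	queue_unlock(q);
}

/* Locked pop: safe against concurrent callers on other CPUs. */
static struct buf *dequeue_locked(struct buf_queue *q)
{
	struct buf *b;

	queue_lock(q);
	b = dequeue_unlocked(q);
	queue_unlock(q);
	return b;
}
```

The locked wrappers cost one extra atomic operation per call, so they
are only worth it if the queue really can be reached from more than
one context at once.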

