netdev - Re: [PATCH] gianfar: Fix crashes on RX path

Date:	Tue, 26 Oct 2010 10:42:57 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	jarkao2@...il.com
Cc:	eric.dumazet@...il.com, eminak71@...il.com,
	akpm@...ux-foundation.org, netdev@...r.kernel.org,
	bugzilla-daemon@...zilla.kernel.org,
	bugme-daemon@...zilla.kernel.org, avorontsov@...sta.com,
	afleming@...escale.com
Subject: Re: [PATCH] gianfar: Fix crashes on RX path

From: Jarek Poplawski <jarkao2@...il.com>
Date: Fri, 22 Oct 2010 08:52:48 +0000

> On Fri, Oct 22, 2010 at 06:52:31AM +0000, Jarek Poplawski wrote:
>> On Fri, Oct 22, 2010 at 08:11:57AM +0200, Eric Dumazet wrote:
> ...
>> > Gianfar claims to be multiqueue, but only one cpu can run gfar_poll()
>> > and call gfar_clean_tx_ring() / gfar_clean_rx_ring()
>> > 
>> > If not, there would be more bugs than only rx_recycle thing
>> 
>> I didn't find what prevents running gfar_poll on many cpus and don't
>> claim there is no more bugs around.
> 
> On the other hand, I don't see your point in the code below either.
> These're only per gfargrp queues - not per device, aren't they?

I am still not at the point where I feel comfortable applying this bug
fix; in fact, I am very far from it.

None of the reasoning offered so far is consistent about what actually
causes the problem.

Anything that would make the RX recycling code racy and corrupt the
recycling queue of the gianfar driver would also corrupt all of the
other RX-side state and other driver state.

The NAPI state is unary for gianfar, and inside of that singular
->poll() instance it iterates over the queues.
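
To make that serialization argument concrete, here is a minimal
userspace sketch. It is a model only, not gianfar or kernel code: the
struct, the model_* function names and the queue count are all
hypothetical, and an atomic flag merely stands in for the role the
NAPI "scheduled" state plays (compare napi_schedule_prep() and
napi_complete() in the kernel). It assumes, as stated above, a single
NAPI context whose one ->poll() walks every queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_RX_QUEUES 4	/* hypothetical queue count */

/* Userspace model of one NAPI context shared by several RX queues. */
struct model_napi {
	atomic_flag scheduled;		/* stands in for the "scheduled" state bit */
	int in_poll;			/* number of callers currently inside poll */
	int cleaned[NUM_RX_QUEUES];	/* packets cleaned per queue */
};

static void model_napi_init(struct model_napi *n)
{
	atomic_flag_clear(&n->scheduled);
	n->in_poll = 0;
	for (int q = 0; q < NUM_RX_QUEUES; q++)
		n->cleaned[q] = 0;
}

/* Only the caller that flips the flag from clear to set may go on to poll. */
static bool model_schedule_prep(struct model_napi *n)
{
	return !atomic_flag_test_and_set(&n->scheduled);
}

/* Completion clears the flag so the context can be scheduled again. */
static void model_complete(struct model_napi *n)
{
	atomic_flag_clear(&n->scheduled);
}

/*
 * The single poll instance iterates over all queues.  Per-queue state is
 * touched without any extra lock; that is only safe while the scheduled
 * flag guarantees a single caller, which the assert checks in this model.
 */
static int model_poll(struct model_napi *n, int budget_per_queue)
{
	int total = 0;

	n->in_poll++;
	assert(n->in_poll == 1);
	for (int q = 0; q < NUM_RX_QUEUES; q++) {
		n->cleaned[q] += budget_per_queue;
		total += budget_per_queue;
	}
	n->in_poll--;
	return total;
}
```

In this model a second model_schedule_prep() call fails until
model_complete() has run, so no second poller can reach the per-queue
state. A path that reaches model_poll() without winning that flag is
the kind of thing that would have to exist for a race.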

The only asynchronous path could be netpoll, but again that would
break lots of other things.

I want to be shown a code path that results in the recycle SKB
queue getting accessed in parallel on two cpus without protection
so we can understand what it is that we are fixing.

On another note, I also agree that removing this RX recycling crud
wouldn't be a bad idea.  In addition to the MTU changing concerns Eric
Dumazet brought up, there are many other (less broken) ways to achieve
whatever performance gains recycling gives these devices.

Thanks.
