[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <500D82AD.2040508@mellanox.com>
Date: Mon, 23 Jul 2012 19:58:21 +0300
From: Or Gerlitz <ogerlitz@...lanox.com>
To: Eric Dumazet <eric.dumazet@...il.com>, <roland@...nel.org>,
<davem@...emloft.net>, Christoph Lameter <cl@...ux.com>
CC: <linux-rdma@...r.kernel.org>, <erezsh@...lanox.com>,
Shlomo Pongratz <shlomop@...lanox.com>,
<netdev@...r.kernel.org>
Subject: Re: [PATCH net/for-next V1 1/1] IB/ipoib: break linkage to neighbouring
system
On 20/07/2012 18:49, Or Gerlitz wrote:
> On Thu, Jul 19, 2012 at 4:18 PM, Or Gerlitz<ogerlitz@...lanox.com> wrote:
>> From: Shlomo Pongratz<shlomop@...lanox.com>
>>
>> Dave Miller<davem@...emloft.net> provided a detailed description of why the
>> way IPoIB is using neighbours for its own ipoib_neigh struct is buggy:
> [...]
>
>> This patch aims to solve the race conditions found in the IPoIB driver.
>>
>> The patch breaks the connection between the core networking neighbour structure
>> and the ipoib_neigh structure. Except for avoiding the race, it allows to in
>> under a setup where SKBs carrying IP packets that don't have any associated
>> neighbour are transmitted through IPoIB.
>>
>> We add an ipoib_neigh hash table with 1024 buckets. The hash table key is the destin
>> hardware address. Thus the ipoib_neigh is fetched from the hash table and not
>> dereferenced from the stashed location at the neighbour structure. The hash table uses
>> both RCU and reference count mechanisms to guarantee that no ipoib_neigh instance is
>> ever deleted while in use.
>>
>> Fetching the ipoib_neigh structure instance from the hash also makes the special
>> code in ipoib_start_xmit that handles remote and local bonding failover redundant.
>>
>> Aged ipoib_neigh instances are deleted by a garbage collection task that runs every
>> 30 seconds and deletes every ipoib_neigh instance that was idle for at least 60
>> seconds. The deletion is safe since the ipoib_neigh instances are protected
>> using RCU and reference count mechanisms.
>
> Hi Dave, Roland, Eric
>
> So how does this look? in the right direction? anything that need to be fixed?
Sorry for the possible spam, but resending (last time forgot to put Eric
on the "to" list so he might missed it, also added Christoph) - this is
a fix for very long time bug in IPoIB and we do want it to be reviewed
&& and hopefully accepted, or if needed get feedback and fix/change.
So, any further feedback? there was one feedback on V0, not to use read
lock for RCU protected
hash table lookup, and it was addressed in V1.
Or.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists