Message-Id: <20120228.154657.1817512578346429850.davem@davemloft.net>
Date: Tue, 28 Feb 2012 15:46:57 -0500 (EST)
From: David Miller <davem@...emloft.net>
To: cascardo@...ux.vnet.ibm.com
Cc: yevgenyp@...lanox.co.il, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, roland@...estorage.com,
jackm@....mellanox.co.il
Subject: Re: [PATCH] mlx4: prevent the device from being removed concurrently
From: Thadeu Lima de Souza Cascardo <cascardo@...ux.vnet.ibm.com>
Date: Tue, 28 Feb 2012 17:34:38 -0300
> On Tue, Feb 28, 2012 at 02:30:51PM -0500, David Miller wrote:
>> From: Thadeu Lima de Souza Cascardo <cascardo@...ux.vnet.ibm.com>
>> Date: Tue, 28 Feb 2012 15:36:16 -0300
>>
>> > When an EEH happens, the catas poll code will try to restart the device,
>> > removing it and adding it back again. The EEH code will try to do the
>> > same. One of the threads ends up accessing memory that was freed by the
>> > other thread and we get a crash.
>>
>> Stop adding bandaids to the locking.
>>
>> If the EEH infrastructure doesn't synchronize parallel operations
>> on the same device, that is the real bug, and that's where the real
>> fix belongs.
>>
>> I refuse to apply this patch.
>>
>
> It's not EEH that does not synchronize removal. The problem is that the
> driver itself calls the driver remove function through mlx4_restart_one.

Then reuse the intf_mutex this driver already has: export it to
main.c and add a new __mlx4_unregister_device() that can be called
with intf_mutex already held.
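
To make the suggestion concrete, here is a rough userspace sketch of the
locked/unlocked split, using pthreads rather than the kernel mutex API.
It is not the mlx4 code: struct demo_dev, demo_unregister_device(),
demo_unregister_device_locked() and demo_restart_one() are hypothetical
names, and only intf_mutex is taken from the driver. The _locked helper
plays the role of the proposed __mlx4_unregister_device().

```c
/* Userspace illustration only; all demo_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stands in for the driver's existing intf_mutex. */
static pthread_mutex_t intf_mutex = PTHREAD_MUTEX_INITIALIZER;

struct demo_dev {
	int registered;   /* 1 while the device is registered */
	int remove_count; /* how many times teardown actually ran */
};

/*
 * Counterpart of the proposed __mlx4_unregister_device():
 * the caller must already hold intf_mutex.
 */
static void demo_unregister_device_locked(struct demo_dev *dev)
{
	assert(dev != NULL);
	if (!dev->registered)
		return; /* another path already removed it */
	dev->registered = 0;
	dev->remove_count++;
}

/* Public entry point: takes the lock, then calls the locked helper. */
static void demo_unregister_device(struct demo_dev *dev)
{
	pthread_mutex_lock(&intf_mutex);
	demo_unregister_device_locked(dev);
	pthread_mutex_unlock(&intf_mutex);
}

/*
 * Restart path: holds intf_mutex across remove and re-add, so a
 * concurrent remove cannot run between the two steps.
 */
static void demo_restart_one(struct demo_dev *dev)
{
	pthread_mutex_lock(&intf_mutex);
	demo_unregister_device_locked(dev);
	dev->registered = 1; /* re-add, simplified */
	pthread_mutex_unlock(&intf_mutex);
}
```

The point of the split is that the restart path can take intf_mutex
once and do both steps under it, while every other caller keeps using
the plain entry point. A second remove that arrives late sees
registered == 0 and returns without touching freed state.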