Message-ID: <CAJ3xEMi9G+tFsaANwndhmOZ78gt79WZ35Oq4CiFOm1WqxBXyqQ@mail.gmail.com>
Date:	Wed, 26 Nov 2014 00:00:31 +0200
From:	Or Gerlitz <gerlitz.or@...il.com>
To:	Gavin Shan <gwshan@...ux.vnet.ibm.com>
Cc:	Linux Netdev List <netdev@...r.kernel.org>,
	Amir Vadai <amirv@...lanox.com>,
	David Miller <davem@...emloft.net>,
	Wei Yang <weiyang@...ux.vnet.ibm.com>,
	Yishai Hadas <yishaih@...lanox.com>,
	Jack Morgenstein <jackm@....mellanox.co.il>
Subject: Re: [PATCH] net/mlx4: Fix EEH recovery failure

On Mon, Nov 24, 2014 at 11:55 PM, Gavin Shan <gwshan@...ux.vnet.ibm.com> wrote:
> On Mon, Nov 24, 2014 at 11:17:55PM +0200, Or Gerlitz wrote:
>>On Sat, Nov 22, 2014 at 12:56 PM, Gavin Shan <gwshan@...ux.vnet.ibm.com> wrote:
>>> The patch fixes a couple of EEH recovery failures on the PPC PowerNV
>>> platform:
>>
>>>    * Don't clear struct mlx4_priv instance in mlx4_pci_err_detected().
>>>      Otherwise, __mlx4_init_one() crashes the kernel by
>>>      dereferencing a NULL pointer.
>>
>>I don't see this change in the patch. What I see is that mlx4_priv is
>>no longer cleared in __mlx4_unload_one. Please clarify. Also, is this
>>patch based on/targeted at the net or net-next tree?
>>
>
> Yes, it should read: Don't clear struct mlx4_priv instance in mlx4_unload_one(),
> which is called by mlx4_pci_err_detected().


But the struct mlx4_priv instance is cleared in mlx4_unload_one() for
a reason. I suspect that you might have made the EEH callback work but
broken something else. For example, did you make sure that kexec still
works after your changes as it did before?
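
To make the concern concrete, here is a minimal userspace sketch of the
two variants of the tail of mlx4_unload_one(). It is not driver code:
struct fake_priv and its stale_ptr field are hypothetical stand-ins for
struct mlx4_priv, and both function names are made up for illustration.
Only the memset/restore sequence is taken from the quoted diff.

```c
#include <string.h>

/* Hypothetical stand-in for struct mlx4_priv (the real one is far larger). */
struct fake_priv {
	int pci_dev_data;	/* has to survive an unload */
	int removed;		/* set to 1 once the device is unloaded */
	void *stale_ptr;	/* assumed example of state left over from the old load */
};

/* Pre-patch behaviour: wipe everything, then restore what must persist. */
static void unload_with_clear(struct fake_priv *priv)
{
	int pci_dev_data = priv->pci_dev_data;

	memset(priv, 0, sizeof(*priv));
	priv->pci_dev_data = pci_dev_data;
	priv->removed = 1;
}

/* Patched behaviour: no memset, so every other field keeps its old value. */
static void unload_without_clear(struct fake_priv *priv)
{
	int pci_dev_data = priv->pci_dev_data;

	priv->pci_dev_data = pci_dev_data;
	priv->removed = 1;
}
```

With the memset in place, a later load starts from zeroed state apart
from pci_dev_data and removed. Without it, whatever the previous load
left behind is still there, which is the kind of side effect that
should be checked on paths other than EEH.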

> It's based on 3.18-rc5, with a couple of my EEH fixes on top of it.
> When testing EEH with it, I hit the issue.

>>> With the patch applied, EEH recovery for mlx4 adapter succeeds on PPC
>>> PowerNV platform.
>>>
>>>    # lspci
>>>    0003:0f:00.0 Network controller: Mellanox Technologies \
>>>    MT27500 Family [ConnectX-3]
>>>
>>> Signed-off-by: Gavin Shan <gwshan@...ux.vnet.ibm.com>
>>> ---
>>>  drivers/net/ethernet/mellanox/mlx4/main.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
>>> index 90de6e1..e118ac9 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
>>> @@ -2809,7 +2809,6 @@ static void mlx4_unload_one(struct pci_dev *pdev)
>>>         kfree(dev->caps.qp1_proxy);
>>>         kfree(dev->dev_vfs);
>>>
>>> -       memset(priv, 0, sizeof(*priv));
>>>         priv->pci_dev_data = pci_dev_data;
>>>         priv->removed = 1;
>>>  }
>>> @@ -2900,6 +2899,8 @@ static pci_ers_result_t mlx4_pci_err_detected(struct pci_dev *pdev,
>>>                                               pci_channel_state_t state)
>>>  {
>>>         mlx4_unload_one(pdev);
>>> +       pci_release_regions(pdev);
>>> +       pci_disable_device(pdev);
>>>
>>>         return state == pci_channel_io_perm_failure ?
>>>                 PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
>>> --