Date:   Thu, 10 May 2018 17:24:56 +0300
From:   Tariq Toukan <tariqt@...lanox.com>
To:     Zhu Yanjun <yanjun.zhu@...cle.com>, tariqt@...lanox.com,
        netdev@...r.kernel.org, linux-rdma@...r.kernel.org
Subject: Re: [PATCHv2 1/1] net/mlx4_core: avoid resetting HCA when accessing
 an offline device



On 18/04/2018 4:31 PM, Zhu Yanjun wrote:
> When a faulty cable is used or the HCA firmware hits an error, the HCA
> device goes offline. When the driver accesses this offline device, the
> following call trace pops out.
> 
> "
> ...
>    [<ffffffff816e4842>] dump_stack+0x63/0x81
>    [<ffffffff816e459e>] panic+0xcc/0x21b
>    [<ffffffffa03e5f8a>] mlx4_enter_error_state+0xba/0xf0 [mlx4_core]
>    [<ffffffffa03e7298>] mlx4_cmd_reset_flow+0x38/0x60 [mlx4_core]
>    [<ffffffffa03e7381>] mlx4_cmd_poll+0xc1/0x2e0 [mlx4_core]
>    [<ffffffffa03e9f00>] __mlx4_cmd+0xb0/0x160 [mlx4_core]
>    [<ffffffffa0406934>] mlx4_SENSE_PORT+0x54/0xd0 [mlx4_core]
>    [<ffffffffa03f5f54>] mlx4_dev_cap+0x4a4/0xb50 [mlx4_core]
> ...
> "
> In the above call trace, mlx4_cmd_poll calls mlx4_cmd_post to access
> the HCA while the HCA is offline. mlx4_cmd_post then returns -EIO. On
> -EIO, mlx4_cmd_poll calls mlx4_cmd_reset_flow to reset the HCA, which
> produces the above call trace.
> 
> This is not reasonable. Since the HCA device is already offline when
> it is accessed, it should not be reset again.
> 
> With this patch, when the HCA is offline, mlx4_cmd_post returns
> -EINVAL. On -EINVAL, mlx4_cmd_poll returns directly instead of
> resetting the HCA.
> 
> CC: Srinivas Eeda <srinivas.eeda@...cle.com>
> CC: Junxiao Bi <junxiao.bi@...cle.com>
> Suggested-by: Håkon Bugge <haakon.bugge@...cle.com>
> Suggested-by: Tariq Toukan <tariqt@...lanox.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@...cle.com>
> ---
> V1->V2: Following Tariq's advice, avoid interference from other returned
> errors. mlx4_cmd_post returns either -EIO or -EINVAL. On -EIO, the HCA
> device should be reset. -EINVAL means that mlx4_cmd_post is accessing an
> offline device, so there is no need to reset the HCA; jump directly to
> the out label.
> ---
>   drivers/net/ethernet/mellanox/mlx4/cmd.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 

Reviewed-by: Tariq Toukan <tariqt@...lanox.com>

Thanks Zhu.
