Message-ID: <5C3D3BD1.4000508@huawei.com>
Date:   Tue, 15 Jan 2019 09:48:01 +0800
From:   "Wei Hu (Xavier)" <xavier.huwei@...wei.com>
To:     Jason Gunthorpe <jgg@...pe.ca>
CC:     <dledford@...hat.com>, <linux-rdma@...r.kernel.org>,
        <lijun_nudt@....com>, <oulijun@...wei.com>,
        <liudongdong3@...wei.com>, <liuyixian@...wei.com>,
        <zhangxiping3@...wei.com>, <linuxarm@...wei.com>,
        <linux-kernel@...r.kernel.org>, <xavier_huwei@....com>
Subject: Re: [PATCH rdma-rc 1/3] RDMA/hns: Fix the Oops during rmmod or insmod
 ko when reset occurs



On 2019/1/15 6:06, Jason Gunthorpe wrote:
> On Sat, Jan 12, 2019 at 03:55:31PM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2019/1/12 5:34, Jason Gunthorpe wrote:
>>> On Thu, Jan 10, 2019 at 09:57:41PM +0800, Wei Hu (Xavier) wrote:
>>>> +	/* Check the status of the current software reset process, if in
>>>> +	 * software reset process, wait until software reset process finished,
>>>> +	 * in order to ensure that reset process and this function will not call
>>>> +	 * __hns_roce_hw_v2_uninit_instance at the same time.
>>>> +	 * If a timeout occurs, it indicates that the network subsystem has
>>>> +	 * encountered a serious error and cannot be recovered from the reset
>>>> +	 * processing.
>>>> +	 */
>>>> +	if (ops->ae_dev_resetting(handle)) {
>>>> +		dev_warn(dev, "Device is busy in resetting state. waiting.\n");
>>>> +		end = msecs_to_jiffies(HNS_ROCE_V2_RST_PRC_MAX_TIME) + jiffies;
>>>> +		while (ops->ae_dev_resetting(handle) &&
>>>> +		       time_before(jiffies, end))
>>>> +			msleep(20);
>>> Really? Does this have to be so ugly? Why isn't there just a simple
>>> lock someplace that is held during reset?
>>>
>>> I'm skeptical that all this strange looking stuff is properly locked
>>> and concurrency safe.
>> Hi, Jason
>>
>> The hns3 NIC driver notifies the hns RoCE driver to perform
>> reset related processing by calling the .reset_notify() interface
>> registered by the RoCE driver.
>>
>> There is a constraint on the hip08 chip, the NIC driver needs to
>> stop the flow before hardware startup reset, otherwise the chip
>> may hang up.
>>
>> We've also thought about using locks, but found using locks can
>> lead to more serious problems because of that restriction of the
>> chip.
>> If using locks here, reset processing may wait for uninstallation
>> to complete, this may lead that NIC driver fails to stop the flow
>> in time in the reset process, thus causing the chip to hang up.
> If you are sleeping then I'm sure a lock can be used instead, how
> would it be any different?
Hi, Jason
    If we use a lock here, the reset process may have to wait for the
uninstallation to complete. That can trigger the chip constraint and
cause the chip to hang up.
    If we sleep here instead, the reset process never waits for the
uninstallation to complete, so the chip constraint is not triggered.
    Thanks

    reset process
            lock
            ...
            unlock

    uninstallation
            lock
            ...
            unlock
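
    For contrast with the lock ordering above, here is a minimal
user-space sketch of the polling wait from the patch. It is not the
driver code: struct sim_dev, sim_dev_resetting(), sim_msleep() and
wait_for_reset_done() are hypothetical names, and a simulated
millisecond counter stands in for jiffies so the behaviour is
deterministic. The point it shows is that the waiter takes no lock, so
the reset side never blocks on it, and the waiter gives up once the
deadline passes.

```c
#include <stdbool.h>

/* Simulated device. Only the reset side decides when the reset ends;
 * it never waits for the uninstallation side. */
struct sim_dev {
	long now_ms;           /* simulated clock, stands in for jiffies */
	long reset_ends_at_ms; /* time at which the simulated reset finishes */
};

/* Hypothetical stand-in for ops->ae_dev_resetting(handle). */
static bool sim_dev_resetting(const struct sim_dev *dev)
{
	return dev->now_ms < dev->reset_ends_at_ms;
}

/* Hypothetical stand-in for msleep(): only advances the simulated clock. */
static void sim_msleep(struct sim_dev *dev, long ms)
{
	dev->now_ms += ms;
}

/* Poll until the reset has finished or the deadline has passed.
 * Returns true if the reset finished, false if the wait timed out. */
static bool wait_for_reset_done(struct sim_dev *dev, long timeout_ms,
				long poll_ms)
{
	long end = dev->now_ms + timeout_ms;

	while (sim_dev_resetting(dev) && dev->now_ms < end)
		sim_msleep(dev, poll_ms);

	return !sim_dev_resetting(dev);
}
```

    With a 20 ms poll interval and a 1000 ms limit (both values chosen
only for this sketch), a reset that lasts 100 ms makes the wait return
true, and a reset that lasts 5000 ms makes it return false once the
limit is reached.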


    Regards
Xavier
> Jason
>
> .
>

