[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <96fb83c2-ef02-181c-b27d-bb4c3dbc8194@huawei.com>
Date:   Tue, 22 Nov 2016 16:56:30 +0000
From:   John Garry <john.garry@...wei.com>
To:     Dan Williams <dan.j.williams@...el.com>
CC:     <jejb@...ux.vnet.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        wangyijing <wangyijing@...wei.com>,
        linux-scsi <linux-scsi@...r.kernel.org>,
        John Garry <john.garry2@...l.dcu.ie>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        <linuxarm@...wei.com>, <lindar_liu@...sh.com>,
        Tejun Heo <tj@...nel.org>,
        Jinpu Wang <jinpu.wang@...fitbricks.com>
Subject: Re: [RFC PATCH] scsi: libsas: fix WARN on device removal
On 21/11/2016 17:13, Dan Williams wrote:
> On Mon, Nov 21, 2016 at 7:16 AM, John Garry <john.garry@...wei.com> wrote:
>>>>>> @Maintainers, would you be willing to accept this patch as an interim
>>>>>> fix
>>>>>> for the dastardly WARN while we try to fix the flutter issue?
>>>>>
>>>>>
>>>>>
>>>>> To me this adds a bug to quiet a benign, albeit noisy, warning.
>>>>>
>>>>
>>>> What is the bug which is being added?
>>>
>>>
>>> The bug where we queue a port teardown, but see a port formation event
>>> in the meantime.
>>
>>
>> As I understand, this vulnerability already exists:
>> http://marc.info/?l=linux-scsi&m=143801026028006&w=2
>>
>> I actually don't understand how libsas dealt with flutter (which I take to
>> mean a burst of up and down events) before these changes, as it can only
>> queue simultaneously one up and one down event per port. So, if we get a
>> flutter, then the events are lost and we get indeterminate state.
>>
>
> The events are not lost.
In sas_queue_event(), if there is a particular event pending for a 
port/PHY, we cannot queue further same event types for that port/PHY. I 
think my colleagues found issue where we try to enqueue multiple 
complementary events.
> The new problem this patch introduces is
> delaying sas port deletion where it was previously immediate.  So now
> we can get into a situation where the port has gone down and can start
> processing a port up event before the previous deletion work has run.
>
>>>
>>>> And it's a very noisy warning, as in 6K lines on the console when an
>>>> expander is unplugged.
>>>
>>>
>>> Does something like this modulate the failure?
>
> I'm curious if we simply need to fix the double deletion of the
> sas_port bsg queue, could you try the changes below?
>
No, I just tested it on a root port and we get the same WARN.
>>>
>>> diff --git a/drivers/scsi/scsi_transport_sas.c
>>> b/drivers/scsi/scsi_transport_sas.c            index
>>> 60b651bfaa01..11401e5c88ba 100644
>>>                  --- a/drivers/scsi/scsi_transport_sas.c
>>> +++ b/drivers/scsi/scsi_transport_sas.c
>>> @@ -262,9 +262,10 @@ static void sas_bsg_remove(struct Scsi_Host
>>> *shost, struct sas_rphy *rphy
>>>  {
>>>         struct request_queue *q;
>>>
>>> -       if (rphy)
>>> +       if (rphy) {
>>>                 q = rphy->q;
>>> -       else
>>> +               rphy->q = NULL;
>>> +       } else
>>>                 q = to_sas_host_attrs(shost)->q;
>>>
>>>         if (!q)
>>>
>>> .
>>>
>>
>>
>
> .
>
Powered by blists - more mailing lists
 
