Message-ID: <30feb08e-83d0-34e2-06bb-40f4960c8be4@leemhuis.info>
Date: Mon, 4 Jul 2022 14:06:06 +0200
From: Thorsten Leemhuis <regressions@...mhuis.info>
To: Arun Easi <aeasi@...vell.com>,
Tony Battersby <tonyb@...ernetics.com>
Cc: Saurav Kashyap <skashyap@...vell.com>,
Nilesh Javali <njavali@...vell.com>,
GR-QLogic-Storage-Upstream@...vell.com, linux-scsi@...r.kernel.org,
linux-kernel@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after
unplug FC cable
On 23.06.22 01:03, Arun Easi wrote:
> On Wed, 22 Jun 2022, 7:56am, Tony Battersby wrote:
>
>> On 6/21/22 18:05, Arun Easi wrote:
>>> Thanks for the info. Just to reiterate, you've reported two issues (though
>>> this log was showing only 1 of them).
>>>
>>> Issue 1 - Tape device never disappears when removed
>>> Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2
>>> was not discovered.
>>>
>>> For Issue-2, please try the attached patch. This may not be the final fix,
>>> but wanted to check if that would fix the issue for you.
>>>
>>> For Issue-1, the behavior was intentional, though that behavior needs
>>> refinement. These tape drives support something called FC sequence level
>>> error recovery (added in FCP-2), which can make tape I/Os survive even
>>> across a short cable pull. This is not a simple retry of the I/O, rather a
>>> retry done at the FC sequence level that gives the IO a better chance of
>>> revival. In other words, the patch that caused the regression, while it
>>> introduces incorrect reporting of the device state, makes backups
>>> more resilient.
>>>
>>> Now, onto the behavior when device state is reported immediately. What we
>>> have observed, at least with one tape drive from a major vendor, is that
>>> when a device loss and the device's return were both reported to the
>>> upper layers, the backup operation failed. This is due to a
>>> REPORT LUNS command being issued during device reappearance reporting
>>> (fc_remote_port_add -> SCSI scan), which the tape drive was not expecting
>>> and which caused the backup to fail.
>>>
>>> I know that some tape drives do not support multiple commands to it at the
>>> same time, but not sure if that is still the norm these days.
>>>
>>> So, perhaps one way to make the behavior better, is to either report the
>>> disappearing device a bit delayed or have intelligence added in SCSI scan
>>> to detect ongoing tape IO operations and delay/avoid the REPORT LUNs.
>>> Former is a more contained (in the LLD) fix.
>>>
>>> Regards,
>>> -Arun
>>
>> Your patch does fix Issue-2 for me. For Issue-1, it would be fine with
>> me if qla2xxx reported device removal to the upper level a bit delayed,
>> as you said.
>>
>
> Thanks for testing and verifying the patch.

BTW, that patch should have 'Link:' tags pointing to all reports about
this issue, e.g. the start of this thread.

These tags are important, as they allow others to look into the
backstory now and years from now. That is why they should be placed in
cases like this, as Documentation/process/submitting-patches.rst and
Documentation/process/5.Posting.rst explain in more detail.
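
To illustrate why the exact form of these tags matters to tooling, here
is a minimal sketch of how a script could pull 'Link:' trailers out of a
commit message. It is not how regzbot actually works; the function name,
the sample commit message, and the placeholder URL are all assumptions
made up for this example.

```python
import re

# Matches a "Link:" trailer at the start of a line and captures the URL.
LINK_TRAILER = re.compile(r"^Link:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def extract_link_tags(commit_message: str) -> list[str]:
    """Return the URLs of all 'Link:' trailers found in a commit message."""
    return LINK_TRAILER.findall(commit_message)


# Hypothetical commit message: subject, description, author, and URL are
# placeholders, not taken from any real patch.
EXAMPLE = """\
scsi: qla2xxx: example subject line

Example description of the fix.

Link: https://lore.kernel.org/r/MESSAGE-ID-PLACEHOLDER/
Signed-off-by: Example Developer <dev@example.com>
"""

if __name__ == "__main__":
    print(extract_link_tags(EXAMPLE))
```

A tag that is missing or does not start its own line would simply not be
found by a script like this, which is one reason the format is worth
getting right.
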
Additionally, my regression tracking bot ‘regzbot’ relies on these tags
to automatically connect reports with patches that are posted or
committed to fix the reported issue. BTW, let me tell regzbot to monitor
this thread:
> We will post the patch upstream after due testing.

That was more than two weeks ago now and I didn't see any progress. Or
did I miss it?

Reminder: things should not take this long. For details see the section
"Prioritize work on fixing regressions" in this document:
https://docs.kernel.org/process/handling-regressions.html
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
#regzbot poke