[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <526555BF.90909@interlog.com>
Date: Mon, 21 Oct 2013 12:26:39 -0400
From: Douglas Gilbert <dgilbert@...erlog.com>
To: Vaughan Cao <vaughan.cao@...cle.com>,
Bart Van Assche <bvanassche@....org>,
SCSI development list <linux-scsi@...r.kernel.org>,
Madper Xie <cxie@...hat.com>,
James Bottomley <james.bottomley@...senpartnership.com>
CC: linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] [SCSI] sg: late O_EXCL fix for lk 3.12-rc
On 13-10-21 06:35 AM, Vaughan Cao wrote:
>
> On 2013年10月21日 07:00, Douglas Gilbert wrote:
>> On 13-10-20 01:31 PM, Bart Van Assche wrote:
>>> On 10/20/13 18:09, Douglas Gilbert wrote:
>>>> Given that lk 3.12.0 release is not far away, the safest path
>>>> may still be to revert Vaughan Cao's patch. I'll leave that
>>>> decision to the maintainers.
>>>
>>> Hello Doug,
>>>
>>> Thanks for looking into this. But I would appreciate it if you could address the
>>> whitespace errors reported by checkpatch:
<snip>
>> I'd prefer people to test the patch or find logical flaws.
>>
>> Doug Gilbert
>>
> Hi Doug,
>
> Will the lines below conflict with the meaning of NONBLOCK?
> >+ down(&sdp->or_sem);
> >+ alone = sfds_list_empty(sdp);
> > if (!((flags & O_NONBLOCK) ||
> > scsi_block_when_processing_errors(sdp->device))) {
>
> Assume one thread holds the or_sem and waiting in
> scsi_block_when_processing_errors for the underlying scsi device to complete
> error recovery,
> another thread with O_NONBLOCK call sg_open().
Indeed. New (replacement) patch attached, moving the
down() below that check.
> I'm also curious why we can skip checking _processing_errors() when called with
> O_NONBLOCK?...Though it has been there from the very beginning.
.. very beginning?? The world did not start with git.
If memory serves: blame->jeb for that logic. And it
does a non-interruptible wait_event(), grrr. 'git
blame' is still interesting, I notice even the LWN
editor has visited.
> In other words, since scsi device may go into a error recovery state at random
> time, why we only check here?
There are scsi_block_when_processing_errors() calls
sprinkled all over the place. Too many is never enough.
And why are they bypassed in the sg driver when
O_NOBLOCK is given? Because clearing the error condition
on the device may need a SCSI command injected from the
user space. Pass-through, pass-by ...
Testing in this area can be real fun:
- generate a stream of SCSI command sequences, fired
from multiple threads/processes and vary the open
O_NONBLOCK and O_EXCL flags
- fire off signals to those processes and threads at
random times
- detach the underlying device(s) at random times
- and set the "processing_errors" state on the device
for variable amounts of time, once in a while
For bonus points make those SCSI devices send back
interesting UNIT ATTENTIONs from time to time and/or
trigger timeouts by not reponding in time.
This is a proposed patch over Vaughan Cao's patch currently
in the lk 3.12.0-rc series. It replaces his per-device
read-write semaphore with:
- a normal semaphore (or_sem) to stop co-incident open()s
and release()s on the same device treading on the same
data.
- a wait_queue (open_wait) to handle wait on open() which is
associated with the use of O_EXCL (when O_NONBLOCK is not
given). The two wait_event_interruptible() calls are in a
loop because multiple open()s could be waiting on either
case. All get woken up but only one will get the down() at
the bottom of the loop and proceed to complete the open(),
potentially leaving the other candidates to continue waiting
until it releases.
Signed-off-by: Douglas Gilbert <dgilbert@...erlog.com>
View attachment "sg_7vc_dg2.diff" of type "text/x-patch" (4976 bytes)
Powered by blists - more mailing lists