lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 27 Aug 2013 15:13:50 +0200
From:	Douglas Gilbert <dgilbert@...erlog.com>
To:	vaughan <vaughan.cao@...cle.com>
CC:	Jörn Engel <joern@...fs.org>,
	JBottomley@...allels.com, linux-scsi@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/4] [SCSI] sg: fix race condition in sg_open

On 13-08-27 10:16 AM, vaughan wrote:
> On 08/13/2013 11:16 AM, Douglas Gilbert wrote:
>> On 13-08-12 10:46 PM, vaughan wrote:
>>> On 08/06/2013 04:52 AM, Douglas Gilbert wrote:
>>>> On 13-08-04 10:19 PM, vaughan wrote:
>>>>> On 08/03/2013 01:25 PM, Douglas Gilbert wrote:
>>>>>> On 13-08-01 01:01 AM, Douglas Gilbert wrote:
>>>>>>> On 13-07-22 01:03 PM, Jörn Engel wrote:
>>>>>>>> On Mon, 22 July 2013 12:40:29 +0800, Vaughan Cao wrote:
>>>>>>>>>
>>>>>>>>> There is a race when open sg with O_EXCL flag. Also a race may
>>>>>>>>> happen between
>>>>>>>>> sg_open and sg_remove.
>>>>>>>>>
>>>>>>>>> Changes from v4:
>>>>>>>>>      * [3/4] use ERR_PTR series instead of adding another
>>>>>>>>> parameter in
>>>>>>>>> sg_add_sfp
>>>>>>>>>      * [4/4] fix conflict for cherry-pick from v3.
>>>>>>>>>
>>>>>>>>> Changes from v3:
>>>>>>>>>      * release o_sem in sg_release(), not in sg_remove_sfp().
>>>>>>>>>      * not set exclude with sfd_lock held.
>>>>>>>>>
>>>>>>>>> Vaughan Cao (4):
>>>>>>>>>       [SCSI] sg: use rwsem to solve race during exclusive open
>>>>>>>>>       [SCSI] sg: no need sg_open_exclusive_lock
>>>>>>>>>       [SCSI] sg: checking sdp->detached isn't protected when open
>>>>>>>>>       [SCSI] sg: push file descriptor list locking down to
>>>>>>>>> per-device
>>>>>>>>>         locking
>>>>>>>>>
>>>>>>>>>      drivers/scsi/sg.c | 178
>>>>>>>>> +++++++++++++++++++++++++-----------------------------
>>>>>>>>>      1 file changed, 83 insertions(+), 95 deletions(-)
>>>>>>>>
>>>>>>>> Patchset looks good to me, although I didn't test it on hardware
>>>>>>>> yet.
>>>>>>>> Signed-off-by: Joern Engel <joern@...fs.org>
>>>>>>>>
>>>>>>>> James, care to pick this up?
>>>>>>>
>>>>>>> Acked-by: Douglas Gilbert <dgilbert@...erlog.com>
>>>>>>>
>>>>>>> Tested O_EXCL with multiple processes and threads; passed.
>>>>>>> sg driver prior to this patch had "leaky" O_EXCL logic
>>>>>>> according to the same test. Block device passed.
>>>>>>>
>>>>>>> James, could you clean this up:
>>>>>>>       drivers/scsi/sg.c:242:6: warning: unused variable ‘res’
>>>>>>> [-Wunused-variable]
>>>>>>
>>>>>> Further testing suggests this patch on the sg driver is
>>>>>> broken, so I'll rescind my ack.
>>>>>>
>>>>>> The case it is broken for is when a device is opened
>>>>>> without O_EXCL. Now if, while it is open, a second
>>>>>> thread/process tries to open the same device O_EXCL
>>>>>> then IMO the second open should fail with EBUSY.
>>>>>>
>>>>>> My testing shows that O_EXCL opens properly deflect
>>>>>> other O_EXCL opens.
>>>>> Hi  Doug,
>>>>>
>>>>> My test don't have this issue. The routine is something as below:
>>>>>
>>>>> I start three opens without O_EXCL, wait 30s each, and open with
>>>>> O_EXCL|O_NONBLOCK, it failed with EBUSY.
>>>>> And I also call myopen with/without O_EXCL many times in background at
>>>>> the same time, and the test is passed. I don't know why it failed in
>>>>> your test.
>>>>>
>>>>> Usage: myopen [-e][-n][-d delay] -f file
>>>>>          -e: exclude
>>>>>          -n: nonblock
>>>>>          -d: delay N seconds and then close.
>>>>>
>>>>> [root@...aowol5 16835013]# ./myopen  -f /dev/sg5 -d 30 &
>>>>> [1] 3417
>>>>> [root@...aowol5 16835013]# ./myopen  -f /dev/sg5 -d 30 &
>>>>> [2] 3418
>>>>> [root@...aowol5 16835013]# ./myopen  -f /dev/sg5 -d 30 &
>>>>> [3] 3419
>>>>> [root@...aowol5 16835013]# cat /proc/scsi/sg/debug
>>>>> max_active_device=6(origin 1)
>>>>>     def_reserved_size=32768
>>>>>     >>> device=sg5 scsi5 chan=0 id=1 lun=0   em=0 sg_tablesize=55
>>>>> excl=0
>>>>>       FD(1): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0
>>>>>       cmd_q=0 f_packid=0 k_orphan=0 closed=0
>>>>>         No requests active
>>>>>       FD(2): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0
>>>>>       cmd_q=0 f_packid=0 k_orphan=0 closed=0
>>>>>         No requests active
>>>>>       FD(3): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0
>>>>>       cmd_q=0 f_packid=0 k_orphan=0 closed=0
>>>>>         No requests active
>>>>>
>>>>> [root@...aowol5 16835013]# ./myopen -e -n  -f /dev/sg5 -d 30 &
>>>>> [4] 3422
>>>>> [3422:3351] /dev/sg5:exclude: Device or resource busy
>>>>>
>>>>> [4]+  Exit 1                  ./myopen -e -n -f /dev/sg5 -d 30
>>>>>
>>>>> [root@...aowol5 16835013]# cat /proc/scsi/sg/debug
>>>>> max_active_device=6(origin 1)
>>>>>     def_reserved_size=32768
>>>>>     >>> device=sg5 scsi5 chan=0 id=1 lun=0   em=0 sg_tablesize=55
>>>>> excl=0
>>>>>       FD(1): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0
>>>>>       cmd_q=0 f_packid=0 k_orphan=0 closed=0
>>>>>         No requests active
>>>>>       FD(2): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0
>>>>>       cmd_q=0 f_packid=0 k_orphan=0 closed=0
>>>>>         No requests active
>>>>>       FD(3): timeout=60000ms bufflen=32768 (res)sgat=1 low_dma=0
>>>>>       cmd_q=0 f_packid=0 k_orphan=0 closed=0
>>>>>         No requests active
>>>>> [root@...aowol5 16835013]# cat /proc/scsi/sg/debug
>>>>> [1]   Done                    ./myopen -f /dev/sg5 -d 30
>>>>> [2]-  Done                    ./myopen -f /dev/sg5 -d 30
>>>>> [3]+  Done                    ./myopen -f /dev/sg5 -d 30
>>>>>
>>>>
>>>> Hi,
>>>> After the initial failures about 36 hours ago, retesting
>>>> yesterday and today has not produced any unexpected
>>>> failures. And I have been trying hard on lk 3.10.4 and
>>>> lk 3.10.5 .
>>>>
>>>> My test program is a bit more intense than yours and can
>>>> be found in the sg3_utils beta in the News section of this
>>>> page:
>>>>     http://sg.danny.cz/sg/
>>>>
>>>> It is in the examples directory, two variants called
>>>> sg_tst_excl and sg_tst_excl2 . You will need a recent gcc
>>>> compiler, IOW something that can compile c++11 . gcc 4.7.3
>>>> in Ubuntu 13.04 only just manages, fedora 19 should do
>>>> better with gcc 4.8.1 . The threading is implemented using
>>>> pthreads so it should be reliable.
>>>>
>>>> Typically I run multiple instances (processes) and each has
>>>> multiple threads. One instance can run '-x' which will cause
>>>> its first thread not to use O_EXCL **. All my tests currently
>>>> use O_NONBLOCK and that leads to lots of EBUSYs (sometimes
>>>> in the billions).
>>>>
>>>> Doug Gilbert
>>>>
>>>>
>>>> ** Using '-x' on two instances will cause an expected failure
>>>>      so can be used as a control.
>>>>
>>> Hi Doug,
>>>
>>> Can I regard this as you ACK it again?
>>
>> Hi,
>> I'd like you to test your setup with sg_tst_excl or sg_tst_excl2 .
>> Since my last email, I have not seen any more failures with those
>> tests on the patched sg driver but I did see a couple on
>> /dev/sd* . With sg_tst_excl2, bsg devices can be used and since bsg
>> accepts and ignores O_EXCL, it fails reliably.
>>
>> BTW I use scsi_debug with 'delay=0' for a pseudo device.
>>
>> Doug Gilbert
> Hi Doug,
>
> I run test for sg and sd drivers with both sg_tst_excl and sg_tst_excl2
> on kernel vanilla 3.10.9 with mypatches included on Fedora19 x86_64
> baremachine.
> * I've tried several times with different -w and -n setting, no failure
> for sg driver found.
> * It's easy to find failure on both patched and non-patched kernel for
> sd driver with the following test command:
>      ./sg_tst_excl -w 3 -n 2000 -t 16 -x $1 &
>      ./sg_tst_excl -w 3 -n 2000 -t 16 $1 &
>      ./sg_tst_excl -w 0 -n 2000 -t 16 $1 &
>      ./sg_tst_excl -w -1 -n 2000 -t 16 $1 &
>      ./sg_tst_excl -w -2 -n 2000 -t 16 $1 &
> I think option '-w 0/-1/-2' is significant to trigger the failure, since
> when I only use '-w >0', test usually passed.

Hi,
Thanks for testing that. For others trying to follow this,
the usage message for that test utility is:

Usage: sg_tst_excl [-b] [-l <lba>] [-n <n_per_thr>] [-t <num_thrs>]
                    [-V] [-w <wait_ms>] [-x] <disk_device>
   where
     -b                block on open (def: O_NONBLOCK)
     -l <lba>          logical block to increment (def: 1000)
     -n <n_per_thr>    number of loops per thread (def: 200)
     -t <num_thrs>     number of threads (def: 4)
     -V                print version number then exit
     -w <wait_ms>      >0: sleep_for(<wait_ms>); =0: yield(); -1: no
                       wait; -2: sleep(0)  (def: 0)
     -x                don't use O_EXCL on first thread (def: use
                       O_EXCL on all threads)

Test O_EXCL open flag with sg driver. Each open/close cycle with the
O_EXCL flag does a double increment on lba (using its first 4 bytes).

----------------------

The test is using O_EXCL as a lock so that if a 4 byte integer in
a block starts out as even (and for a scsi_debug pseudo, blocks are
zero filled) then a open,double_increment,close sequence should
always see an even number after the open. Notice that it should be
safe for one process to run with '-x' so that its first thread
opens the device without the O_EXCL flag.


Acked-by: Douglas Gilbert <dgilbert@...erlog.com>

And I let you run with the sd driver quirk :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ