lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 15 Apr 2020 07:45:18 -0700
From:   Bart Van Assche <bvanassche@....org>
To:     Luis Chamberlain <mcgrof@...nel.org>,
        Christoph Hellwig <hch@...radead.org>
Cc:     axboe@...nel.dk, viro@...iv.linux.org.uk,
        gregkh@...uxfoundation.org, rostedt@...dmis.org, mingo@...hat.com,
        jack@...e.cz, ming.lei@...hat.com, nstange@...e.de,
        akpm@...ux-foundation.org, mhocko@...e.com, yukuai3@...wei.com,
        linux-block@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Omar Sandoval <osandov@...com>,
        Hannes Reinecke <hare@...e.com>,
        Michal Hocko <mhocko@...nel.org>
Subject: Re: [PATCH 3/5] blktrace: refcount the request_queue during ioctl

On 2020-04-14 23:16, Luis Chamberlain wrote:
> On Tue, Apr 14, 2020 at 08:40:44AM -0700, Christoph Hellwig wrote:
>> Hmm, where exactly does the race come in so that it can only happen
>> after where you take the reference, but not before it?  I'm probably
>> missing something, but that just means it needs to be explained a little
>> better :)
> 
>>>From the trace on patch 2/5:
> 
>     BLKTRACE_SETUP(loop0) #2
>     [   13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
>     [   13.936758] === do_blk_trace_setup(2) start
>     [   13.938944] === do_blk_trace_setup(2) creating directory
>     [   13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave
>     
>     ---> From LOOP_CTL_DEL(loop0) #2
>     [   13.971046] === blk_trace_cleanup(7) end
>     [   13.973175] == __blk_trace_remove(7) end
>     [   13.975352] == blk_trace_shutdown(7) end
>     [   13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
>     [   13.980645] ==== blk_mq_debugfs_unregister(7) begin
>     [   13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
>     [   13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
>     [   13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
>     [   13.993155] = __blk_release_queue(7) end
>     
>     ---> From BLKTRACE_SETUP(loop0) #2
>     [   13.995928] === do_blk_trace_setup(2) end with ret: 0
>     [   13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end
> 
> The BLKTRACESETUP above works on request_queue which later
> LOOP_CTL_DEL races on and sweeps the debugfs dir underneath us.
> If you use this commit alone though, this doesn't fix the race issue
> however, and that's because of both still the debugfs_lookup() use
> and that we're still using asynchronous removal at this point.
> 
> refcounting will just ensure we don't take the request_queue underneath
> our noses.

I think the above trace reveals a bug in the loop driver. The loop
driver shouldn't allow the associated request queue to disappear while
the loop device is open. One may want to have a look at sd_open() in the
sd driver. The scsi_disk_get() call in that function not only increases
the reference count of the SCSI disk but also of the underlying SCSI device.

Thanks,

Bart.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ