Message-ID: <20200416011702.GC11244@42.do-not-panic.com>
Date:   Thu, 16 Apr 2020 01:17:02 +0000
From:   Luis Chamberlain <mcgrof@...nel.org>
To:     Bart Van Assche <bvanassche@....org>
Cc:     Christoph Hellwig <hch@...radead.org>, axboe@...nel.dk,
        viro@...iv.linux.org.uk, gregkh@...uxfoundation.org,
        rostedt@...dmis.org, mingo@...hat.com, jack@...e.cz,
        ming.lei@...hat.com, nstange@...e.de, akpm@...ux-foundation.org,
        mhocko@...e.com, yukuai3@...wei.com, linux-block@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Omar Sandoval <osandov@...com>,
        Hannes Reinecke <hare@...e.com>,
        Michal Hocko <mhocko@...nel.org>
Subject: Re: [PATCH 3/5] blktrace: refcount the request_queue during ioctl

On Wed, Apr 15, 2020 at 07:45:18AM -0700, Bart Van Assche wrote:
> On 2020-04-14 23:16, Luis Chamberlain wrote:
> > On Tue, Apr 14, 2020 at 08:40:44AM -0700, Christoph Hellwig wrote:
> >> Hmm, where exactly does the race come in so that it can only happen
> >> after where you take the reference, but not before it?  I'm probably
> >> missing something, but that just means it needs to be explained a little
> >> better :)
> > 
> > From the trace on patch 2/5:
> > 
> >     BLKTRACE_SETUP(loop0) #2
> >     [   13.933961] == blk_trace_ioctl(2, BLKTRACESETUP) start
> >     [   13.936758] === do_blk_trace_setup(2) start
> >     [   13.938944] === do_blk_trace_setup(2) creating directory
> >     [   13.941029] === do_blk_trace_setup(2) using what debugfs_lookup() gave
> >     
> >     ---> From LOOP_CTL_DEL(loop0) #2
> >     [   13.971046] === blk_trace_cleanup(7) end
> >     [   13.973175] == __blk_trace_remove(7) end
> >     [   13.975352] == blk_trace_shutdown(7) end
> >     [   13.977415] = __blk_release_queue(7) calling blk_mq_debugfs_unregister()
> >     [   13.980645] ==== blk_mq_debugfs_unregister(7) begin
> >     [   13.980696] ==== blk_mq_debugfs_unregister(7) debugfs_remove_recursive(q->debugfs_dir)
> >     [   13.983118] ==== blk_mq_debugfs_unregister(7) end q->debugfs_dir is NULL
> >     [   13.986945] = __blk_release_queue(7) blk_mq_debugfs_unregister() end
> >     [   13.993155] = __blk_release_queue(7) end
> >     
> >     ---> From BLKTRACE_SETUP(loop0) #2
> >     [   13.995928] === do_blk_trace_setup(2) end with ret: 0
> >     [   13.997623] == blk_trace_ioctl(2, BLKTRACESETUP) end
> > 
> > The BLKTRACESETUP above works on a request_queue which LOOP_CTL_DEL
> > later races on, sweeping the debugfs dir out from underneath us.
> > This commit alone doesn't fix the race, however, because we still
> > rely on debugfs_lookup() and removal is still asynchronous at this
> > point.
> > 
> > Refcounting just ensures the request_queue isn't ripped out from
> > under our noses.
> 
> I think the above trace reveals a bug in the loop driver. The loop
> driver shouldn't allow the associated request queue to disappear while
> the loop device is open.

The bug was *not* in the driver; the bug was that the deferral of
removal was allowed to be asynchronous. From a userspace perspective
the removal *finishes*, but it is not actually done yet. Back when
removal was synchronous, the loop driver waited on cleanup and didn't
return to userspace until the queue was really removed.

This is why I noted that the move to asynchronous removal turns out to
be a userspace API regression.
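
To illustrate the idea behind patch 3/5, here is a rough sketch of
pinning the request_queue for the duration of the ioctl so the queue
(and its debugfs dir) cannot be released underneath us. The wrapper
name below is made up for illustration and this is not necessarily
what the actual patch looks like; blk_get_queue() / blk_put_queue()
are just the stock refcount helpers:

#include <linux/blkdev.h>
#include <linux/blktrace_api.h>

/* Hypothetical sketch only, not the actual patch. */
static int blk_trace_ioctl_pinned(struct block_device *bdev, unsigned cmd,
                                  char __user *arg)
{
        struct request_queue *q = bdev_get_queue(bdev);
        int ret;

        /* Bail out if the queue is already gone or dying. */
        if (!q || !blk_get_queue(q))
                return -ENXIO;

        /* Existing ioctl body; the queue cannot be released under us now. */
        ret = blk_trace_ioctl(bdev, cmd, arg);

        blk_put_queue(q);
        return ret;
}

With the reference held, a concurrent LOOP_CTL_DEL can still proceed,
but the final release of the queue cannot complete until the last
reference is dropped after the ioctl returns.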

> One may want to have a look at sd_open() in the
> sd driver. The scsi_disk_get() call in that function not only increases
> the reference count of the SCSI disk but also of the underlying SCSI device.

Are you saying to use this as a template for what a driver should do,
or do you suspect there is a bug there? I'm not sure what you mean here.

  Luis
