linux-kernel - Re: [PATCH 3/5] blktrace: refcount the request

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200416011247.GB11244@42.do-not-panic.com>
Date:   Thu, 16 Apr 2020 01:12:47 +0000
From:   Luis Chamberlain <mcgrof@...nel.org>
To:     Bart Van Assche <bvanassche@....org>
Cc:     Christoph Hellwig <hch@...radead.org>, axboe@...nel.dk,
        viro@...iv.linux.org.uk, gregkh@...uxfoundation.org,
        rostedt@...dmis.org, mingo@...hat.com, jack@...e.cz,
        ming.lei@...hat.com, nstange@...e.de, akpm@...ux-foundation.org,
        mhocko@...e.com, yukuai3@...wei.com, linux-block@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Omar Sandoval <osandov@...com>,
        Hannes Reinecke <hare@...e.com>,
        Michal Hocko <mhocko@...nel.org>
Subject: Re: [PATCH 3/5] blktrace: refcount the request_queue during ioctl

On Wed, Apr 15, 2020 at 07:18:22AM -0700, Bart Van Assche wrote:
> On 2020-04-15 05:34, Luis Chamberlain wrote:
> > On Wed, Apr 15, 2020 at 12:14:25AM -0700, Christoph Hellwig wrote:
> >> Btw, Isn't blk_get_queue racy as well?  Shouldn't we check
> >> blk_queue_dying after getting the reference and undo it if the queue is
> >> indeeed dying?
> > 
> > Yes that race should be possible:
> > 
> > bool blk_get_queue(struct request_queue *q)                                     
> > {                                                                               
> > 	if (likely(!blk_queue_dying(q))) {
> >        ----------> we can get the queue to go dying here <---------
> > 		__blk_get_queue(q);
> > 		return true;
> > 	}                                                                       
> > 
> > 	return false;
> > }                                                                               
> > EXPORT_SYMBOL(blk_get_queue);
> > 
> > I'll pile up a fix. I've also considered doing a full review of callers
> > outside of the core block layer using it, and maybe just unexporting
> > this. It was originally exported due to commit d86e0e83b ("block: export
> > blk_{get,put}_queue()") to fix a scsi bug, but I can't find such
> > respective fix. I suspec that using bdgrab()/bdput() seems more likely
> > what drivers should be using. That would allow us to keep this
> > functionality internal.
> 
> blk_get_queue() prevents concurrent freeing of struct request_queue but
> does not prevent concurrent blk_cleanup_queue() calls.

Wouldn't concurrent blk_cleanup_queue() calls be a bug? If so should
I make it clear that it would be or simply prevent it?

> Callers of
> blk_get_queue() may encounter a change of the queue state from normal
> into dying any time during the blk_get_queue() call or after
> blk_get_queue() has finished. Maybe I'm overlooking something but I
> doubt that modifying blk_get_queue() will help.

Good point, to fix that race described by Christoph we'd have to take
into consideration refcounts of the request_queue to prevent queues from
changing state to dying if the refcount is > 1, however that'd also
would  mean not allowing the request_queue from ever dying.

One way we could resolve this could be to to keep track of a
quiesce/dying request, then at that point prevent blk_get_queue() from
allowing increments, and once the refcount is down to 1, flip the switch
to dying.

  Luis