lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 14 Apr 2021 11:33:14 -0700
From:   Khazhy Kumykov <khazhy@...gle.com>
To:     Jan Kara <jack@...e.cz>
Cc:     Paolo Valente <paolo.valente@...aro.org>,
        Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] bfq: silence lockdep for bfqd/ioc lock inversion

On Wed, Apr 14, 2021 at 2:54 AM Jan Kara <jack@...e.cz> wrote:
>
> On Thu 18-03-21 23:00:15, Khazhismel Kumykov wrote:
> > lockdep warns of circular locking due to inversion between
> > bfq_insert_requests and bfq_exit_icq. If we end freeing a request when
> > merging, we *may* grab an ioc->lock if that request is the last refcount
> > to that ioc. bfq_bio_merge also potentially could have this ordering.
> > bfq_exit_icq, conversely, grabs bfqd but is always called with ioc->lock
> > held.
> >
> > bfq_exit_icq may either be called from put_io_context_active with ioc
> > refcount raised, ioc_release_fn after the last refcount was already
> > dropped, or ioc_clear_queue, which is only called while queue is
> > quiesced or exiting, so the inverted orderings should never conflict.
> >
> > Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as
> > an extra scheduler")
> >
> > Signed-off-by: Khazhismel Kumykov <khazhy@...gle.com>
>
> I've just hit the same lockdep complaint. When looking at this another
> option to solve this complaint seemed to be to modify bfq_bio_merge() like:
>
>         ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
>
> +       spin_unlock_irq(&bfqd->lock);
>         if (free)
>                 blk_mq_free_request(free);
> -       spin_unlock_irq(&bfqd->lock);
>
>         return ret;
>
> to release request outside of bfqd->lock. Because AFAICT there's no good
> reason why we are actually freeing the request under bfqd->lock. And it
> would seem a bit safer than annotating-away the lockdep complaint (as much
> as I don't see a problem with your analysis). Paolo?

If we can re-order the locking so we don't need the annotation, that
seems better ("inversion is OK so long as either we're frozen or we
have ioc refcount, and we only grab ioc->lock normally if we drop the
last refcount" is a tad "clever"). Though we still need to deal with
blk_mq_sched_try_insert_merge which can potentially free a request.
(See the first stacktrace). Something simple that I wasn't sure of is:
could we delay bfq_exit_icq work, then avoid the inversion? Simpler to
analyze then.

Khazhy

>
>                                                                 Honza
> > ---
> >  block/bfq-iosched.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > Noticed this lockdep running xfstests (generic/464) on top of a bfq
> > block device. I was also able to tease it out w/ binary trying to issue
> > requests that would end up merging while rapidly swapping the active
> > scheduler. As far as I could see, the deadlock would not actually occur,
> > so this patch opts to change lock class for the inverted case.
> >
> > bfqd -> ioc :
> > [ 2995.524557] __lock_acquire+0x18f5/0x2660
> > [ 2995.524562] lock_acquire+0xb4/0x3a0
> > [ 2995.524565] _raw_spin_lock_irqsave+0x3f/0x60
> > [ 2995.524569] put_io_context+0x33/0x90.  -> ioc->lock grabbed
> > [ 2995.524573] blk_mq_free_request+0x51/0x140
> > [ 2995.524577] blk_put_request+0xe/0x10
> > [ 2995.524580] blk_attempt_req_merge+0x1d/0x30
> > [ 2995.524585] elv_attempt_insert_merge+0x56/0xa0
> > [ 2995.524590] blk_mq_sched_try_insert_merge+0x4b/0x60
> > [ 2995.524595] bfq_insert_requests+0x9e/0x18c0.    -> bfqd->lock grabbed
> > [ 2995.524598] blk_mq_sched_insert_requests+0xd6/0x2b0
> > [ 2995.524602] blk_mq_flush_plug_list+0x154/0x280
> > [ 2995.524606] blk_finish_plug+0x40/0x60
> > [ 2995.524609] ext4_writepages+0x696/0x1320
> > [ 2995.524614] do_writepages+0x1c/0x80
> > [ 2995.524621] __filemap_fdatawrite_range+0xd7/0x120
> > [ 2995.524625] sync_file_range+0xac/0xf0
> > [ 2995.524642] __x64_sys_sync_file_range+0x44/0x70
> > [ 2995.524646] do_syscall_64+0x31/0x40
> > [ 2995.524649] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > ioc -> bfqd
> > [ 2995.524490] _raw_spin_lock_irqsave+0x3f/0x60
> > [ 2995.524498] bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
> > [ 2995.524512] put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
> > [ 2995.524516] exit_io_context+0x48/0x50
> > [ 2995.524519] do_exit+0x7e9/0xdd0
> > [ 2995.524526] do_group_exit+0x54/0xc0
> > [ 2995.524530] __x64_sys_exit_group+0x18/0x20
> > [ 2995.524534] do_syscall_64+0x31/0x40
> > [ 2995.524537] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > Another trace where we grab ioc -> bfqd through bfq_exit_icq is when
> > changing elevator
> >                -> #1 (&(&bfqd->lock)->rlock){-.-.}:
> > [  646.890820]        lock_acquire+0x9b/0x140
> > [  646.894868]        _raw_spin_lock_irqsave+0x3b/0x50
> > [  646.899707]        bfq_exit_icq_bfqq+0x47/0x1f0
> > [  646.904196]        bfq_exit_icq+0x21/0x30
> > [  646.908160]        ioc_destroy_icq+0xf3/0x130
> > [  646.912466]        ioc_clear_queue+0xb8/0x140
> > [  646.916771]        elevator_switch_mq+0xa4/0x3c0
> > [  646.921333]        elevator_switch+0x5f/0x340
> >
> > diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> > index 95586137194e..cb50ac0ffe80 100644
> > --- a/block/bfq-iosched.c
> > +++ b/block/bfq-iosched.c
> > @@ -5027,7 +5027,14 @@ static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync)
> >       if (bfqq && bfqd) {
> >               unsigned long flags;
> >
> > -             spin_lock_irqsave(&bfqd->lock, flags);
> > +             /* bfq_exit_icq is usually called with ioc->lock held, which is
> > +              * inverse order from elsewhere, which may grab ioc->lock
> > +              * under bfqd->lock if we merge requests and drop the last ioc
> > +              * refcount. Since exit_icq is either called with a refcount,
> > +              * or with queue quiesced, use a differnet lock class to
> > +              * silence lockdep
> > +              */
> > +             spin_lock_irqsave_nested(&bfqd->lock, flags, 1);
> >               bfqq->bic = NULL;
> >               bfq_exit_bfqq(bfqd, bfqq);
> >               bic_set_bfqq(bic, NULL, is_sync);
> > --
> > 2.31.0.rc2.261.g7f71774620-goog
> >
> --
> Jan Kara <jack@...e.com>
> SUSE Labs, CR

Download attachment "smime.p7s" of type "application/pkcs7-signature" (3996 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ