lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 8 Feb 2017 11:39:24 +0100
From:   Paolo Valente <paolo.valente@...aro.org>
To:     Omar Sandoval <osandov@...ndov.com>
Cc:     Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>,
        linux-block@...r.kernel.org,
        Linux-Kernal <linux-kernel@...r.kernel.org>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        Linus Walleij <linus.walleij@...aro.org>, broonie@...nel.org
Subject: Re: [PATCH] bfq-mq: cause deadlock by executing exit_icq body immediately


> Il giorno 08 feb 2017, alle ore 11:33, Omar Sandoval <osandov@...ndov.com> ha scritto:
> 
> On Wed, Feb 08, 2017 at 11:03:01AM +0100, Paolo Valente wrote:
>> 
>>> Il giorno 07 feb 2017, alle ore 22:45, Omar Sandoval <osandov@...ndov.com> ha scritto:
>>> 
>>> On Tue, Feb 07, 2017 at 06:33:46PM +0100, Paolo Valente wrote:
>>>> Hi,
>>>> this patch is meant to show that, if the  body of the hook exit_icq is executed
>>>> from inside that hook, and not as deferred work, then a circular deadlock
>>>> occurs.
>>>> 
>>>> It happens if, on a CPU
>>>> - the body of icq_exit takes the scheduler lock,
>>>> - it does so from inside the exit_icq hook, which is invoked with the queue
>>>> lock held
>>>> 
>>>> while, on another CPU
>>>> - bfq_bio_merge, after taking the scheduler lock, invokes bfq_bic_lookup,
>>>> which, in its turn, takes the queue lock. bfq_bic_lookup needs to take such a
>>>> lock, because it invokes ioc_lookup_icq.
>>>> 
>>>> For more details, here is a lockdep report, right before the deadlock did occur.
>>>> 
>>>> [   44.059877] ======================================================
>>>> [   44.124922] [ INFO: possible circular locking dependency detected ]
>>>> [   44.125795] 4.10.0-rc5-bfq-mq+ #38 Not tainted
>>>> [   44.126414] -------------------------------------------------------
>>>> [   44.127291] sync/2043 is trying to acquire lock:
>>>> [   44.128918]  (&(&bfqd->lock)->rlock){-.-...}, at: [<ffffffff90484195>] bfq_exit_icq_bfqq+0x55/0x140
>>>> [   44.134052]
>>>> [   44.134052] but task is already holding lock:
>>>> [   44.134868]  (&(&q->__queue_lock)->rlock){-.....}, at: [<ffffffff9044738e>] put_io_context_active+0x6e/0xc0
>>> 
>>> Hey, Paolo,
>>> 
>>> I only briefly skimmed the code, but what are you using the queue_lock
>>> for? You should just use your scheduler lock everywhere. blk-mq doesn't
>>> use the queue lock, so the scheduler is the only thing you need mutual
>>> exclusion against.
>> 
>> Hi Omar,
>> the cause of the problem is that the hook functions bfq_request_merge
>> and bfq_allow_bio_merge invoke, directly or through other functions,
>> the function bfq_bic_lookup, which, in its turn, invokes
>> ioc_lookup_icq.  The latter must be invoked with the queue lock held.
>> In particular the offending lines in bfq_bic_lookup are:
>> 
>> 		spin_lock_irqsave(q->queue_lock, flags);
>> 		icq = icq_to_bic(ioc_lookup_icq(ioc, q));
>> 		spin_unlock_irqrestore(q->queue_lock, flags);
>> 
>> Maybe I'm missing something and we can avoid taking this lock?
> 
> Ah, I didn't realize we still used the q->queue_lock for the icq stuff.
> You're right, you still need that lock for ioc_lookup_icq(). Unless
> there's something else I'm forgetting, that should be the only thing you
> need it for in the core code, and you should use your scheduler lock for
> everything else. What else are you using q->queue_lock for? 

Nothing.  The deadlock follows from that bfq_request_merge gets called
with the scheduler lock already held.  Problematic paths start from:
bfq_bio_merge and bfq_insert_request.

I'm trying to understand whether I/we can reorder operations in some
way that avoids the nested locking, but at no avail so far.

Thanks,
Paolo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ