Date:   Tue, 7 Feb 2017 18:27:20 +0100
From:   Paolo Valente <paolo.valente@...aro.org>
To:     Jens Axboe <axboe@...com>
Cc:     Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
        Linux-Kernal <linux-kernel@...r.kernel.org>, osandov@...com
Subject: Re: [PATCH 7/8] mq-deadline: add blk-mq adaptation of the deadline IO scheduler


> On 7 Feb 2017, at 18:27, Paolo Valente replied to the message Jens Axboe wrote on 2 Feb 2017, at 22:32:
> 
> On 02/02/2017 02:15 PM, Paolo Valente wrote:
>> 
>>> On 2 Feb 2017, at 16:30, Jens Axboe <axboe@...com> wrote:
>>> 
>>> On 02/02/2017 02:19 AM, Paolo Valente wrote:
>>>> The scheme is clear.  One comment, in case it could make sense and
>>>> avoid more complexity: since put_rq_priv is invoked in two different
>>>> contexts, process or interrupt, I didn't find it confusing that,
>>>> when put_rq_priv is invoked in the context where the lock cannot be
>>>> held (unless one is willing to pay the cost of disabling IRQs every
>>>> time), the lock is not held, while, when it is invoked in the
>>>> context where the lock can be held, the lock is actually held, or
>>>> must be taken.
>>> 
>>> If you grab the same lock from put_rq_priv, yes, you must make it IRQ
>>> disabling in all contexts, and use _irqsave() from put_rq_priv. If it's
>>> just freeing resources, you could potentially wait and do that when
>>> someone else needs them, since that part will come from process context.
>>> That would need two locks, though.
>>> 
>>> As I said above, I would not worry about the IRQ disabling lock.
>>> 
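A minimal sketch of the _irqsave pattern described above, assuming the
scheduler lock is reachable through q->elevator->elevator_data (the
function name, struct bfq_data and bfqd->lock are illustrative, not the
actual bfq-mq code):

  static void bfq_put_rq_priv(struct request_queue *q, struct request *rq)
  {
          struct bfq_data *bfqd = q->elevator->elevator_data;
          unsigned long flags;

          /* put_rq_priv may run in interrupt context, so every
           * acquisition of this lock must disable interrupts. */
          spin_lock_irqsave(&bfqd->lock, flags);
          /* release the scheduler-private resources tied to rq */
          spin_unlock_irqrestore(&bfqd->lock, flags);
  }
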
>> 
>> I'm sorry, I focused only on the IRQ-disabling consequence of
>> grabbing a scheduler lock also in IRQ context.  I thought it was a
>> serious enough issue to rule this option out.  Yet there is also a
>> deadlock problem related to it.  In fact, the IRQ handler may preempt
>> some process-context code that already holds other locks; if some of
>> those locks are also held by a process running on another CPU, and
>> that process then tries to take the scheduler lock, or is itself
>> preempted by an IRQ handler trying to grab the scheduler lock, then a
>> deadlock occurs.  This is not just speculation, but a problem that
>> did occur before I moved to a deferred-work solution, and that can be
>> readily reproduced.  Before moving to deferred work, I tried various
>> code manipulations to avoid these deadlocks without resorting to it,
>> but to no avail.
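For concreteness, the interleaving described above looks roughly like
this (lock names are placeholders, not the actual code):

  CPU0 (process context)              CPU1 (process context)
  ----------------------              ----------------------
  spin_lock(&other_lock);             spin_lock(&sched_lock);
  <IRQ>                               spin_lock(&other_lock);  /* spins: held by CPU0 */
    put_rq_priv()
      spin_lock(&sched_lock);         /* spins: held by CPU1 -> deadlock */

CPU0's interrupt handler waits for the scheduler lock while CPU1 waits
for the lock CPU0 already holds, so neither can make progress.
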
> 
> There are two important rules here:
> 
> 1) If a lock is ever used in interrupt context, anyone acquiring it must
>   ensure that interrupts get disabled.
> 
> 2) If multiple locks are needed, they need to be acquired in the right
>   order.
> 
> Instead of talking in hypotheticals, be more specific. With the latest
> code, the scheduler lock should now be fine, there should be no cases
> where you are being invoked with it held. I'm assuming you are running
> with lockdep enabled on your kernel? Post the stack traces from your
> problem (and your code...), then we can take a look.
> 
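For reference, lockdep is typically enabled with Kconfig options along
these lines (the exact set may vary between kernel versions):

  CONFIG_PROVE_LOCKING=y
  CONFIG_DEBUG_SPINLOCK=y
  CONFIG_DEBUG_LOCK_ALLOC=y
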

Hi Jens,

Your last change (freeing requests outside merges) removed two out of
the three deadlock scenarios for which I had turned some handlers into
deferred work items in bfq-mq.  For the remaining one, I'm about to
send a separate email with a description of the deadlock, together
with the patch that, applied on top of the bfq-mq branch, triggers the
deadlock by moving the body of exit_icq back from a deferred work item
to the exit_icq hook itself.  And, yes, as I'll write below, I'm
finally about to share a branch containing bfq-mq.
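For context, the deferred-work variant looks roughly like the sketch
below; the struct and function names (bfq_io_cq, exit_work,
bfq_exit_icq_body) are illustrative rather than the actual bfq-mq code:

  /* Per-icq private data carries a work item for the deferred exit. */
  struct bfq_io_cq {
          struct io_cq icq;              /* embedded io_cq (assumed) */
          struct work_struct exit_work;  /* deferred body of exit_icq */
          /* ... */
  };

  static void bfq_exit_icq_work(struct work_struct *work)
  {
          struct bfq_io_cq *bic =
                  container_of(work, struct bfq_io_cq, exit_work);

          /* Runs in process context, so the scheduler lock can be
           * taken here without the deadlock scenarios above. */
          bfq_exit_icq_body(bic);        /* hypothetical helper */
  }

  /* exit_icq hook: instead of doing the teardown inline, in a context
   * where the scheduler lock may not be safely taken, just schedule
   * the work item (initialized elsewhere with
   * INIT_WORK(&bic->exit_work, bfq_exit_icq_work)). */
  static void bfq_exit_icq(struct io_cq *icq)
  {
          struct bfq_io_cq *bic = container_of(icq, struct bfq_io_cq, icq);

          schedule_work(&bic->exit_work);
  }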

> Don't punt to deferring work from your put_rq_private() function, that's
> a suboptimal work around. It needs to be fixed for real.
> 

Yeah, sub-optimal also in terms of developer time: I spent a lot of
time making that deferred-work solution work, and hopefully be a
little efficient!  The actual problem is that I preferred to try to
get to the bottom of those deadlocks on my own, rather than bother
you with that issue too.  Maybe next time I will ask you one more
question instead of one less :)

>> At any rate, bfq seems now to work, so I can finally move from just
>> asking questions endlessly to proposing actual code to discuss.  I'm
>> about to: port this version of bfq to your improved/fixed
>> blk-mq-sched version in for-4.11 (a port postponed so far, to avoid
>> introducing further changes in code that did not yet work), run more
>> extensive tests, polish the commits a little, and finally share a
>> branch.
> 
> Post the code sooner rather than later. There are bound to be things
> that need to be improved or fixed up, let's start this process now. The
> framework is pretty much buttoned up at this point, so there's time to
> shift the attention a bit to a consumer of it.
> 

Ok, to follow this suggestion of yours to the letter, I have postponed
several steps (removing the invariant checks and extra log messages,
merging the various files bfq is made of into just one file, polishing
the code), and I'm about to share my current WIP branch in a follow-up
message.

Thanks,
Paolo

> -- 
> Jens Axboe
