Message-Id: <3E34D175-88F9-4114-B627-A11262A7B470@linaro.org>
Date:   Tue, 30 Jan 2018 16:40:01 +0100
From:   Paolo Valente <paolo.valente@...aro.org>
To:     Ming Lei <ming.lei@...hat.com>
Cc:     Oleksandr Natalenko <oleksandr@...alenko.name>,
        Ivan Kozik <ivan@...ios.org>,
        linux-block <linux-block@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        'Paolo Valente' via bfq-iosched 
        <bfq-iosched@...glegroups.com>, Jens Axboe <axboe@...nel.dk>,
        Linus Walleij <linus.walleij@...aro.org>,
        SERENA ZIVIANI <169364@...denti.unimore.it>
Subject: Re: v4.15 and I/O hang with BFQ



> On 30 Jan 2018, at 15:40, Ming Lei <ming.lei@...hat.com> wrote:
> 
> On Tue, Jan 30, 2018 at 03:30:28PM +0100, Oleksandr Natalenko wrote:
>> Hi.
>> 
> ...
>>   systemd-udevd-271   [000] ....     4.311033: bfq_insert_requests: insert rq->0
>>   systemd-udevd-271   [000] ...1     4.311037: blk_mq_do_dispatch_sched: not get rq, 1
>>          cfdisk-408   [000] ....    13.484220: bfq_insert_requests: insert rq->1
>>    kworker/0:1H-174   [000] ....    13.484253: blk_mq_do_dispatch_sched: not get rq, 1
>> ===
>> 
>> Looks the same, right?
> 
> Yeah, same with before.
> 

Hi guys,
sorry for the delay with this fix.  We are making slow progress on
it, because I'm very busy.  Anyway, I can now at least explain in
more detail what causes this hang.  Commit a6a252e64914
("blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ")
makes all non-flush re-prepared requests be re-inserted into the I/O
scheduler.  With this change, an I/O scheduler may get the same
request inserted again, even several times, without finish_request
being invoked on the request before each re-insertion.

For the I/O scheduler, every such re-prepared request is equivalent
to the insertion of a new request.  For schedulers like mq-deadline
or kyber this causes no problems.  In contrast, it confuses a stateful
scheduler like BFQ, which keeps state for an I/O request until
finish_request is invoked on it.  In particular, BFQ has no way
to know that these re-insertions concern the same, already dispatched
request.  So it may get stuck forever waiting for the completion of
these re-inserted requests, thus preventing any other queue of
requests from being served.
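
As a rough illustration of the accounting problem described above, here
is a toy model in Python.  It is not BFQ code: the class, the counter
and simulate_reinsertion() are hypothetical names, and only the
insert/finish hook names are taken from the discussion.  The scheduler
counts every insertion as a new request, so one completion cannot
balance several insertions of the same request.

```python
class StatefulScheduler:
    """Toy stand-in for a scheduler that keeps per-request state.

    Hypothetical model, not the real BFQ data structures.
    """

    def __init__(self):
        # Requests the scheduler believes are inserted but not yet finished.
        self.outstanding = 0

    def insert_request(self, rq):
        # Every insertion looks like a brand-new request to the scheduler.
        self.outstanding += 1

    def finish_request(self, rq):
        # State for a request is released only here.
        self.outstanding -= 1


def simulate_reinsertion(reinserts):
    """Insert one request, re-insert it `reinserts` times, complete it once.

    Returns how many requests the scheduler still thinks are pending.
    """
    sched = StatefulScheduler()
    rq = "rq0"
    sched.insert_request(rq)
    for _ in range(reinserts):
        # Re-prepared request: inserted again with no finish_request first.
        sched.insert_request(rq)
    # The single real completion of the request.
    sched.finish_request(rq)
    return sched.outstanding
```

In this model, any non-zero result means the scheduler keeps waiting
for completions that will never arrive, which corresponds to the hang.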

We are trying to address this issue by adding the requeue_request
hook to the bfq interface.
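
A minimal sketch of the idea behind such a hook, again as a toy Python
model and under the assumption that requeue_request is invoked on a
request before it is re-inserted.  This is not the actual bfq
implementation: the class, the counter and simulate_with_requeue() are
hypothetical, and the real hook has to undo far more state than a
single counter.

```python
class SchedulerWithRequeue:
    """Toy scheduler with a requeue hook (hypothetical, not BFQ code)."""

    def __init__(self):
        # Requests the scheduler believes are inserted but not yet finished.
        self.outstanding = 0

    def insert_request(self, rq):
        self.outstanding += 1

    def requeue_request(self, rq):
        # Assumed behaviour: drop the state kept for the request, so that
        # the following re-insertion does not count it a second time.
        self.outstanding -= 1

    def finish_request(self, rq):
        self.outstanding -= 1


def simulate_with_requeue(reinserts):
    """Insert one request, requeue and re-insert it, then complete it once.

    Returns how many requests the scheduler still thinks are pending.
    """
    sched = StatefulScheduler = SchedulerWithRequeue()
    rq = "rq0"
    sched.insert_request(rq)
    for _ in range(reinserts):
        sched.requeue_request(rq)   # state released before re-insertion
        sched.insert_request(rq)    # now counted exactly once again
    sched.finish_request(rq)        # the single real completion
    return sched.outstanding
```

With the hook in place, the count in this model returns to zero however
many times the request is re-inserted.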

Unfortunately, with our current implementation of requeue_request in
place, bfq eventually gets into an incoherent state.  This is apparently
caused by a requeue of an I/O request, immediately followed by a
completion of the same request.  This seems rather absurd, and drives
bfq crazy.  But we don't have definite results on this yet.

We're working on it, sorry again for the delay.

Thanks,
Paolo

> -- 
> Ming
