linux-kernel - Re: [RESEND PATCH] blk-mq: fix hang caused by freeze/unfreeze sequence


Message-ID: <CAD9gYJKYu15b7ZuAvud-BzG5fRP=B33yPJuEq84nsN_8y3EC1Q@mail.gmail.com>
Date:   Tue, 9 Apr 2019 11:29:42 +0200
From:   Jinpu Wang <jinpuwang@...il.com>
To:     Bob Liu <bob.liu@...cle.com>, r.peniaev@...il.com
Cc:     linux-block@...r.kernel.org, shirley.ma@...cle.com,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Akinobu Mita <akinobu.mita@...il.com>,
        Tejun Heo <tj@...nel.org>, Jens Axboe <axboe@...nel.dk>,
        Christoph Hellwig <hch@....de>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RESEND PATCH] blk-mq: fix hang caused by freeze/unfreeze sequence

On Tue, Apr 9, 2019 at 11:11 AM Bob Liu <bob.liu@...cle.com> wrote:
>
> This patch was proposed by Roman Pen[3] years ago.
> Recently we hit a bug which is likely caused by the same issue, so I rebased his
> fix onto v5.1 and am resending it.
> The description below is copied almost verbatim from that patch[3].
>
> ------
> A long time ago a similar fix was proposed by Akinobu Mita[1],
> but it seems that at that time everyone decided to fix this subtle race in
> percpu-refcount, and Tejun Heo[2] made an attempt (as far as I can see, that
> patchset was not applied).
>
> The following is a description of a hang in blk_mq_freeze_queue_wait():
> the same fix, but for a bug seen from another angle.
>
> The hang happens on an attempt to freeze a queue while another task is
> unfreezing the same queue.
>
> The root cause is that percpu_ref_reinit() and percpu_ref_kill() are not
> ordered with respect to each other, so the two calls can be swapped:
>
>  CPU#0               CPU#1
>  ----------------    -----------------
>  percpu_ref_kill()
>
>                      percpu_ref_kill() << atomic reference does
>  percpu_ref_reinit()                   << not guarantee the order
>
>                      blk_mq_freeze_queue_wait() << HANG HERE
>
>                      percpu_ref_reinit()
>
> First, this wrong sequence raises two kernel warnings:
>
>   1st. WARNING at lib/percpu-refcount.c:309
>        percpu_ref_kill_and_confirm called more than once
>
>   2nd. WARNING at lib/percpu-refcount.c:331
>
> But the most unpleasant effect is a hang in blk_mq_freeze_queue_wait(),
> which waits for q_usage_counter to drop to zero. That never happens,
> because the percpu-ref was reinitialized (instead of being killed) and stays in
> PERCPU state forever.
>
> The simplified sequence above can be reproduced with shared tags, when
> queue q1 is being torn down while another queue q2 is being initialized and
> is trying to freeze q1, which shares the same tag set:
>
>  CPU#0                           CPU#1
>  ------------------------------- ------------------------------------
>  q1 = blk_mq_init_queue(shared_tags)
>
>                                 q2 = blk_mq_init_queue(shared_tags):
>                                   blk_mq_add_queue_tag_set(shared_tags):
>                                     blk_mq_update_tag_set_depth(shared_tags):
>                                       blk_mq_freeze_queue(q1)
>  blk_cleanup_queue(q1)                 ...
>    blk_mq_freeze_queue(q1)   <<<->>>   blk_mq_unfreeze_queue(q1)
>
> [1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@...il.com
> [2] Message id: 1443563240-29306-6-git-send-email-tj@...nel.org
> [3] https://patchwork.kernel.org/patch/9268199/
>
> Signed-off-by: Roman Pen <roman.penyaev@...fitbricks.com>
> Signed-off-by: Bob Liu <bob.liu@...cle.com>
> Cc: Akinobu Mita <akinobu.mita@...il.com>
> Cc: Tejun Heo <tj@...nel.org>
> Cc: Jens Axboe <axboe@...nel.dk>
> Cc: Christoph Hellwig <hch@....de>
> Cc: linux-block@...r.kernel.org
> Cc: linux-kernel@...r.kernel.org
>
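For readers following the quoted description, the ordering problem can be replayed in a small userspace model. This is only a sketch under stated assumptions, not kernel code and not the patch itself: `struct model_queue`, `freeze_inc()`, `unfreeze_dec()`, `ref_kill()`, `ref_reinit()` and both `run_*` functions are hypothetical names, and the model assumes that freeze and unfreeze each update a depth counter first and then act on the reference as a separate step.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the race; none of this is kernel API. */
enum ref_mode { REF_PERCPU, REF_DEAD };

struct model_queue {
    int freeze_depth;    /* assumed stand-in for the queue's freeze depth */
    enum ref_mode mode;  /* assumed stand-in for the q_usage_counter state */
    int warnings;        /* counts misuse the kernel would WARN about */
};

struct outcome { bool waiter_stuck; int warnings; };

/* First half of a freeze: true means this caller must kill the ref. */
static bool freeze_inc(struct model_queue *q) { return ++q->freeze_depth == 1; }

/* First half of an unfreeze: true means this caller must reinit the ref. */
static bool unfreeze_dec(struct model_queue *q) { return --q->freeze_depth == 0; }

static void ref_kill(struct model_queue *q)
{
    if (q->mode == REF_DEAD)
        q->warnings++;   /* models "kill called more than once" */
    q->mode = REF_DEAD;
}

static void ref_reinit(struct model_queue *q)
{
    if (q->mode != REF_DEAD)
        q->warnings++;   /* models reinit of a ref that was never killed */
    q->mode = REF_PERCPU;
}

/* The queue starts frozen by CPU#0, i.e. its kill has already happened. */
struct outcome run_swapped_order(void)
{
    struct model_queue q = { .freeze_depth = 1, .mode = REF_DEAD, .warnings = 0 };
    struct outcome o;

    bool cpu0_reinit = unfreeze_dec(&q); /* CPU#0: depth 1 -> 0          */
    bool cpu1_kill = freeze_inc(&q);     /* CPU#1: depth 0 -> 1          */
    if (cpu1_kill)
        ref_kill(&q);                    /* CPU#1: second kill, warning  */
    if (cpu0_reinit)
        ref_reinit(&q);                  /* CPU#0: reinit arrives late   */

    /* CPU#1 now waits for a dead ref to reach zero; a live ref never does. */
    o.waiter_stuck = (q.mode != REF_DEAD);

    if (unfreeze_dec(&q))
        ref_reinit(&q);                  /* CPU#1: reinit of a live ref  */
    o.warnings = q.warnings;
    return o;
}

/* Same operations, but each counter update stays paired with its ref step. */
struct outcome run_serialized_order(void)
{
    struct model_queue q = { .freeze_depth = 1, .mode = REF_DEAD, .warnings = 0 };
    struct outcome o;

    if (unfreeze_dec(&q))
        ref_reinit(&q);                  /* CPU#0 finishes its unfreeze  */
    if (freeze_inc(&q))
        ref_kill(&q);                    /* CPU#1 then starts its freeze */
    o.waiter_stuck = (q.mode != REF_DEAD);
    if (unfreeze_dec(&q))
        ref_reinit(&q);
    o.warnings = q.warnings;
    return o;
}
```

In this model, `run_swapped_order()` reports a stuck waiter and two warnings, matching the two WARNINGs and the hang described in the quoted text, while `run_serialized_order()` reports neither.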

Replaced Roman's email address.

We at 1 & 1 IONOS (formerly ProfitBricks) have been carrying this patch
for some years, and it has been running in production for some years too.
It would be good to see it upstream :)

Thanks,

Jack Wang
Linux Kernel Developer @ 1 & 1 IONOS