Message-ID: <5ba280ed-b4f8-dfe0-16ea-1a10b0de7eb4@oracle.com>
Date: Sat, 13 Apr 2019 08:36:54 +0800
From: Bob Liu <bob.liu@...cle.com>
To: Jinpu Wang <jinpuwang@...il.com>, r.peniaev@...il.com
Cc: linux-block@...r.kernel.org, shirley.ma@...cle.com,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Akinobu Mita <akinobu.mita@...il.com>,
Tejun Heo <tj@...nel.org>, Jens Axboe <axboe@...nel.dk>,
Christoph Hellwig <hch@....de>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RESEND PATCH] blk-mq: fix hang caused by freeze/unfreeze
sequence
On 4/9/19 5:29 PM, Jinpu Wang wrote:
> Bob Liu <bob.liu@...cle.com> wrote on Tue, Apr 9, 2019 at 11:11 AM:
>>
>> This patch was proposed by Roman Pen[3] years ago.
>> Recently we hit a bug which is likely caused by the same race, so I rebased his
>> fix onto v5.1 and am resending it.
>> The text below is mostly copied from that patch[3].
>>
>> ------
>> A long time ago a similar fix was proposed by Akinobu Mita[1],
>> but it seems that at the time everyone decided to fix this subtle race in
>> percpu-refcount instead, and Tejun Heo[2] made an attempt (as far as I can
>> see, that patchset was not applied).
>>
>> The following describes a hang in blk_mq_freeze_queue_wait(): the
>> same fix, but the bug seen from another angle.
>>
>> The hang happens when one task attempts to freeze a queue while another
>> task is unfreezing it.
>>
>> The root cause is that percpu_ref_reinit() and percpu_ref_kill() are not
>> ordered with respect to each other, so the two calls can be swapped:
>>
>> CPU#0                         CPU#1
>> ----------------              -----------------
>> percpu_ref_kill()
>>
>>                               percpu_ref_kill() << atomic reference does
>> percpu_ref_reinit()                             << not guarantee the order
>>
>>                               blk_mq_freeze_queue_wait() << HANG HERE
>>
>>                               percpu_ref_reinit()
>>
>> First, this wrong sequence raises two kernel warnings:
>>
>> 1st. WARNING at lib/percpu-refcount.c:309
>> percpu_ref_kill_and_confirm called more than once
>>
>> 2nd. WARNING at lib/percpu-refcount.c:331
>>
>> But the most unpleasant effect is a hang in blk_mq_freeze_queue_wait(),
>> which waits for q_usage_counter to reach zero. That never happens,
>> because the percpu-ref was reinitialized (instead of being killed) and
>> stays in PERCPU mode forever.
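To make the swap concrete, here is a small user-space model, a sketch rather than kernel code. It assumes, as in blk-mq of this era, that the freeze path atomically increments q->mq_freeze_depth and only the caller that moves it from 0 to 1 calls percpu_ref_kill(), while the unfreeze path calls percpu_ref_reinit() only on the 1 to 0 transition. The atomic counter orders those decisions, but not the kill/reinit calls that follow them. All names (model_queue, freeze_decide(), replay_bad_interleaving() and so on) are hypothetical, percpu_ref is reduced to a live/dead flag, and only the first warning is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: percpu_ref reduced to "live" (PERCPU) or "dead" (killed). */
enum ref_mode { REF_LIVE, REF_DEAD };

struct model_queue {
	int freeze_depth;      /* stands in for q->mq_freeze_depth */
	enum ref_mode mode;    /* stands in for the mode of q->q_usage_counter */
	int double_kills;      /* "called more than once" warnings (1st WARNING) */
};

/* Step 1 of freeze: bump the depth; only the 0 -> 1 transition kills. */
static bool freeze_decide(struct model_queue *q)
{
	return ++q->freeze_depth == 1;
}

/* Step 2 of freeze: act on the decision; nothing orders it against unfreeze. */
static void freeze_kill(struct model_queue *q)
{
	if (q->mode == REF_DEAD)
		q->double_kills++;
	q->mode = REF_DEAD;
}

/* Step 1 of unfreeze: drop the depth; only the 1 -> 0 transition reinits. */
static bool unfreeze_decide(struct model_queue *q)
{
	assert(q->freeze_depth > 0);
	return --q->freeze_depth == 0;
}

/* Step 2 of unfreeze: act on the decision; nothing orders it against freeze. */
static void unfreeze_reinit(struct model_queue *q)
{
	q->mode = REF_LIVE;
}

/* A live (PERCPU) ref never reports zero, so the freeze wait cannot finish. */
static bool freeze_wait_can_finish(const struct model_queue *q)
{
	return q->mode == REF_DEAD;
}

/* Replays the interleaving from the diagram; returns whether CPU#1's wait ends. */
static bool replay_bad_interleaving(struct model_queue *q)
{
	bool cpu0_reinit, cpu1_kill;

	if (freeze_decide(q))             /* CPU#0: depth 0 -> 1 */
		freeze_kill(q);           /* CPU#0: percpu_ref_kill() */

	cpu0_reinit = unfreeze_decide(q); /* CPU#0: depth 1 -> 0, reinit still pending */
	cpu1_kill = freeze_decide(q);     /* CPU#1: depth 0 -> 1 */
	if (cpu1_kill)
		freeze_kill(q);           /* CPU#1: kill while already dead */
	if (cpu0_reinit)
		unfreeze_reinit(q);       /* CPU#0: late percpu_ref_reinit() */

	return freeze_wait_can_finish(q); /* CPU#1: blk_mq_freeze_queue_wait() */
}
```

Starting from a zero-initialized model_queue, replay_bad_interleaving() returns false and leaves freeze_depth at 1, mode at REF_LIVE and double_kills at 1: the queue is counted as frozen while its ref is live, which is the state that makes the real wait hang.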
>>
>> The simplified sequence above can be reproduced with shared tags, when
>> queue A (q1) is being torn down while another queue B (q2) is being
>> initialized and tries to freeze queue A, which shares the same tag set:
>>
>> CPU#0                                 CPU#1
>> -------------------------------       ------------------------------------
>> q1 = blk_mq_init_queue(shared_tags)
>>
>>                                       q2 = blk_mq_init_queue(shared_tags):
>>                                         blk_mq_add_queue_tag_set(shared_tags):
>>                                           blk_mq_update_tag_set_depth(shared_tags):
>>                                             blk_mq_freeze_queue(q1)
>> blk_cleanup_queue(q1)                       ...
>>   blk_mq_freeze_queue(q1)     <<<->>>       blk_mq_unfreeze_queue(q1)
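The diff itself is not quoted in this message, so the following is only a sketch of one way to close the window, under the assumption that the depth update and the kill/reinit call are serialized: a mutex turns "change the depth, then kill or reinit" into a single critical section, and the depth can then be a plain int. The names (locked_queue, freeze_lock, locked_freeze_start() and so on) are hypothetical, and a pthread mutex stands in for a kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not the kernel's structures. */
enum ref_mode { REF_LIVE, REF_DEAD };

struct locked_queue {
	pthread_mutex_t freeze_lock;  /* serializes depth update + kill/reinit */
	int freeze_depth;             /* plain int: only touched under freeze_lock */
	enum ref_mode mode;           /* live (PERCPU) or dead (killed) */
	int double_kills;             /* would count "called more than once" */
};

static void locked_freeze_start(struct locked_queue *q)
{
	pthread_mutex_lock(&q->freeze_lock);
	if (++q->freeze_depth == 1) {
		if (q->mode == REF_DEAD)
			q->double_kills++;
		q->mode = REF_DEAD;   /* percpu_ref_kill() equivalent, still locked */
	}
	pthread_mutex_unlock(&q->freeze_lock);
}

static void locked_unfreeze(struct locked_queue *q)
{
	pthread_mutex_lock(&q->freeze_lock);
	assert(q->freeze_depth > 0);
	if (--q->freeze_depth == 0)
		q->mode = REF_LIVE;   /* percpu_ref_reinit() equivalent, still locked */
	pthread_mutex_unlock(&q->freeze_lock);
}

/* Invariant the lock provides: the ref is dead exactly when depth > 0. */
static bool locked_state_consistent(struct locked_queue *q)
{
	bool ok;

	pthread_mutex_lock(&q->freeze_lock);
	ok = (q->freeze_depth > 0) == (q->mode == REF_DEAD);
	pthread_mutex_unlock(&q->freeze_lock);
	return ok;
}
```

Because no other task can observe the depth between the transition and its kill or reinit, the CPU#0/CPU#1 swap above cannot occur in this model. Waiting for the counter to drain (blk_mq_freeze_queue_wait() in the real code) can stay outside the lock; only the transition and its action need to be atomic with respect to each other.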
>>
>> [1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@...il.com
>> [2] Message id: 1443563240-29306-6-git-send-email-tj@...nel.org
>> [3] https://patchwork.kernel.org/patch/9268199/
>>
>> Signed-off-by: Roman Pen <roman.penyaev@...fitbricks.com>
>> Signed-off-by: Bob Liu <bob.liu@...cle.com>
>> Cc: Akinobu Mita <akinobu.mita@...il.com>
>> Cc: Tejun Heo <tj@...nel.org>
>> Cc: Jens Axboe <axboe@...nel.dk>
>> Cc: Christoph Hellwig <hch@....de>
>> Cc: linux-block@...r.kernel.org
>> Cc: linux-kernel@...r.kernel.org
>>
>
> I replaced Roman's email address.
>
> We at 1 & 1 IONOS (formerly ProfitBricks) have been carrying this patch
> for some years,
> and it has been running in production for some years too,
Nice to hear that!
> it would be good to see it upstream :)
Yes.
Could anyone review it? Thanks!
>
> Thanks,
>
> Jack Wang
> Linux Kernel Developer @ 1 & 1 IONOS
>