Message-ID: <CAGZn=28_q5tGd5kL_nU3Tz3_XA+gqODGpP6CKsZB6tKb3dAXtA@mail.gmail.com>
Date: Thu, 30 Dec 2021 17:06:23 -0800
From: Shachar Raindel <shacharr@...il.com>
To: Chao Yu <chao@...nel.org>
Cc: jaegeuk@...nel.org, linux-kernel@...r.kernel.org,
Yi Zhuang <zhuangyi1@...wei.com>,
linux-f2fs-devel@...ts.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH] f2fs: quota: fix potential deadlock

Somewhat late to the party (i.e. 3 months late), happy mailbox cleanup holidays!

On Thu, Sep 2, 2021 at 8:04 PM Chao Yu <chao@...nel.org> wrote:
>
> As Yi Zhuang reported in bugzilla:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=214299
>
The bug report is for kernel 5.3. When I reported a very similar deadlock in
the google-msm 4.9 tree (
https://lore.kernel.org/linux-f2fs-devel/20201128174124.22397-1-shacharr@gmail.com/t/
), you pointed out that the code was missing the commits that removed the
cp_rwsem grabbing (these are also missing from kernel 5.3):

commit 435cbab95e3966cd8310addd9e9b758dce0e8b84
Author: Jaegeuk Kim <jaegeuk@...nel.org>
Date:   Thu Apr 9 10:25:21 2020 -0700

    f2fs: fix quota_sync failure due to f2fs_lock_op

commit ca7f76e680745d3b8a386638045f85dac1c4b2f4
Author: Chao Yu <chao@...nel.org>
Date:   Fri May 29 17:29:47 2020 +0800

    f2fs: fix wrong discard space

commit 79963d967b492876fa17c8c2c2c17b7438683d9b
Author: Chao Yu <chao@...nel.org>
Date:   Thu Jun 18 14:36:23 2020 +0800

    f2fs: shrink node_write lock coverage

Is this patch needed with these commits applied?
>
> There is potential deadlock during quota data flush as below:
>
> Thread A:                      Thread B:
> f2fs_dquot_acquire
> down_read(&sbi->quota_sem)
>                                f2fs_write_checkpoint
>                                block_operations
>                                f2fs_lock_all
>                                down_write(&sbi->cp_rwsem)
> f2fs_quota_write
> f2fs_write_begin
> __do_map_lock
> f2fs_lock_op
> down_read(&sbi->cp_rwsem)
>                                __need_flush_quota
>                                down_write(&sbi->quota_sem)
>
> This patch changes block_operations() to use trylock, if it fails,
> it means there is potential quota data updater, in this condition,
> let's flush quota data first and then trylock again to check dirty
> status of quota data.
>
> The side effect is: in heavy race condition (e.g. multi quota data
> updaters vs quota data flusher), it may decrease the probability of
> synchronizing quota data successfully in checkpoint() due to limited
> retry time of quota flush.
>
> Reported-by: Yi Zhuang <zhuangyi1@...wei.com>
> Signed-off-by: Chao Yu <chao@...nel.org>
Since this patch has been applied to the mainline kernel, can we CC -stable to
get this patch into the various Android kernels? Specifically,
https://android.googlesource.com/kernel/msm/+/refs/tags/android-12.0.0_r0.21/fs/f2fs/checkpoint.c#1147
needs this patch (alongside many other google-msm kernel branches).
Thanks,
--Shachar