Message-ID: <47dd353a-754c-4ded-a9bb-11f8400ac3fe@linux.alibaba.com>
Date: Mon, 29 Jul 2024 06:35:22 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Matthew Wilcox <willy@...radead.org>, Huang Ying <ying.huang@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/migrate: fix deadlock in migrate_pages_batch() on
large folios
Hi,
On 2024/7/29 05:17, Matthew Wilcox wrote:
> On Sun, Jul 28, 2024 at 12:50:05PM -0700, Andrew Morton wrote:
>> On Sun, 28 Jul 2024 23:49:13 +0800 Gao Xiang <hsiangkao@...ux.alibaba.com> wrote:
>>> Currently, migrate_pages_batch() can lock multiple locked folios
>>> with an arbitrary order. Although folio_trylock() is used to avoid
>>> deadlock as commit 2ef7dbb26990 ("migrate_pages: try migrate in batch
>>> asynchronously firstly") mentioned, it seems try_split_folio() is
>>> still missing.
>>
>> Am I correct in believing that folio_lock() doesn't have lockdep coverage?
>
> Yes. It can't; it is taken in process context and released by whatever
> context the read completion happens in (could be hard/soft irq, could be
> a workqueue, could be J. Random kthread, depending on the device driver)
> So it doesn't match the lockdep model at all.
>
>>> It was found by compaction stress test when I explicitly enable EROFS
>>> compressed files to use large folios, which case I cannot reproduce with
>>> the same workload if large folio support is off (current mainline).
>>> Typically, filesystem reads (with locked file-backed folios) could use
>>> another bdev/meta inode to load some other I/Os (e.g. inode extent
>>> metadata or caching compressed data), so the locking order will be:
>>
>> Which kernels need fixing. Do we expect that any code paths in 6.10 or
>> earlier are vulnerable to this?
>
> I would suggest it goes back to the introduction of large folios, but
> that's just a gut feeling based on absolutely no reading of code or
> inspection of git history.
According to 5dfab109d519 ("migrate_pages: batch _unmap and _move"),
I think it's v6.3+.
Yet I don't have more time to look into the full history of batched
migration; I hope Huang, Ying can give more hints on this.
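
For anyone following the thread, the idea behind using a trylock for
batched locking can be sketched in userspace. This is only an analogy under
stated assumptions: it uses pthread mutexes rather than folio locks, and
`struct item`, `batch_trylock()` and `batch_unlock()` are hypothetical names,
not kernel code. The point it illustrates is that a batch holding several
locks in arbitrary order should never sleep on another lock; a contended
entry is skipped and left for a later retry.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio; only the lock matters here. */
struct item {
	pthread_mutex_t lock;
	bool locked_by_batch;	/* true if this batch holds the lock */
};

/*
 * Lock as many items as possible without ever blocking while other
 * locks in the batch are already held. Contended items are skipped
 * instead of waited on, so no ABBA cycle can form with another
 * locker that takes the same locks in a different order.
 * Returns the number of items locked by this call.
 */
static size_t batch_trylock(struct item *items, size_t n)
{
	size_t nr_locked = 0;

	for (size_t i = 0; i < n; i++) {
		items[i].locked_by_batch =
			(pthread_mutex_trylock(&items[i].lock) == 0);
		if (items[i].locked_by_batch)
			nr_locked++;
	}
	return nr_locked;
}

/* Release only the locks that batch_trylock() actually obtained. */
static void batch_unlock(struct item *items, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!items[i].locked_by_batch)
			continue;
		int rc = pthread_mutex_unlock(&items[i].lock);
		assert(rc == 0);
		(void)rc;	/* silence unused warning under NDEBUG */
		items[i].locked_by_batch = false;
	}
}
```

Any blocking lock call made while the batch still holds other locks would
reintroduce the ordering problem, which is the concern raised about
try_split_folio() in the patch description.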
Thanks,
Gao Xiang