Message-ID: <CAKEwX=PisWyb-Gt=n7ZBPNTNjbRNM5F5j4LE4NoRgTUB+2ZV-w@mail.gmail.com>
Date: Tue, 7 Oct 2025 16:52:43 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Kairui Song <ryncsn@...il.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Kemeng Shi <shikemeng@...weicloud.com>, Kairui Song <kasong@...cent.com>,
Baoquan He <bhe@...hat.com>, Barry Song <baohua@...nel.org>, Chris Li <chrisl@...nel.org>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, David Hildenbrand <david@...hat.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>, Ying Huang <ying.huang@...ux.alibaba.com>,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH 1/4] mm, swap: do not perform synchronous discard during allocation

On Mon, Oct 6, 2025 at 1:03 PM Kairui Song <ryncsn@...il.com> wrote:
>
> From: Kairui Song <kasong@...cent.com>
>
> Since commit 1b7e90020eb77 ("mm, swap: use percpu cluster as allocation
> fast path"), swap allocation is protected by a local lock, which means
> we can't do any sleeping calls during allocation.
>
> However, the discard routine was not handled properly. When the swap
> allocator fails to find any usable cluster, it looks at the pending
> discard clusters and tries to issue some blocking discards. This may
> not necessarily sleep, but the cond_resched at the bio layer indicates
> this is wrong when combined with a local lock. The GFP flag used for
> the discard bio is also wrong (not atomic).
>
> It's arguable whether this synchronous discard is helpful at all. In
> most cases, the async discard is good enough. The swap allocator also
> organizes clusters very differently since the recent change, so it is
> very rare to see discard clusters piling up.
>
> So far, no issues have been observed or reported with typical SSD setups
> under months of high pressure. This issue was found during my code
> review. But after hacking the kernel a bit by adding an mdelay(100) in
> the async discard path, the issue becomes observable, with a WARNING
> triggered by the wrong GFP flag and the cond_resched in the bio layer.
>
> So let's fix this issue in a safe way: remove the synchronous discard in
> the swap allocation path. When an order 0 allocation fails with all
> cluster lists drained on all swap devices, try to do a discard following
> the swap device priority list. If any discard releases some clusters,
> try the allocation again. This way, we can still avoid OOM due to swap
> failure if the hardware is very slow and memory pressure is extremely high.
>
> Cc: <stable@...r.kernel.org>
> Fixes: 1b7e90020eb77 ("mm, swap: use percpu cluster as allocation fast path")
> Signed-off-by: Kairui Song <kasong@...cent.com>
> ---

Seems reasonable to me.
Acked-by: Nhat Pham <nphamcs@...il.com>
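
For readers following the thread, the fallback described in the quoted
commit message can be sketched as a small user-space model. This is only
an illustration under assumptions, not the patch itself: the struct and
every model_* function name below are hypothetical, the device array is
assumed to be already sorted by priority, and stopping at the first device
whose discard releases clusters is a guess at one possible policy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a swap device; not a kernel structure. */
struct model_swap_dev {
	int prio;            /* informational; array is assumed sorted by it */
	int free_clusters;   /* clusters ready for allocation */
	int discard_pending; /* clusters freed but still waiting for discard */
};

/* Fast path: never issues a discard, so it has no blocking step. */
static bool model_alloc_fast(struct model_swap_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].free_clusters > 0) {
			devs[i].free_clusters--;
			return true;
		}
	}
	return false;
}

/*
 * Fallback: walk the devices in priority order and complete pending
 * discards. Returns true if any cluster was released. Stopping at the
 * first device that releases something is an assumption of this sketch.
 */
static bool model_discard_pending(struct model_swap_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].discard_pending > 0) {
			devs[i].free_clusters += devs[i].discard_pending;
			devs[i].discard_pending = 0;
			return true;
		}
	}
	return false;
}

/* Order 0 allocation: fast path first, then discard and retry once. */
static bool model_alloc_order0(struct model_swap_dev *devs, size_t n)
{
	if (model_alloc_fast(devs, n))
		return true;
	/* Discard runs outside the fast path, where blocking is acceptable. */
	if (model_discard_pending(devs, n))
		return model_alloc_fast(devs, n);
	return false;
}
```

The point of the split is that the discard step is only reached after the
fast path has failed everywhere, so the part that may block is kept apart
from the part that must not sleep.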