[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGWkznHDpw5Sw5pAfB=TdgRqsf=bmwUQ6+kvvLht3=wumNNo6Q@mail.gmail.com>
Date: Mon, 15 Apr 2024 09:50:19 +0800
From: Zhaoyang Huang <huangzhaoyang@...il.com>
To: Dave Chinner <david@...morbit.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, "zhaoyang.huang" <zhaoyang.huang@...soc.com>,
Alex Shi <alexs@...nel.org>, "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Hugh Dickins <hughd@...gle.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, steve.kang@...soc.com
Subject: Re: [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock
during migration
On Mon, Apr 15, 2024 at 8:09 AM Dave Chinner <david@...morbit.com> wrote:
>
> On Sat, Apr 13, 2024 at 10:01:27AM +0800, Zhaoyang Huang wrote:
> > loop Dave, since he has ever helped set up an reproducer in
> > https://lore.kernel.org/linux-mm/20221101071721.GV2703033@dread.disaster.area/
> > @Dave Chinner , I would like to ask for your kindly help on if you can
> > verify this patch on your environment if convenient. Thanks a lot.
>
> I don't have the test environment from 18 months ago available any
> more. Also, I haven't seen this problem since that specific test
> environment tripped over the issue. Hence I don't have any way of
> confirming that the problem is fixed, either, because first I'd have
> to reproduce it...
Thanks for the information. I noticed that you reported another soft
lockup which is related to xas_load since NOV.2023. This patch is
supposed to be helpful for this. With regard to the version timing,
this commit is actually a revert of <mm/thp: narrow lru locking>
b6769834aac1d467fa1c71277d15688efcbb4d76 which is merged before v5.15.
For saving your time, a brief description below. IMO, b6769834aa
introduce a potential stall between freeze the folio's refcnt and
store it back to 2, which have the xas_load->folio_try_get_rcu loops
as livelock if it stalls the lru_lock's holder.
b6769834aa
split_huge_page_to_list
- spin_lock(lru_lock)
xas_split(&xas, folio,order)
folio_refcnt_freeze(folio, 1 + folio_nr_pages(folio0)
+ spin_lock(lru_lock)
xas_store(&xas, offset++, head+i)
page_ref_add(head, 2)
spin_unlock(lru_lock)
Sorry in advance if the above doesn't make sense, I am just a
developer who is also suffering from this bug and trying to fix it
>
> -Dave.
> --
> Dave Chinner
> david@...morbit.com
Powered by blists - more mailing lists