Message-ID: <20230509174401.GA18828@cmpxchg.org>
Date: Tue, 9 May 2023 13:44:01 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Sergey Senozhatsky <senozhatsky@...omium.org>
Cc: Nhat Pham <nphamcs@...il.com>, Minchan Kim <minchan@...nel.org>,
akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, ngupta@...are.org,
sjenning@...hat.com, ddstreet@...e.org, vitaly.wool@...sulko.com,
kernel-team@...a.com
Subject: Re: [PATCH] zsmalloc: move LRU update from zs_map_object() to
 zs_malloc()

On Tue, May 09, 2023 at 12:00:30PM +0900, Sergey Senozhatsky wrote:
> On (23/05/08 09:00), Nhat Pham wrote:
> > > The deeper bug here is that zs_map_object() tries to add the page to
> > > the LRU list while the shrinker has it isolated for reclaim. This is
> > > way too subtle and error-prone. Even if it worked now, it'll cause
> > > corruption issues down the line.
> > >
> > > For example, Nhat is adding a secondary entry point to reclaim.
> > > Reclaim expects that a page that's on the LRU is also on the fullness
> > > list, so this would lead to a double remove_zspage() and BUG_ON().
> > >
> > > This patch doesn't just fix the crash, it eliminates the deeper LRU
> > > isolation issue and makes the code more robust and simple.
> >
> > I agree. IMO, less unnecessary concurrent interaction is always a
> > win for developers' and maintainers' cognitive load.
>
> Thanks for all the explanations.
>
> > As a side benefit - this also gets rid of the inelegant check
> > (mm == ZS_MM_WO). The fact that we had to include a
> > multi-paragraph explanation for a 3-line piece of code
> > should have been a red flag.
>
> Minchan had some strong opinion on that, so we need to hear from him
> before we decide how to fix it.
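
To recap the problem we were describing above, the old pattern looked
roughly like this (an illustrative sketch, not the literal code;
zs_map_object(), ZS_MM_WO and the pool LRU are the real pieces,
everything else is elided):

void *zs_map_object(struct zs_pool *pool, unsigned long handle,
                    enum zs_mapmode mm)
{
        struct zspage *zspage;

        /* ... look up the zspage backing @handle ... */

#ifdef CONFIG_ZPOOL
        /*
         * Bump the zspage to the head of the pool's LRU, but only for
         * write mappings, because map is also called during writeback
         * and an isolated page must not go back on the LRU mid-reclaim.
         * This is the subtle part: the shrinker may already have taken
         * @zspage off the LRU and the fullness list for reclaim by the
         * time this runs.
         */
        if (mm == ZS_MM_WO) {
                if (!list_empty(&zspage->lru))
                        list_del(&zspage->lru);
                list_add(&zspage->lru, &pool->lru);
        }
#endif

        /* ... map the object and return the kernel address ... */
}
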
I'd be happy if he could validate the fix. But this fixes a crash, so
the clock is ticking.
I will also say that his was a design preference, one we agreed to only
very reluctantly: https://lore.kernel.org/lkml/Y3f6habiVuV9LMcu@google.com/
Now we have a crash that is a direct result of it, one that has cost us
(and apparently is still costing us) time and energy to resolve.
Unless somebody surfaces a real technical problem with the fix, I'd
say let's do it our way this time.
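
For reference, the direction of the fix, again only as a rough sketch
(assuming the per-pool lock that zs_malloc() already takes; this is not
the literal patch):

unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
{
        unsigned long handle;
        struct zspage *zspage;

        /* ... allocate the handle and find/create a zspage ... */

        spin_lock(&pool->lock);
        /* ... carve out the object, update fullness stats ... */
#ifdef CONFIG_ZPOOL
        /*
         * A new entry is hot: put the zspage at the head of the pool's
         * LRU here, while pool->lock is held. The shrinker isolates
         * zspages under the same lock, so it can never observe a
         * half-updated page, and zs_map_object() no longer needs to
         * touch the LRU at all.
         */
        if (!list_empty(&zspage->lru))
                list_del(&zspage->lru);
        list_add(&zspage->lru, &pool->lru);
#endif
        spin_unlock(&pool->lock);

        return handle;
}

Doing the LRU update under the same lock reclaim uses to isolate
zspages closes the window in which mapping an isolated page could
re-add it to the LRU.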