lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZFqO0i4mWL508P0S@google.com>
Date:   Tue, 9 May 2023 11:20:02 -0700
From:   Minchan Kim <minchan@...nel.org>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Nhat Pham <nphamcs@...il.com>, akpm@...ux-foundation.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        ngupta@...are.org, sjenning@...hat.com, ddstreet@...e.org,
        vitaly.wool@...sulko.com, kernel-team@...a.com
Subject: Re: [PATCH] zsmalloc: move LRU update from zs_map_object() to
 zs_malloc()

Hi Folks,

On Tue, May 09, 2023 at 01:44:01PM -0400, Johannes Weiner wrote:
> On Tue, May 09, 2023 at 12:00:30PM +0900, Sergey Senozhatsky wrote:
> > On (23/05/08 09:00), Nhat Pham wrote:
> > > > The deeper bug here is that zs_map_object() tries to add the page to
> > > > the LRU list while the shrinker has it isolated for reclaim. This is
> > > > way too sutble and error prone. Even if it worked now, it'll cause
> > > > corruption issues down the line.
> > > >
> > > > For example, Nhat is adding a secondary entry point to reclaim.
> > > > Reclaim expects that a page that's on the LRU is also on the fullness
> > > > list, so this would lead to a double remove_zspage() and BUG_ON().
> > > >
> > > > This patch doesn't just fix the crash, it eliminates the deeper LRU
> > > > isolation issue and makes the code more robust and simple.
> > > 
> > > I agree. IMO, less unnecessary concurrent interaction is always a
> > > win for developers' and maintainers' cognitive load.
> > 
> > Thanks for all the explanations.
> > 
> > > As a side benefit - this also gets rid of the inelegant check
> > > (mm == ZS_MM_WO). The fact that we had to include a
> > > a multi-paragraph explanation for a 3-line piece of code
> > > should have been a red flag.
> > 
> > Minchan had some strong opinion on that, so we need to hear from him
> > before we decide how do we fix it.
> 
> I'd be happy if he could validate the fix. But this fixes a crash, so
> the clock is ticking.
> 
> I will also say, his was a design preference. One we agreed to only
> very reluctantly: https://lore.kernel.org/lkml/Y3f6habiVuV9LMcu@google.com/
> 
> Now we have a crash that is a direct result of it, and which cost us
> (and apparently is still costing us) time and energy to resolve.
> 
> Unless somebody surfaces a real technical problem with the fix, I'd
> say let's do it our way this time.
> 

Sorry for being too late to review. The reason I insisted on it was
I overlookeded the bug and thought it was trivial change but better
semantic since zsmalloc provides separate API between allocation and
access unlike other allocators. Now, Nhat and Johannes provided it's
more error prone, I am totally fine with this fix and will live it
until the LRU writeback will move out of allocator.

Sorry for wasting your time to hunt this bug down and thank you for fix!

Acked-by: Minchan Kim <minchan@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ