Message-ID: <CAJD7tkbO938ETJn0FOG_vVU4V2_dBanio1QG56cp6ctFHpSeNw@mail.gmail.com>
Date: Fri, 23 Aug 2024 09:07:44 -0700
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Nhat Pham <nphamcs@...il.com>, 
	Linux regressions mailing list <regressions@...ts.linux.dev>, Piotr Oniszczuk <piotr.oniszczuk@...il.com>, 
	LKML <linux-kernel@...r.kernel.org>, Johannes Weiner <hannes@...xchg.org>, 
	Linux-MM <linux-mm@...ck.org>
Subject: Re: [regression] oops on heavy compilations ("kernel BUG at
 mm/zswap.c:1005!" and "Oops: invalid opcode: 0000")

On Fri, Aug 23, 2024 at 7:47 AM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Fri, Aug 23, 2024 at 10:35:19AM -0400, Nhat Pham wrote:
> > On Fri, Aug 23, 2024 at 9:13 AM Matthew Wilcox <willy@...radead.org> wrote:
> > >
> > >
> > > That said, zswap could handle this better.  There's no need to panic the
> > > entire machine over being unable to read a page from swap.  Killing just
> > > the process that needed this page is sufficient.
> >
> > Agree 100%. It is silly to kill the entire host for a swap read error,
> > and extra silly to kill the process because we fail to write back - for
> > all we know, that page might never be needed by the process again!!!
> >
> > >
> > > Suggested patch at end after the oops.
> > >
> > > @@ -1601,6 +1613,7 @@ bool zswap_load(struct folio *folio)
> > >         bool swapcache = folio_test_swapcache(folio);
> > >         struct xarray *tree = swap_zswap_tree(swp);
> > >         struct zswap_entry *entry;
> > > +       int err;
> > >
> > >         VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > >
> > > @@ -1638,10 +1651,13 @@ bool zswap_load(struct folio *folio)
> > >         if (!entry)
> > >                 return false;
> > >
> > > -       if (entry->length)
> > > -               zswap_decompress(entry, folio);
> > > -       else
> > > +       if (entry->length) {
> > > +               err = zswap_decompress(entry, folio);
> > > +               if (err)
> > > +                       return false;
> >
> > Here, if zswap decompression fails and zswap_load() returns false, the
> > page_io logic will proceed as if zswap does not have the page and
> > read garbage from the backing device instead. This could potentially
> > lead to silent data/memory corruption, right? Or am I missing
> > something :) Maybe we could be extra careful here and treat it as if
> > there were a bio read error in the case where zswap owns the page but
> > cannot decompress it?
>
> Ah; you know more about how zswap works than I do.  So it's not a
> write-through cache?  I guess we need to go a bit further, then, and
> return an errno from zswap_load() -- EIO/ENOENT/0 -- and handle that
> appropriately.
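
For reference, a rough sketch of what that three-way return could look
like at the call site (the error values and the surrounding structure
are just an illustration, not code from this thread or from mainline):

	int ret = zswap_load(folio);	/* 0, -ENOENT or -EIO */

	if (!ret) {
		/* Found in zswap and decompressed; folio is uptodate. */
		folio_unlock(folio);
	} else if (ret == -ENOENT) {
		/* Not in zswap: fall through to the normal bio read. */
	} else {
		/*
		 * -EIO: zswap owns the data but cannot recover it. Do
		 * not touch the backing device (it holds stale data);
		 * leave the folio !uptodate and unlock it so the fault
		 * is reported as an I/O error.
		 */
		folio_unlock(folio);
	}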

Alternatively, it should work if we just return true without calling
folio_mark_uptodate(); this is what we already do if we get a large
folio in zswap_load(). Returning true means that the page was found in
zswap, so we won't fall back to reading from the backing device. Not
marking the folio uptodate will then cause an IO error, IIUC.
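
Roughly, the failure branch in zswap_load() would then look something
like the below (just a sketch on top of the hunk quoted above; it
assumes zswap_decompress() is changed to return an errno, and it skips
the unchanged accounting/teardown around it):

	if (entry->length) {
		err = zswap_decompress(entry, folio);
		if (err) {
			/*
			 * zswap owns this page but cannot recover it.
			 * Returning true keeps the caller from falling
			 * back to stale data on the backing device,
			 * and skipping folio_mark_uptodate() below
			 * turns the fault into an I/O error for the
			 * process that needed the page.
			 */
			return true;
		}
	} else {
		/* same-filled entry: copy entry->value as before */
	}

	/* ZSWPIN accounting, entry teardown, etc. unchanged */
	folio_mark_uptodate(folio);
	return true;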

>
> > The rest seems solid to me :)
