Date:   Sun, 23 Apr 2023 19:17:12 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     David Stevens <stevensd@...omium.org>
cc:     Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ivan Orlov <ivan.orlov0322@...il.com>,
        Jiaqi Yan <jiaqiyan@...gle.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH next] mm/khugepaged: fix conflicting mods to
 collapse_file()

On Mon, 24 Apr 2023, David Stevens wrote:
> On Sun, Apr 23, 2023 at 1:47 PM Hugh Dickins <hughd@...gle.com> wrote:
> >
> > Inserting Ivan Orlov's syzbot fix commit 2ce0bdfebc74
> > ("mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()")
> > ahead of Jiaqi Yan's and David Stevens's commits
> > 12904d953364 ("mm/khugepaged: recover from poisoned file-backed memory")
> > cae106dd67b9 ("mm/khugepaged: refactor collapse_file control flow")
> > ac492b9c70ca ("mm/khugepaged: skip shmem with userfaultfd")
> > (all of which restructure collapse_file()) did not work out well.
> >
> > xfstests generic/086 on huge tmpfs (with accelerated khugepaged) freezes
> > (if not on the first attempt, then the 2nd or 3rd) in find_lock_entries()
> > while doing drop_caches: the file's xarray seems to have been corrupted,
> > with find_get_entry() returning nonsense which makes no progress.
> >
> > Bisection led to ac492b9c70ca; and diff against earlier working linux-next
> > suggested that it's probably down to an errant xas_store(), which does not
> > belong with the later changes (and nor does the positioning of warnings).
> > The later changes look as if they fix the syzbot issue independently.
> >
> > Remove most of what's left of 2ce0bdfebc74: just leave one WARN_ON_ONCE
> > (xas_error) after the final xas_store() of the multi-index entry.
> >
> > Signed-off-by: Hugh Dickins <hughd@...gle.com>
> > ---
> >
> >  mm/khugepaged.c | 23 +----------------------
> >  1 file changed, 1 insertion(+), 22 deletions(-)
> >
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1941,16 +1941,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
> >                                         result = SCAN_FAIL;
> >                                         goto xa_locked;
> >                                 }
> > -                               xas_store(&xas, hpage);
> > -                               if (xas_error(&xas)) {
> > -                                       /* revert shmem_charge performed
> > -                                        * in the previous condition
> > -                                        */
> > -                                       mapping->nrpages--;
> > -                                       shmem_uncharge(mapping->host, 1);
> > -                                       result = SCAN_STORE_FAILED;
> 
> With this being removed, SCAN_STORE_FAILED should also be removed from
> the scan_result enum and trace event definitions.

Only if we also remove your use of SCAN_STORE_FAILED in ac492b9c70ca:
what would you want that to say instead?

I don't care myself for any of those "SCAN" result codes, nor whether they
are few or many: I'd rather have __LINE__ numbers for my own debugging.

But if people want to remove SCAN_STORE_FAILED now, sure, send a patch;
my intent was to unbreak the breakage.

Hugh
