Date:   Tue, 17 Jan 2023 20:20:03 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     Vlastimil Babka <vbabka@...e.cz>
CC:     "Sang, Oliver" <oliver.sang@...el.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        "oe-lkp@...ts.linux.dev" <oe-lkp@...ts.linux.dev>,
        lkp <lkp@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Torvalds, Linus" <torvalds@...ux-foundation.org>,
        Jann Horn <jannh@...gle.com>,
        "Song, Youquan" <youquan.song@...el.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Jan Kara <jack@...e.cz>, John Hubbard <jhubbard@...dia.com>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        Matthew Wilcox <willy@...radead.org>,
        Michal Hocko <mhocko@...nel.org>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Hyeonggon Yoo <42.hyeyoo@...il.com>,
        "Yin, Fengwei" <fengwei.yin@...el.com>, <hongjiu.lu@...el.com>
Subject: Re: [linus:master] [hugetlb] 7118fc2906:
 kernel_BUG_at_lib/list_debug.c

On Tue, Jan 17, 2023 at 04:01:08PM +0800, Tang, Feng wrote:
> On Tue, Jan 17, 2023 at 03:39:15PM +0800, Vlastimil Babka wrote:
> > On 1/17/23 08:10, kernel test robot wrote:
> > > 
> > > +Vlastimil Babka, Hyeonggon Yoo, Feng Tang and Fengwei Yin
> > > 
> > > Hi, Mike Kravetz,
> > > 
> > > we reported
> > > "[linus:master] [mm, slub] 0af8489b02: kernel_BUG_at_include/linux/mm.h" [1]
> > > 
> > > Vlastimil, Hyeonggon, Feng and Fengwei gave us a lot of great guidance based on
> > > it, and, particularly, after enabling below config per Vlastimil's suggestion
> > >   CONFIG_DEBUG_PAGEALLOC
> > >   CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
> > >   CONFIG_SLUB_DEBUG
> > >   CONFIG_SLUB_DEBUG_ON
> > > by more tests, we realized the "0af8489b02" is not the real culprit.
> > > 
> > > the new bisection was triggered and finally it pointed to this "7118fc2906".
> > > 
> > > though reporting for different issues
> > > ("kernel_BUG_at_include/linux/mm.h" for 0af8489b02 vs.
> > > "kernel_BUG_at_lib/list_debug.c" for this commit),
> > > Feng and Fengwei helped further to confirm they are similar.
> > > They will supply more technical wise analysis later.
> > > 
> > > please be noted the issues are not always happening
> > > (~10% on this commit or 0af8489b02)
> > 
> > Great find! Looking at the commit, I'd bet the only part relevant to our bug
> > is the "by the way we remove setting refcount to zero on tail pages which
> > should already be zero":
> > 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index db00ee8d79d2..eeff64843718 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -754,7 +754,6 @@ void prep_compound_page(struct page *page, unsigned int order)
> > >         __SetPageHead(page);
> > >         for (i = 1; i < nr_pages; i++) {
> > >                 struct page *p = page + i;
> > > -               set_page_count(p, 0);
> > >                 p->mapping = TAIL_MAPPING;
> > >                 set_compound_head(p, page);
> > >         }
> > 
> > So either the assumption of refcount being already 0 is wrong (shouldn't be,
> > AFAIK?), or this atomic operation effectively prevents some very subtle race
> > (although IIRC atomic_set() has no barrier semantics defined, it could still
> > affect a specific CPU)?
>  
> Yes, "set_page_count(p, 0);" seems to be what matters here. Restoring
> it makes the list corruption issue not reproducible for 300+ runs.
> 
> And back in debugging 0af8489b02, the thing was similar that if we
> added some code inside prep_compound_page(), the issue also can't
> be reproduced.
> 
> So this 7118fc2906 seems to just 'expose' the problem on i386, and is
> not the root cause.
> 
> I suspect it is related to i386 compilation, based on the debug and
> memory dump. I'm trying some compiler options and adding a memory
> barrier in prep_compound_page(), and will update when the test run
> is done.

With the following patch, which uses the gcc option '-O1' instead of
'-O2' for page_alloc.c, the list corruption issue can't be reproduced
for commit 7118fc2906 in 1000 runs.

Oliver has also reproduced it on v6.0, and applying the same patch
makes the issue go away there as well.

As it can't be reproduced with an X86_64 build, it could be related
to i386 compilation.

I also objdumped 'prep_compound_page' from the vmlinux of 7118fc2906
and of its parent commit 48b8d744ea84. The two differ by much more than
the simple 'set_page_count()' change, but I can't tell which part is
abnormal, so I attach them for further checking.

---
diff --git a/mm/Makefile b/mm/Makefile
index 8e105e5b3e293..2b3780208e65d 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -36,6 +36,8 @@ KCOV_INSTRUMENT_failslab.o := n
 CFLAGS_init-mm.o += $(call cc-disable-warning, override-init)
 CFLAGS_init-mm.o += $(call cc-disable-warning, initializer-overrides)
 
+CFLAGS_page_alloc.o += -O1
+
 mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= highmem.o memory.o mincore.o \
 			   mlock.o mmap.o mmu_gather.o mprotect.o mremap.o \

Thanks,
Feng

> 
> Thanks,
> Feng
> 
> > I guess we could
> > - try to restore that set_page_count(p, 0); on current kernel to see if it
> > kills the bug
> > - instead of restoring it, add (only locally for purposes of the test) a
> > BUG_ON() if refcount is not zero already, and find out why if it triggers
> > (unfortunately might also appear to fix the bug even if it doesn't trigger).
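
For reference, the second suggestion above (check that each tail page's
refcount is already zero instead of forcing it to zero) could look
roughly like the sketch below. This is only a user-space model, not
kernel code: 'struct model_page' and 'model_prep_compound_page()' are
hypothetical stand-ins for 'struct page' and 'prep_compound_page()',
and the function returns the index of the first offending tail page
where the real test would use BUG_ON().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page; it only carries
 * the two fields this sketch needs. */
struct model_page {
	atomic_int refcount;        /* models page->_refcount */
	struct model_page *head;    /* models the compound head pointer */
};

/*
 * Model of the tail-page loop in prep_compound_page(): rather than
 * writing 0 into each tail page's refcount, read it and report the
 * first tail page whose refcount is not already zero.
 *
 * Returns the index of that tail page, or -1 if every tail page had a
 * zero refcount. In a kernel test build the check would be a BUG_ON().
 */
static int model_prep_compound_page(struct model_page *page,
				    unsigned int order)
{
	size_t nr_pages = (size_t)1 << order;
	int first_bad = -1;

	for (size_t i = 1; i < nr_pages; i++) {
		struct model_page *p = page + i;

		/* Read-only check in place of set_page_count(p, 0). */
		if (atomic_load(&p->refcount) != 0 && first_bad < 0)
			first_bad = (int)i;

		p->head = page;
	}
	return first_bad;
}
```

As already noted in the quoted text, adding such a check changes the
code the compiler generates for the loop, so it might hide the problem
even when the check itself never fires.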

View attachment "7118fc2906_objdump.log" of type "text/plain" (10630 bytes)

View attachment "48b8d744ea84_objdump.log" of type "text/plain" (10450 bytes)
