lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.1108051041580.27518@router.home>
Date:	Fri, 5 Aug 2011 10:51:42 -0500 (CDT)
From:	Christoph Lameter <cl@...ux.com>
To:	Mel Gorman <mgorman@...e.de>
cc:	Xiaotian Feng <xtfeng@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Pekka Enberg <penberg@...nel.org>
Subject: Re: kernel BUG at mm/vmscan.c:1114

On Fri, 5 Aug 2011, Mel Gorman wrote:

> > > This is interesting, I just change as following:
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index eb5a8f9..616b78e 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -2104,8 +2104,9 @@ static void *__slab_alloc(struct kmem_cache *s,
> > > gfp_t gfpflags, int node,
> > >                        "__slab_alloc"));
> > >
> > >        if (unlikely(!object)) {
> > > -               c->page = NULL;
> > > +               //c->page = NULL;
> > >                stat(s, DEACTIVATE_BYPASS);
> > > +               deactivate_slab(s, c);
> > >                goto new_slab;
> > >        }
> > >
> > > Then my system doesn't print any list corruption warnings and my build
> > > success then. So this means revert of 03e404af2 could cure this.
> > > I'll do more test next week to see if the list corruption still exist, thanks.
> > >
> >
> > Sorry, please ignore it... My system corrupted before I went to leave ....
> >
>
> Please continue the bisection in that case and establish for sure if the
> problem is in that series or not. Thanks.

The above fix should not affect anything since a per cpu slab
is not on any partial lists. And since there are no objects remaining in
the slab there is then also no point of putting it back. It wont be on
any lists before and after the action so no list processing is needed.

Hmmm.... There maybe a race with slab_free from a remote processor. I
dont see any problem here since we convert the page from frozen to
nonfrozen in __slab_alloc and __slab_free will ignore the partial list
management if it sees it to be frozen.

Maybe we need some memory barriers here. Right now we are relying on the
cmpxchg_double for sync of the state in the page struct but we also need
the c->page variable to be consistent with that state. But we disable
interrupts in __slab_alloc so there are no races possible with slab_free
only with remote __slab_free invocations which will not touch c->page.




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ