linux-kernel - Re: [PATCH] cpuset,mm: fix allocating page cache/slab object on the unallowed node when memory spread is set


Message-Id: <200812311413.45127.nickpiggin@yahoo.com.au>
Date:	Wed, 31 Dec 2008 14:13:44 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	miaox@...fujitsu.com, menage@...gle.com, cl@...ux-foundation.org,
	penberg@...helsinki.fi, mpm@...enic.com,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] cpuset,mm: fix allocating page cache/slab object on the unallowed node when memory spread is set

On Wednesday 31 December 2008 09:28:05 Andrew Morton wrote:
> On Fri, 26 Dec 2008 14:37:07 +0800 Miao Xie <miaox@...fujitsu.com> wrote:
> > The task still allocated the page caches on old node after modifying its
> > cpuset's mems when 'memory_spread_page' was set, it is caused by the old
> > mem_allowed_list of the task. Slab has the same problem.
>
> ok...
>
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index f3e5f89..d978983 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -517,6 +517,9 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
> >  #ifdef CONFIG_NUMA
> >  struct page *__page_cache_alloc(gfp_t gfp)
> >  {
> > +	if ((gfp & __GFP_WAIT) && !in_interrupt())
> > +		cpuset_update_task_memory_state();
> > +
> >  	if (cpuset_do_page_mem_spread()) {
> >  		int n = cpuset_mem_spread_node();
> >  		return alloc_pages_node(n, gfp, 0);
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 0918751..3b6e3d7 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3460,6 +3460,9 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
> >  	if (should_failslab(cachep, flags))
> >  		return NULL;
> >
> > +	if ((flags & __GFP_WAIT) && !in_interrupt())
> > +		cpuset_update_task_memory_state();
> > +

These paths are pretty performance critical. Why doesn't the cpusets code do
this work in the slowpath, where the cpuset's mems_allowed gets changed,
rather than putting these calls all over the place with apparently no real
rhyme or reason? :( (This is not against your patch, just this part of the
cpusets design.)
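
To make the trade-off concrete, here is a minimal user-space sketch in C. It
is a model only, not kernel code: every name in it (model_task, model_cpuset,
set_mems_lazy, set_mems_push and so on) is hypothetical, and the generation
counter is an assumed stand-in for whatever check the per-allocation hook
performs. It contrasts refreshing the task's cached mask on each allocation
with refreshing it once, from the writer, when the cpuset changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this models the idea, not the kernel. */
#define MAX_TASKS 4

struct model_task {
	uint64_t mems_allowed;	/* task's cached copy of the allowed-node mask */
	int mems_generation;	/* generation at which the copy was taken */
};

struct model_cpuset {
	uint64_t mems_allowed;	/* authoritative allowed-node mask */
	int mems_generation;	/* bumped on every change */
	struct model_task *tasks[MAX_TASKS];
	size_t nr_tasks;
};

/* Lazy writer: changes the cpuset and leaves the tasks to notice later. */
static void set_mems_lazy(struct model_cpuset *cs, uint64_t new_mask)
{
	cs->mems_allowed = new_mask;
	cs->mems_generation++;
}

/* Polling hot path: every allocation pays for a generation comparison. */
static uint64_t alloc_mask_polling(struct model_cpuset *cs,
				   struct model_task *t)
{
	if (t->mems_generation != cs->mems_generation) {
		t->mems_allowed = cs->mems_allowed;
		t->mems_generation = cs->mems_generation;
	}
	return t->mems_allowed;
}

/* Push writer: the rare change walks the attached tasks and refreshes them. */
static void set_mems_push(struct model_cpuset *cs, uint64_t new_mask)
{
	size_t i;

	cs->mems_allowed = new_mask;
	cs->mems_generation++;
	for (i = 0; i < cs->nr_tasks; i++) {
		cs->tasks[i]->mems_allowed = new_mask;
		cs->tasks[i]->mems_generation = cs->mems_generation;
	}
}

/* Hot path when the writer pushes: a plain read, no check. */
static uint64_t alloc_mask_fast(const struct model_task *t)
{
	return t->mems_allowed;
}
```

With the lazy writer, alloc_mask_fast() keeps returning the stale mask until
some polling call runs, which is the symptom the patch is chasing. With the
push writer the cost moves to the place where mems_allowed is changed, and the
allocation path needs no hook. A real implementation would also need locking
or RCU around the task walk, which this sketch leaves out.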

> >  	cache_alloc_debugcheck_before(cachep, flags);
> >  	local_irq_save(save_flags);
> >  	objp = __do_cache_alloc(cachep, flags);
>
> Problems.
>
> a) There's no need to test in_interrupt().  Any caller who passed us
>    __GFP_WAIT from interrupt context is horridly buggy and needs to be
>    fixed.

Right. There are existing sites that do the same check, which is probably
where it was copied from.


> b) Even if the caller _did_ set __GFP_WAIT, there's no guarantee
>    that we're deadlock safe here.  Does anyone ever do a __GFP_WAIT
>    allocation while holding callback_mutex?  If so, it'll deadlock.

It's static to cpuset.c, so I'd hope not.
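
For anyone who wants the deadlock in (b) spelled out, here is a toy model in
C. It is a sketch under loud assumptions: toy_mutex is a hypothetical stand-in
for a non-recursive mutex, the *_model functions are invented for
illustration, and the model takes the lock on every refresh, which is a
simplification. A failed toy_lock() marks the point where a real mutex_lock()
would block forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock modelling a non-recursive mutex. */
struct toy_mutex {
	bool held;
};

/* Returns false where a real non-recursive mutex_lock() would never return. */
static bool toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return false;	/* the holder is waiting on itself */
	m->held = true;
	return true;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->held = false;
}

/* Stand-in for callback_mutex; file-local, like the real one. */
static struct toy_mutex callback_mutex_model;

/* Models the refresh hook that the patch adds to the allocation paths. */
static bool update_task_memory_state_model(void)
{
	if (!toy_lock(&callback_mutex_model))
		return false;
	/* ... the task's cached mems_allowed would be refreshed here ... */
	toy_unlock(&callback_mutex_model);
	return true;
}

/* Models a sleeping allocation that runs the hook first. */
static bool alloc_may_sleep_model(void)
{
	return update_task_memory_state_model();
}

/* Models a hypothetical caller that allocates while holding the mutex. */
static bool allocate_under_mutex_model(void)
{
	bool ok;

	if (!toy_lock(&callback_mutex_model))
		return false;
	ok = alloc_may_sleep_model();	/* re-enters the lock: would hang */
	toy_unlock(&callback_mutex_model);
	return ok;
}
```

Calling alloc_may_sleep_model() on its own succeeds, while
allocate_under_mutex_model() reports failure at the nested lock attempt.
Because the mutex is file-local, only code in the same file could ever be in
the second position, which is why the audit is limited to cpuset.c.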


> c) These are two of the kernel's hottest code paths.  We really
>    really really really don't want to be polling for some dopey
>    userspace admin change on each call to __cache_alloc()!

Yeah, right. Let's try to fix cpuset.c instead...


> d) How does slub handle this problem?

SLUB seems to do a "sloppy" kind of memory policy allocation, where it just
relies on the page allocator to hand us the correct page and AFAIKS does not
exactly obey this stuff all the time.
