linux-kernel - Re: [PATCH 0/5] make slab gfp fair

Date:	Mon, 21 May 2007 21:33:58 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Christoph Lameter <clameter@....com>
Cc:	Matt Mackall <mpm@...enic.com>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, Thomas Graf <tgraf@...g.ch>,
	David Miller <davem@...emloft.net>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Daniel Phillips <phillips@...gle.com>,
	Pekka Enberg <penberg@...helsinki.fi>,
	Paul Jackson <pj@....com>, npiggin@...e.de
Subject: Re: [PATCH 0/5] make slab gfp fair

On Mon, 2007-05-21 at 09:45 -0700, Christoph Lameter wrote:
> On Sun, 20 May 2007, Peter Zijlstra wrote:
> 
> > I care about kernel allocations only. In particular about those that
> > have PF_MEMALLOC semantics.
> 
> Hmmmm.. I wish I was more familiar with PF_MEMALLOC. ccing Nick.
> 
> >  - set page->reserve nonzero for each page allocated with
> >    ALLOC_NO_WATERMARKS; which by the previous point implies that all
> >    available zones are below ALLOC_MIN|ALLOC_HIGH|ALLOC_HARDER
> 
> Ok that adds a new field to the page struct. I suggested a page flag in 
> slub before.

No, it doesn't; it overloads page->index. It's just used as an extra
return value, so it need not be persistent. Definitely not worth a
page flag.
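
To illustrate the overloading, here is a simplified stand-alone sketch. It
is not the patch itself: struct fake_page, alloc_fake_page() and
take_reserve_hint() are made-up names, and the real struct page has many
more members.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct page. */
struct fake_page {
	unsigned long flags;
	union {
		unsigned long index;	/* normal meaning: offset within mapping */
		int reserve;		/* overload: only valid right after allocation */
	};
};

/*
 * Hypothetical allocator: hands out the page and reports through
 * page->reserve whether the watermarks had to be ignored to get it.
 */
static struct fake_page *alloc_fake_page(struct fake_page *pool,
					 int ignored_watermarks)
{
	pool->reserve = ignored_watermarks ? 1 : 0;
	return pool;
}

/* The caller reads the extra return value once, then reuses the field. */
static int take_reserve_hint(struct fake_page *page)
{
	int reserve = page->reserve;

	page->index = 0;	/* field is handed back to its normal use */
	return reserve;
}
```

Since the value is consumed right after the allocation returns, nothing
has to survive in the page afterwards, which is why no flag bit is needed.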

> >  - when a page->reserve slab is allocated store it in s->reserve_slab
> >    and do not update the ->cpu_slab[] (this forces subsequent allocs to
> >    retry the allocation).
> 
> Right that should work.
>  
> > All ALLOC_NO_WATERMARKS enabled slab allocations are served from
> > ->reserve_slab, up until the point where a !page->reserve slab alloc
> > succeeds, at which point the ->reserve_slab is pushed into the partial
> > lists and ->reserve_slab set to NULL.
> 
> So the original issue is still not fixed. A slab alloc may succeed without
> watermarks if that particular allocation is restricted to a different set 
> of nodes. Then the reserve slab is dropped despite the memory scarcity on
> another set of nodes?

I can't see how. This extra ALLOC_MIN|ALLOC_HIGH|ALLOC_HARDER alloc will
first deplete all other zones. Once that starts failing, no node should
still have pages accessible by any allocation context other than
PF_MEMALLOC.
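
A rough model of that argument, as a stand-alone sketch. The names
fake_zone, fake_watermark_ok() and fake_alloc() are made up, the flag bit
values are illustrative, and the watermark arithmetic is a simplification
of what the page allocator does (no lowmem reserve, order-0 only).

```c
#define ALLOC_NO_WATERMARKS	0x01	/* illustrative bit values */
#define ALLOC_HARDER		0x10
#define ALLOC_HIGH		0x20

/* Hypothetical stand-in for a zone: free page count and min watermark. */
struct fake_zone {
	long free_pages;
	long pages_min;
};

/* Simplified watermark test: each flag lowers the bar a bit further. */
static int fake_watermark_ok(const struct fake_zone *z, int alloc_flags)
{
	long min = z->pages_min;

	if (alloc_flags & ALLOC_NO_WATERMARKS)
		return z->free_pages > 0;
	if (alloc_flags & ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & ALLOC_HARDER)
		min -= min / 4;
	return z->free_pages > min;
}

/*
 * Try every zone with the most aggressive normal flags first. Only when
 * all of them fail may an entitled caller ignore the watermarks, and only
 * then is *reserve set to 1. Returns the zone index used, or -1.
 */
static int fake_alloc(struct fake_zone *zones, int nr, int may_ignore,
		      int *reserve)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (fake_watermark_ok(&zones[i], ALLOC_HIGH | ALLOC_HARDER)) {
			zones[i].free_pages--;
			*reserve = 0;
			return i;
		}
	}
	if (!may_ignore)
		return -1;
	for (i = 0; i < nr; i++) {
		if (fake_watermark_ok(&zones[i], ALLOC_NO_WATERMARKS)) {
			zones[i].free_pages--;
			*reserve = 1;
			return i;
		}
	}
	return -1;
}
```

In this model a reserve page can only be handed out after every zone has
already refused the ALLOC_HIGH|ALLOC_HARDER attempt.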

> > Since only the allocation of a new slab uses the gfp zone flags, and
> > other allocations placement hints they have to be uniform over all slab
> > allocs for a given kmem_cache. Thus the s->reserve_slab/page->reserve
> > status is kmem_cache wide.
> 
> No the gfp zone flags are not uniform and placement of page allocator 
> allocs through SLUB do not always have the same allocation constraints.

They have to be, since SLUB can serve the allocation from a pre-existing
slab. Hence any page allocated for a slab must be valid for all other
users.
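
The ->reserve_slab scheme quoted above can be modelled roughly as follows.
This is a stand-alone sketch under my own simplifications, not SLUB code:
fake_slab, fake_cache and fake_slab_alloc() are made-up names, there is a
single cpu slab, and the page allocator's answer is passed in by the
caller instead of being requested.

```c
#include <stddef.h>

/* Hypothetical slab: remaining objects and the page->reserve hint. */
struct fake_slab {
	int free_objects;
	int reserve;
};

/* Hypothetical kmem_cache with one cpu slab and the cache-wide reserve. */
struct fake_cache {
	struct fake_slab *cpu_slab;
	struct fake_slab *reserve_slab;
	int nr_partial;		/* slabs pushed onto the partial lists */
};

/*
 * new_slab: what the page allocator returned for this attempt, or NULL.
 * entitled: nonzero if the caller may use ALLOC_NO_WATERMARKS.
 * Returns 1 if an object was handed out, 0 otherwise.
 */
static int fake_slab_alloc(struct fake_cache *s, struct fake_slab *new_slab,
			   int entitled)
{
	/* A pre-existing slab serves any caller, whatever its gfp flags. */
	if (s->cpu_slab && s->cpu_slab->free_objects > 0) {
		s->cpu_slab->free_objects--;
		return 1;
	}

	/* A !reserve slab means the pressure is over: drop the reserve. */
	if (new_slab && !new_slab->reserve) {
		if (s->reserve_slab) {
			s->nr_partial++;
			s->reserve_slab = NULL;
		}
		s->cpu_slab = new_slab;
		new_slab->free_objects--;
		return 1;
	}

	/*
	 * A reserve slab is parked in ->reserve_slab and ->cpu_slab is left
	 * alone, so later allocations retry the page allocator. This model
	 * simply ignores a second reserve slab.
	 */
	if (new_slab && !s->reserve_slab)
		s->reserve_slab = new_slab;

	if (entitled && s->reserve_slab && s->reserve_slab->free_objects > 0) {
		s->reserve_slab->free_objects--;
		return 1;
	}
	return 0;
}
```

The first branch is the point: whoever caused a slab to be allocated, its
objects end up being handed to every other user of the cache.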

> SLUB will check the node of the page that was allocated when the page 
> allocator returns and put the page into that node's slab list. This varies
> depending on the allocation context.

Yes, it keeps slabs on per node lists. I'm just not seeing how this puts
hard constraints on the allocations.

As far as I can see there cannot be a hard constraint here, because
allocations from interrupt context are at best node local. And
node-affine zone lists still have all zones, just ordered on locality.
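
What I mean by "ordered on locality", as a stand-alone sketch. The names
FAKE_NR_NODES, fake_build_zonelist() and fake_pick_node() are made up, one
zone per node is assumed, and the distance table is whatever the caller
supplies.

```c
#define FAKE_NR_NODES 4

/*
 * Fill order[] with every node id, nearest to 'local' first. No node is
 * left out; distance only decides the position in the list.
 */
static void fake_build_zonelist(int local,
				int distance[FAKE_NR_NODES][FAKE_NR_NODES],
				int order[FAKE_NR_NODES])
{
	int i, j;

	for (i = 0; i < FAKE_NR_NODES; i++) {
		int node = i;

		/* Insertion sort keyed on the distance from the local node. */
		for (j = i; j > 0 &&
		     distance[local][order[j - 1]] > distance[local][node]; j--)
			order[j] = order[j - 1];
		order[j] = node;
	}
}

/* Walk the fallback order; the first node with free pages wins. */
static int fake_pick_node(const int order[FAKE_NR_NODES],
			  const long free_pages[FAKE_NR_NODES])
{
	int i;

	for (i = 0; i < FAKE_NR_NODES; i++) {
		if (free_pages[order[i]] > 0)
			return order[i];
	}
	return -1;
}
```

So a node-local allocation prefers its own node, but once that is empty it
walks on to the other nodes rather than failing.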

> Allocations can be particular to uses of a slab in particular situations. 
> A kmalloc cache can be used to allocate from various sets of nodes in 
> different circumstances. kmalloc will allow serving a limited number of 
> objects from the wrong nodes for performance reasons but the next 
> allocation from the page allocator (or from the partial lists) will occur 
> using the current set of allowed nodes in order to ensure a rough 
> obedience to the memory policies and cpusets. kmalloc_node behaves 
> differently and will enforce using memory from a particular node.

From what I can see, it takes pretty much any page it can get once you
hit it with PF_MEMALLOC. If the page allocation doesn't use ALLOC_CPUSET
the page can come from pretty much anywhere.
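
In sketch form (stand-alone and simplified; fake_node_usable() and
fake_first_usable_node() are made-up names, the ALLOC_CPUSET bit value is
illustrative, and the cpuset is reduced to a plain bitmask of allowed
nodes):

```c
#define ALLOC_CPUSET	0x40	/* illustrative bit value */

/* The cpuset restriction only applies when ALLOC_CPUSET is set. */
static int fake_node_usable(int node, unsigned long allowed_mask,
			    int alloc_flags)
{
	if ((alloc_flags & ALLOC_CPUSET) && !(allowed_mask & (1UL << node)))
		return 0;
	return 1;
}

/* Return the first node in fallback order that is usable and has pages. */
static int fake_first_usable_node(const int order[], int nr,
				  const long free_pages[],
				  unsigned long allowed_mask, int alloc_flags)
{
	int i;

	for (i = 0; i < nr; i++) {
		int node = order[i];

		if (!fake_node_usable(node, allowed_mask, alloc_flags))
			continue;
		if (free_pages[node] > 0)
			return node;
	}
	return -1;
}
```

With the flag clear the allowed mask is never consulted, so the first node
in the fallback order that still has pages is taken.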

