Message-Id: <1199990565.5331.130.camel@cinder.waste.org>
Date:	Thu, 10 Jan 2008 12:42:45 -0600
From:	Matt Mackall <mpm@...enic.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Pekka J Enberg <penberg@...helsinki.fi>,
	Christoph Lameter <clameter@....com>,
	Ingo Molnar <mingo@...e.hu>, Hugh Dickins <hugh@...itas.com>,
	Andi Kleen <andi@...stfloor.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] greatly reduce SLOB external fragmentation


On Thu, 2008-01-10 at 10:28 -0800, Linus Torvalds wrote:
> 
> On Thu, 10 Jan 2008, Matt Mackall wrote:
> > > 
> > > (I'm not a fan of slabs per se - I think all the constructor/destructor 
> > > crap is just that: total crap - but the size/type binning is a big deal, 
> > > and I think SLOB was naïve to think a pure first-fit makes any sense. Now 
> > > you guys are size-binning by just two or three bins, and it seems to make 
> > > a difference for some loads, but compared to SLUB/SLAB it's a total hack).
> > 
> > Here I'm going to differ with you. The premises of the SLAB concept
> > (from the original paper) are: 
> 
> I really don't think we differ.
> 
> The advantage of slab was largely the binning by type. Everything else was 
> just a big crock. SLUB does the binning better, by really just making the 
> type binning be about what really matters - the *size* of the type.
> 
> So my argument was that the type/size binning makes sense (size more so 
> than type), but the rest of the original Sun arguments for why slab was 
> such a great idea were basically just the crap.
> 
> Hard type binning was a mistake (but needed by slab due to the idiotic 
> notion that constructors/destructors are "good for caches" - bleargh). I 
> suspect that hard size binning is a mistake too (ie there are probably 
> cases where you do want to split unused bigger size areas), but the fact 
> that all of our allocators are two-level (with the page allocator acting 
> as a size-agnostic free space) may help it somewhat.
> 
> And yes, I do agree that any current allocator has problems with the big 
> sizes that don't fit well into a page or two (like task_struct). That 
> said, most of those don't have lots of allocations under many normal 
> circumstances (even if there are uses that will really blow them up).
> 
> The *big* slab users at least for me tend to be ext3_inode_cache and 
> dentry. Everything else is orders of magnitude less. And of the two bad 
> ones, ext3_inode_cache is the bad one at 700+ bytes or whatever (resulting 
> in ~10% fragmentation just due to the page thing, regardless of whether 
> you use an order-0 or order-1 page allocation).
> 
> Of course, dentries fit better in a page (due to being smaller), but then 
> the bigger number of dentries per page make it harder to actually free 
> pages, so then you get fragmentation from that. Oh well. You can't win.

One idea I've been kicking around is pushing the boundary for the buddy
allocator back a bit (to 64k, say) and using SL*B under that. The page
allocators would call into buddy for larger than 64k (rare!) and SL*B
otherwise. This would let us greatly improve our handling of things like
task structs and skbs and possibly also things like 8k stacks and jumbo
frames. As SL*B would never be competing with the page allocator for
contiguous pages (the buddy allocator's granularity would be 64k), I
don't think this would exacerbate the page-level fragmentation issues.

Crazy?
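
A rough sketch of the size-based split described above, to make the boundary concrete. Everything here is hypothetical: the 64k value is only the "say" figure from this message, and the names (BUDDY_BOUNDARY_BYTES, enum backend, pick_backend) are invented for illustration, not existing kernel interfaces.

```c
#include <stddef.h>

/* Assumption: the boundary is the "64k, say" figure floated above. */
#define BUDDY_BOUNDARY_BYTES ((size_t)64 * 1024)

/* Hypothetical tags for the two allocators in the proposal. */
enum backend {
    BACKEND_SLXB,   /* SLAB/SLOB/SLUB, serving everything up to the boundary */
    BACKEND_BUDDY   /* buddy allocator, serving only larger requests */
};

/*
 * Requests larger than the boundary go to buddy; everything else,
 * including single pages and 8k stacks, goes to SL*B.
 */
static enum backend pick_backend(size_t bytes)
{
    return bytes > BUDDY_BOUNDARY_BYTES ? BACKEND_BUDDY : BACKEND_SLXB;
}
```

With this rule an 8k stack or a jumbo-frame-sized buffer lands in SL*B, and only the rare request above 64k reaches buddy.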

-- 
Mathematics is the supreme nostalgia of our time.
