Message-ID: <20090414171242.GA4241@elte.hu>
Date: Tue, 14 Apr 2009 19:12:42 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Christoph Lameter <cl@...ux.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Tejun Heo <tj@...nel.org>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
rusty@...tcorp.com.au, tglx@...utronix.de, x86@...nel.org,
linux-kernel@...r.kernel.org, hpa@...or.com,
Paul Mundt <lethal@...ux-sh.org>, rmk@....linux.org.uk,
starvik@...s.com, ralf@...ux-mips.org, davem@...emloft.net,
cooloney@...nel.org, kyle@...artin.ca, matthew@....cx,
grundler@...isc-linux.org, takata@...ux-m32r.org,
benh@...nel.crashing.org, rth@...ddle.net,
ink@...assic.park.msu.ru, heiko.carstens@...ibm.com,
Nick Piggin <npiggin@...e.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH UPDATED] percpu: use dynamic percpu allocator as the
default percpu allocator
* Christoph Lameter <cl@...ux.com> wrote:
> On Tue, 14 Apr 2009, Ingo Molnar wrote:
>
> > The thing is, i spent well in excess of an hour analyzing your
> > patch, counting cachelines, looking at effects and interactions,
> > thinking about the various implications. I came up with a good deal
> > of factoids, a handful of suggestions and a few summary paragraphs:
> >
> > http://marc.info/?l=linux-kernel&m=123862536011780&w=2
>
> Good work.
>
> > A proper reply to that work would be one of several responses:
> >
>
> ...
>
> >
> > 3) agree with the factoids and disagree with my opinion.
>
> Yep I thought that what I did...
Ok, thanks ... i never saw a reply from you on that
mail, so i couldn't assume you did so.
There are really 3 key observations in that mail - let me sum
them up in order of importance.
1)
I'm wondering what your take on the bss+data suggestion is.
To me it appears it's tempting to merge them into a single
per .o section: it clearly wins us locality of reference.
It seems like such an obvious thing to do on a modern SMP
kernel - has anyone tried that in the past?
Instead of:
.data1 .data2 .data3 .... .bss1 .bss2 .bss3
we'd have:
.databss1 .databss2 .databss3
This is clearly better compressed, and the layout is easier
to control in the .c file. We could also do tricks to
further compress data here: we could put variables right
after their __aligned__ locks - while currently they are in
the .bss wasting a full cache-line.
In the example i analyzed it would reduce the cache
footprint by one cacheline. This would apply to most .o's so
the combined effect on cache locality would be significant.
[ Another (sub-)advantage would be that it 'linearizes' and
hence properly colors the per .o module variable layout.
With an artificially split .data1 .bss1 the offset between
them is random, and it's harder to control the cache port
positions of closely related variables. ]
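A rough userspace sketch of the "variables right after their
__aligned__ locks" idea. It uses a struct as a stand-in for one merged
per-.o section, because C only guarantees layout inside a struct. The
64-byte cacheline and all demo_* names are assumptions for
illustration, not kernel code:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define DEMO_CACHELINE 64  /* assumed cacheline size */

/* Hypothetical stand-in for a spinlock. */
struct demo_lock {
	unsigned int slock;
};

/*
 * Stand-in for one merged per-.o section: the aligned lock is followed
 * directly by the zero-initialized variables it protects, which would
 * otherwise occupy a separate cacheline in .bss.
 */
struct demo_databss {
	alignas(DEMO_CACHELINE) struct demo_lock lock;
	unsigned long nr_events;   /* zero-initialized: normally .bss */
	unsigned long last_stamp;  /* zero-initialized: normally .bss */
};

/* The lock plus its data fit into the padding of a single cacheline. */
static_assert(sizeof(struct demo_databss) == DEMO_CACHELINE,
	      "lock and data should fit in one cacheline");

/* Index of the cacheline that a byte offset falls into. */
static size_t demo_line_of(size_t offset)
{
	return offset / DEMO_CACHELINE;
}

/* Returns 1 if the lock and the last protected variable share a cacheline. */
int demo_lock_shares_line_with_data(void)
{
	size_t first = offsetof(struct demo_databss, lock);
	size_t last = offsetof(struct demo_databss, last_stamp)
		      + sizeof(unsigned long) - 1;

	return demo_line_of(first) == demo_line_of(last);
}
```

In the split layout the same lock would still take a full aligned
cacheline in .data, and the two counters would touch a second one in
.bss.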
2)
Aligning (the now merged) data+bss per .o section on
cacheline boundary [up to 64 byte cacheline sizes or so]
sounds tempting as well - it eliminates accidental "tail
bites the next head" type of cross-object-file interactions.
The price is an estimated 3% blow-up in combined .data+bss
size. I suspect a patch and measurements would settle this
pretty neatly.
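The size cost of that alignment can be estimated with a small helper
like the one below. This is only a sketch: the demo_* names are
hypothetical, the per-.o section sizes would have to come from a real
build, and it does not reproduce the 3% estimate:

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (a power of two). */
size_t demo_align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Total padding added when every per-.o section is padded out to align. */
size_t demo_total_padding(const size_t *sizes, size_t n, size_t align)
{
	size_t pad = 0;

	for (size_t i = 0; i < n; i++)
		pad += demo_align_up(sizes[i], align) - sizes[i];

	return pad;
}
```

Dividing the result by the sum of the unpadded sizes gives the relative
blow-up for a given set of object files.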
3)
The free_percpu() collateral-damage argument i made was
pretty speculative (and artificial as well - the allocation
of percpu resources is very global in nature so a
kfree(NULL)-alike fastpath is harder to imagine) - i tried
at all costs to demonstrate my point based on that narrow
example alone.
Thanks,
Ingo