[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1185345850.8197.64.camel@twins>
Date: Wed, 25 Jul 2007 08:44:10 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc: Andi Kleen <andi@...stfloor.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
Hugh Dickins <hugh@...itas.com>
Subject: Re: [patch] mm: reduce pagetable-freeing latencies
On Wed, 2007-07-25 at 07:29 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2007-07-24 at 14:13 +0200, Andi Kleen wrote:
> > Benjamin Herrenschmidt <benh@...nel.crashing.org> writes:
> >
> > > > What a truly putrid patch. I am suspecting that this was a quick
> > > > get-you-out-of-trouble thing, which then got forgotten about.
> > > >
> > > > We have two months to do the "right fix". Please?
> > >
> > > Working on it...
> >
> > Ideally the patch would DTRT even on non preemptible kernels,
> > aka do cond_resched()s when needed.
>
> First is to rework the batch structure to make it more manageable. That
> is, patch #1 will keep the page list in per-cpu (and thus non-preempt),
> but the batch "head" will be on the stack.
>
> Now, there are two approaches regarding getting rid of the
> get_cpu/put_cpu:
>
> - One is to have a small number of entries for the page list in the
> batch structure on the stack, and attempt to gfp' a page for more. If
> that fails, we can still free, though with less batching, using only the
> few entries in the batch struct itself. That's Hugh initial appraoch
> iirc.
>
> - Another is to hook up with those folks who've been asking for a
> notifier that we are being preempted/scheduled out. In this case, I can
> happily access the per-cpu list, and just trigger a batch flush if we
> happen to be scheduled out.
>
> I tend to prefer the former solution though, gfp should be fast, and
> there is no need to force a flush if we get scheduled out. It would be
> rare to hit the worst case scenario of falling back to the few page
> heads in the batch itself. On the other hand, that solution has the
> problem of bloating the stack a bit (with the few page pointers) even in
> the case where I plan to use the extended batch outside of zap_*, such
> as fork, mprotect, ....
>
> So I'll first do patch #1, which will not fix the problem, but will make
> the fix easier to fit in, in the meantime, please provide feedback of
> your preferred solution for avoiding the get/put_cpu of the 2 above,
> unless you find a good 3rd one.
I too would prefer the former solution. I think preemption notifiers are
a particular iffy hack.
You could perhaps use C99 variable length arrays to avoid the stack
waste when not needed, however Andi once told me that generates rather
dubious code.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists