Message-Id: <20110127162626.8b38145b.akpm@linux-foundation.org>
Date: Thu, 27 Jan 2011 16:26:26 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Tim Chen <tim.c.chen@...ux.intel.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC] mm: Make vm_acct_memory scalable for large memory allocations
On Thu, 27 Jan 2011 16:15:05 -0800
Andi Kleen <ak@...ux.intel.com> wrote:
>
> > This seems like a pretty dumb test case. We have 64 cores sitting in a
> > loop "allocating" 32MB of memory, not actually using that memory and
> > then freeing it up again.
> >
> > Any not-completely-insane application would actually _use_ the memory.
> > Which involves pagefaults, page allocations and much memory traffic
> > modifying the page contents.
> >
> > Do we actually care?
>
> It's a bit like a poorly tuned malloc. From what I heard poorly tuned
> mallocs are quite common in the field, also with lots of custom ones
> around.
>
> While it would be good to tune them better the kernel should also have
> reasonable performance for this case.
>
> The poorly tuned malloc has other problems too, but this addresses at
> least one of them.
>
> Also I think Tim's patch is a general improvement to a somewhat dumb
> code path.
>
I guess another approach to this would be to change the way in which we
decide to update the central counter.

At present we'll spill the per-cpu counter into the central counter
when the per-cpu counter exceeds some fixed threshold. But that's
dumb, because the error factor is relatively large for small values of
the counter, and relatively small for large values of the counter.

So instead, we should spill the per-cpu counter into the central
counter when the per-cpu counter exceeds some proportion of the central
counter (eg, 1%?). That way the inaccuracy is largely independent of
the counter value and the lock-taking frequency decreases for large
counter values.

And given that "large cpu count" and "lots of memory" correlate pretty
well, I suspect such a change would fix up the contention which is
being seen here without magical startup-time tuning heuristics.

This again will require moving the batch threshold into the counter
itself and also recalculating it when the central counter is updated.
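
To make the proportional-spill idea concrete, here is a rough userspace
sketch. It is not the kernel's percpu_counter code and not a patch: the
names (prop_counter, SKETCH_NR_CPUS, SKETCH_MIN_BATCH, SKETCH_RATIO) are
made up for illustration, the 1% ratio and the floor of 32 are
assumptions, and the lock around the central update is only marked by a
comment. The point is only to show the batch living in the counter and
being recalculated on every spill.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS   4    /* hypothetical CPU count for this sketch */
#define SKETCH_MIN_BATCH 32   /* assumed floor so small counters still batch */
#define SKETCH_RATIO     100  /* spill at 1/100 (1%) of the central count */

struct prop_counter {
	int64_t count;                 /* central counter (lock-protected in real code) */
	int64_t batch;                 /* spill threshold, stored in the counter itself */
	int64_t pcpu[SKETCH_NR_CPUS];  /* per-cpu deltas not yet spilled */
	unsigned long spills;          /* central updates so far, for illustration only */
};

/* Threshold is a proportion of the central count, with a small floor. */
static int64_t prop_batch(int64_t count)
{
	int64_t b = llabs(count) / SKETCH_RATIO;

	return b > SKETCH_MIN_BATCH ? b : SKETCH_MIN_BATCH;
}

static void prop_counter_init(struct prop_counter *c, int64_t initial)
{
	c->count = initial;
	c->batch = prop_batch(initial);
	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		c->pcpu[i] = 0;
	c->spills = 0;
}

static void prop_counter_add(struct prop_counter *c, int cpu, int64_t amount)
{
	int64_t local;

	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	local = c->pcpu[cpu] + amount;

	if (llabs(local) >= c->batch) {
		/* A real implementation would take the counter lock here. */
		c->count += local;
		c->batch = prop_batch(c->count); /* recalculate on central update */
		c->pcpu[cpu] = 0;
		c->spills++;
	} else {
		c->pcpu[cpu] = local;
	}
}

/* Exact value: central count plus all unspilled per-cpu deltas. */
static int64_t prop_counter_sum(const struct prop_counter *c)
{
	int64_t sum = c->count;

	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		sum += c->pcpu[i];
	return sum;
}
```

With these assumed constants, a counter sitting at 1,000,000 gets a
batch of 10,000, so a cpu adding 8192 at a time spills on every second
add, while a counter near zero falls back to the floor of 32.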