Message-ID: <4D6E477C.7050303@redhat.com>
Date: Wed, 02 Mar 2011 15:34:52 +0200
From: Avi Kivity <avi@...hat.com>
To: Marcelo Tosatti <mtosatti@...hat.com>
CC: Alex Williamson <alex.williamson@...hat.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
xiaoguangrong@...fujitsu.com
Subject: Re: [RFC PATCH 0/3] Weight-balanced binary tree + KVM growable memory
slots using wbtree
On 03/01/2011 09:47 PM, Marcelo Tosatti wrote:
> On Sun, Feb 27, 2011 at 11:54:29AM +0200, Avi Kivity wrote:
> > On 02/24/2011 07:35 PM, Alex Williamson wrote:
> > >On Thu, 2011-02-24 at 12:06 +0200, Avi Kivity wrote:
> > >> On 02/23/2011 09:28 PM, Alex Williamson wrote:
> > >> > I had forgotten about <1M mem, so actually the slot configuration was:
> > >> >
> > >> > 0: <1M
> > >> > 1: 1M - 3.5G
> > >> > 2: 4G+
> > >> >
> > >> > I stacked the deck in favor of the static array (0: 4G+, 1: 1M-3.5G, 2:
> > >> > <1M), and got these kernbench results:
> > >> >
> > >> >         | base (stdev)    | reorder (stdev) | wbtree (stdev)  |
> > >> > --------+-----------------+-----------------+-----------------+
> > >> > Elapsed | 42.809 (0.19)   | 42.160 (0.22)   | 42.305 (0.23)   |
> > >> > User    | 115.709 (0.22)  | 114.358 (0.40)  | 114.720 (0.31)  |
> > >> > System  | 41.605 (0.14)   | 40.741 (0.22)   | 40.924 (0.20)   |
> > >> > %cpu    | 366.9 (1.45)    | 367.4 (1.17)    | 367.6 (1.51)    |
> > >> > context | 7272.3 (68.6)   | 7248.1 (89.7)   | 7249.5 (97.8)   |
> > >> > sleeps  | 14826.2 (110.6) | 14780.7 (86.9)  | 14798.5 (63.0)  |
> > >> >
> > >> > So, wbtree is only slightly behind reordering, and the standard
> > >> > deviation suggests the runs are mostly within the noise of each other.
> > >> > Thanks,
> > >>
> > >> Doesn't this indicate we should use reordering, instead of a new data
> > >> structure?
> > >
> > >The original problem that brought this on was scaling. The re-ordered
> > >array still has O(N) scaling while the tree should have ~O(logN) (note
> > >that it currently doesn't because it needs a compaction algorithm added
> > >after insert and remove). So yes, it's hard to beat the results of a
> > >test that hammers on the first couple entries of a sorted array, but I
> > >think the tree has better performance than the current code and scales
> > >more predictably.
> >
> > Scaling doesn't matter, only actual performance. Even a guest with
> > 512 slots would still hammer only on the first few slots, since
> > these will contain the bulk of memory.
> >
> > >If we knew when we were searching for which type of data, it would
> > >perhaps be nice if we could use a sorted array for guest memory (since
> > >it's nicely bounded into a small number of large chunks), and a tree for
> > >mmio (where we expect the scaling to be a factor). Thanks,
> >
> > We have three types of memory:
> >
> > - RAM - a few large slots
> > - mapped mmio (for device assignment) - possibly many small slots
> > - non-mapped mmio (for emulated devices) - no slots
> >
> > The first two are handled in exactly the same way - they're just
> > memory slots. We expect a lot more hits into the RAM slots, since
> > they're much bigger. But by far the majority of faults will be for
> > the third category - mapped memory will be hit once per page, then
> > handled by hardware until Linux memory management does something
> > about the page, which should hopefully be rare (with device
> > assignment, rare == never, since those pages are pinned).
> >
> > Therefore our optimization priorities should be
> > - complete miss into the slot list
> > - hit into the RAM slots
> > - hit into the other slots (trailing far behind)
>
> Whatever ordering is considered optimal in one workload can be suboptimal
> in another. The binary search reduces the number of slots inspected
> in the average case. Using slot size as the weight favours the slots
> most likely to be hit.
It's really difficult to come up with a workload that causes many hits
to small slots.
> > Of course worst-case performance matters. For example, we might
> > (not sure) be searching the list with the mmu spinlock held.
> >
> > I think we still have a bit to go before we can justify the new data
> > structure.
>
> Intensive IDE disk IO on guest with lots of assigned network devices, 3%
> improvement on netperf with rtl8139, 1% improvement on kernbench?
>
> Fail to see justification for not using it.
By itself it's great, but the miss cache will cause the code to be
called very rarely. So I prefer the sorted array which is simpler (and
faster for the few-large-slots case).
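
To make the sorted-array option concrete, here is a minimal sketch of what such a lookup could look like. It is not the KVM code: the struct slot layout, the function names, and the choice to order by descending size are all assumptions made for illustration, and the miss cache is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified slot: a contiguous range of guest frame numbers. */
struct slot {
	uint64_t base_gfn;	/* first guest frame number covered */
	uint64_t npages;	/* number of pages in the slot */
};

/* qsort comparator: larger slots sort first (assumed ordering policy). */
static int cmp_size_desc(const void *a, const void *b)
{
	const struct slot *x = a, *y = b;

	if (x->npages == y->npages)
		return 0;
	return x->npages > y->npages ? -1 : 1;
}

/* Reorder the array so the biggest slots are inspected first. */
static void sort_slots(struct slot *slots, size_t n)
{
	qsort(slots, n, sizeof(*slots), cmp_size_desc);
}

/* Linear scan; returns the matching slot, or NULL on a complete miss. */
static struct slot *find_slot(struct slot *slots, size_t n, uint64_t gfn)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct slot *s = &slots[i];

		if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
			return s;
	}
	return NULL;
}
```

With a few large RAM slots at the front, a RAM hit is resolved after one or two comparisons. A complete miss still walks the whole array, which is the case the miss cache is meant to take off the hot path.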
--
error compiling committee.c: too many arguments to function