Message-ID: <4D6E46B4.7030909@redhat.com>
Date: Wed, 02 Mar 2011 15:31:32 +0200
From: Avi Kivity <avi@...hat.com>
To: Alex Williamson <alex.williamson@...hat.com>
CC: linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
mtosatti@...hat.com, xiaoguangrong@...fujitsu.com
Subject: Re: [RFC PATCH 0/3] Weight-balanced binary tree + KVM growable memory slots using wbtree
On 03/01/2011 08:20 PM, Alex Williamson wrote:
> > > It seems like we need a good mixed workload benchmark. So far we've
> > > only tested worst case, with a pure emulated I/O test, and best case,
> > > with a pure memory test. Ordering an array only helps the latter, and
> > > only barely beats the tree, so I suspect overall performance would be
> > > better with a tree.
> >
> > But if we cache the missed-all-memslots result in the spte, we eliminate
> > the worst case, and are left with just the best case.
>
> There's potentially a lot of entries between best case and worst case.
The mid case is where we have a lot of small slots which are
continuously flushed. That would be (ept=0 && new mappings continuously
established) || (lots of small mappings && lots of host paging
activity). I don't know of any guests that continuously reestablish BAR
mappings, and host paging activity doesn't apply to device assignment.
So what are we left with?
> >
> > The problem here is that all workloads will cache all memslots very
> > quickly into sptes and all lookups will be misses. There are two cases
> > where we have lookups that hit the memslots structure: ept=0, and host
> > swap. Neither are things we want to optimize too heavily.
>
> Which seems to suggest that:
>
> A. making those misses fast = win
> B. making those misses fast + caching misses = win++
> C. we don't care if the sorted array is subtly faster for ept=0
>
> Sound right? So is the question whether cached misses alone gets us 99%
> of the improvement since hits are already getting cached in sptes for
> cases we care about?
Yes, that's my feeling. Caching those misses is a lot more important
than speeding them up, since the cache will stay valid for long periods,
and since the hit rate will be very high.
  cache + anything = O(1)
  no-cache + tree  = O(log(n))
  no-cache + array = O(n)
--
error compiling committee.c: too many arguments to function