[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <478F9C9C.7070500@qumranet.com>
Date: Thu, 17 Jan 2008 20:21:16 +0200
From: Izik Eidus <izike@...ranet.com>
To: Andrea Arcangeli <andrea@...share.com>
CC: Rik van Riel <riel@...hat.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, kvm-devel@...ts.sourceforge.net,
Avi Kivity <avi@...ranet.com>, clameter@....com,
daniel.blueman@...drics.com, holt@....com, steiner@....com,
Andrew Morton <akpm@...l.org>, Hugh Dickins <hugh@...itas.com>,
Nick Piggin <npiggin@...e.de>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
andrea@...ranet.com
Subject: Re: [PATCH] mmu notifiers #v2
Andrea Arcangeli wrote:
> On Wed, Jan 16, 2008 at 07:48:06PM +0200, Izik Eidus wrote:
>
>> Rik van Riel wrote:
>>
>>> On Sun, 13 Jan 2008 17:24:18 +0100
>>> Andrea Arcangeli <andrea@...ranet.com> wrote:
>>>
>>>
>>>
>>>> In my basic initial patch I only track the tlb flushes which should be
>>>> the minimum required to have a nice linux-VM controlled swapping
>>>> behavior of the KVM gphysical memory.
>>>>
>>> I have a vaguely related question on KVM swapping.
>>>
>>> Do page accesses inside KVM guests get propagated to the host
>>> OS, so Linux can choose a reasonable page for eviction, or is
>>> the pageout of KVM guest pages essentially random?
>>>
>
> Right, selection of the guest OS pages to swap is partly random but
> wait: _only_ for the long-cached and hot spte entries. It's certainly
> not entirely random.
>
> As the shadow-cache is a bit dynamic, every new instantiated spte will
> refresh the PG_referenced bit in follow_page already (through minor
> faults). not-present fault of swapped non-present sptes, can trigger
> minor faults from swapcache too and they'll refresh young regular
> ptes.
>
>
>> right now when kvm remove pte from the shadow cache, it mark as access the
>> page that this pte pointed to.
>>
>
> Yes: the referenced bit in the mmu-notifier invalidate case isn't
> useful because it's set right before freeing the page.
>
>
>> it was a good solution untill the mmut notifiers beacuse the pages were
>> pinned and couldnt be swapped to disk
>>
>
> It probably still makes sense for sptes removed because of other
> reasons (not mmu notifier invalidates).
>
agree
>
>> so now it will have to do something more sophisticated or at least mark as
>> access every page pointed by pte
>> that get insrted to the shadow cache....
>>
>
> I think that should already be the case, see the mark_page_accessed in
> follow_page, isn't FOLL_TOUCH set, isn't it?
>
yes you are right FOLL_TOUCH is set.
> The only thing we clearly miss is a logic that refreshes the
> PG_referenced bitflag for "hot" sptes that remains instantiated and
> cached for a long time. For regular linux ptes this is done by the cpu
> through the young bitflag. But note that not all architectures have
> the young bitflag support in hardware! So I suppose the swapping of
> the KVM task, is like the swapping any other task but on an alpha
> CPU. It works good enough in practice even if we clearly have room for
> further optimizations in this area (like there would be on archs w/o
> young bit updated in hardware too).
>
> To refresh the PG_referenced bit for long lived hot sptes, I think the
> easiest solution is to chain the sptes in a lru, and to start dropping
> them when memory pressure start. We could drop one spte every X pages
> collected by the VM. So the "age" time factor depends on the VM
> velocity and we totally avoid useless shadow page faults when there's
> no VM pressure. When VM pressure increases, the kvm non-present fault
> will then take care to refresh the PG_referenced bit. This should
> solve the aging-issue for long lived and hot sptes. This should
> improve the responsiveness of the guest OS during "initial" swap
> pressure (after the initial swap pressure, the working set finds
> itself in ram again). So it should avoid some swapout/swapin not
> required jitter during the initial swap. I see this mostly as a kvm
> internal optimization, not strictly related to the mmu notifiers
> though.
>
ohh i like it, this is cleaver solution, and i guess the cost of the
vmexits wont be too high if it will
be not too much aggressive....
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists