Message-ID: <56CADD6D.2040603@linux.intel.com>
Date:	Mon, 22 Feb 2016 18:05:33 +0800
From:	Xiao Guangrong <guangrong.xiao@...ux.intel.com>
To:	Paolo Bonzini <pbonzini@...hat.com>
Cc:	gleb@...nel.org, mtosatti@...hat.com, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org, kai.huang@...ux.intel.com,
	jike.song@...el.com, Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH v3 00/11] KVM: x86: track guest page access



On 02/19/2016 08:00 PM, Paolo Bonzini wrote:
>
>
> On 14/02/2016 12:31, Xiao Guangrong wrote:
>> Changelog in v3:
>> - refine the code of mmu_need_write_protect() based on Huang Kai's suggestion
>> - rebase the patchset against current code
>>
>> Changelog in v2:
>> - fix an issue where the tracking memory of a memslot was freed if we only
>>    moved the memslot or changed its flags
>> - do not track gfns that are not mapped in any memslot
>> - introduce the nolock APIs at the beginning of the patchset
>> - use 'unsigned short' as the track counter to reduce memory usage; it
>>    should be enough for the shadow page table and KVMGT
>>
>> This patchset introduces a feature that allows us to track page
>> access in the guest. Currently, only write-access tracking is
>> implemented in this version.
>>
>> Four APIs are introduced:
>> - kvm_page_track_add_page(kvm, gfn, mode): adds the single guest page @gfn
>>    to the tracking pool of the guest instance represented by @kvm;
>>    @mode specifies which kind of access to @gfn is tracked
>>
>> - kvm_page_track_remove_page(kvm, gfn, mode): the opposite operation
>>    of kvm_page_track_add_page(), which removes @gfn from the tracking pool.
>>    @gfn is no longer tracked once its last user is gone
>>
>> - kvm_page_track_register_notifier(kvm, n): registers a notifier so that
>>    events triggered by page tracking are received; when such an event
>>    occurs, the n->track_write() callback is called
>>
>> - kvm_page_track_unregister_notifier(kvm, n): the opposite operation
>>    of kvm_page_track_register_notifier(), which unlinks the notifier and
>>    stops receiving tracked events
>>
>> The first user of page tracking is the non-leaf shadow page tables, as
>> they are always write-protected. This also improves performance, because
>> page tracking speeds up the page fault handler for tracked pages. The
>> kernel-build results are as follows:
>>
>>     before           after
>> real 461.63       real 455.48
>> user 4529.55      user 4557.88
>> sys 1995.39       sys 1922.57
>>
>> Furthermore, it provides the infrastructure for other kinds of shadow
>> page tables, such as the GPU shadow page table introduced in KVMGT (1) and
>> a native nested IOMMU.
>>
>> This patchset can be divided into two parts:
>> - patches 1-7 implement page tracking
>> - the other patches apply page tracking to non-leaf shadow page tables
>
> Xiao,
>
> the patches are very readable and very good.  My comments are only minor.
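The kernel-build timings quoted in the cover letter are easier to compare as relative changes. The arithmetic below uses only the numbers in that table; units are not stated (presumably seconds as reported by time(1)), and because only a single before/after pair is given, it says nothing about run-to-run variance.

```latex
\begin{aligned}
\Delta &= \frac{t_{\mathrm{after}} - t_{\mathrm{before}}}{t_{\mathrm{before}}} \\
\Delta_{\mathrm{real}} &= \frac{455.48 - 461.63}{461.63} = \frac{-6.15}{461.63} \approx -1.33\% \\
\Delta_{\mathrm{user}} &= \frac{4557.88 - 4529.55}{4529.55} = \frac{28.33}{4529.55} \approx +0.63\% \\
\Delta_{\mathrm{sys}}  &= \frac{1922.57 - 1995.39}{1995.39} = \frac{-72.82}{1995.39} \approx -3.65\%
\end{aligned}
```

Wall-clock (real) time drops by about 1.3% and sys time by about 3.7%, while user time rises slightly.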

Thank you, Paolo!

>
> I still have a doubt: how are you going to handle invalidation of GPU
> shadow page tables if a device (emulated in QEMU or even vhost) does DMA
> to the PPGTT?

I think Jike is better placed to answer this question. Jike, could you
please clarify? :)

> Generally, this was the reason to keep stuff out of KVM
> and instead hook into the kernel mm subsystem (as with userfaultfd).

We considered it carefully, but this approach cannot satisfy KVMGT's
requirements. As I explained in the old thread
(https://lkml.org/lkml/2015/12/1/516), the reasons are:

"For performance: the GPU shadow is performance-critical and must be
switched frequently, so it is not a good idea to handle it in userspace.
Also, a Windows guest has many GPU tables and updates them frequently, which
means we need to write-protect a huge number of individual pages; I am
afraid userfaultfd cannot handle this case efficiently.

For functionality, userfaultfd cannot meet the needs of shadow paging
because:
- the page is kept read-only, so userfaultfd cannot fix the fault and let
    the vCPU make progress (a write access causes a writable gup).

- the access needs to be emulated; however, neither userfaultfd nor the rest
    of the kernel can emulate it, because the access is triggered by the guest
    and the instruction information is stored in the VMCS, so only KVM can
    emulate it.

- the shadow page table needs to be notified after the emulation finishes,
    because it must know the new data written to the page to update its page
    hierarchy (some hardware lacks a 'retry' capability, so the shadow page
    table needs to reflect the guest's table at all times)."

Any ideas?
