Message-ID: <20200109172718-mutt-send-email-mst@kernel.org>
Date:   Thu, 9 Jan 2020 17:28:36 -0500
From:   "Michael S. Tsirkin" <mst@...hat.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        Christophe de Dinechin <dinechin@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Yan Zhao <yan.y.zhao@...el.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        "Tian, Kevin" <kevin.tian@...el.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>
Subject: Re: [PATCH v3 00/21] KVM: Dirty ring interface

On Thu, Jan 09, 2020 at 02:39:49PM -0500, Peter Xu wrote:
> On Thu, Jan 09, 2020 at 02:08:52PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Jan 09, 2020 at 12:08:49PM -0500, Peter Xu wrote:
> > > On Thu, Jan 09, 2020 at 11:40:23AM -0500, Michael S. Tsirkin wrote:
> > > 
> > > [...]
> > > 
> > > > > > I know it's mostly relevant for huge VMs, but OTOH these
> > > > > > probably use huge pages.
> > > > > 
> > > > > Yes, huge VMs could benefit more, especially if the dirty rate is not
> > > > > that high, I believe.  Though, could you elaborate on why huge pages
> > > > > are special here?
> > > > > 
> > > > > Thanks,
> > > > 
> > > > With hugetlbfs there are fewer bits to test: e.g. with 2M pages a single
> > > > bit set marks 512 pages as dirty.  We do not take advantage of this
> > > > but it looks like a rather obvious optimization.
> > > 
> > > Right, but isn't that the trade-off between the granularity of dirty
> > > tracking and how easy it is to collect the dirty bits?  Say, it'll be
> > > nearly impossible to migrate 1G-huge-page-backed guests if we track
> > > dirty bits at huge page granularity, since each touch of guest
> > > memory will cause another 1G of memory to be transferred even if most
> > > of the content is the same.  2M is somewhere in the middle, but the
> > > same write-amplification issue still exists.
> > >
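For concreteness, the numbers behind that trade-off (a standalone
back-of-the-envelope sketch, not code from the series):

#include <stdio.h>

int main(void)
{
	unsigned long long p4k = 4ULL << 10;	/* 4K base page */
	unsigned long long p2m = 2ULL << 20;	/* 2M huge page */
	unsigned long long p1g = 1ULL << 30;	/* 1G huge page */

	/* One dirty bit at 2M granularity stands in for this many 4K pages: */
	printf("2M bit covers %llu base pages\n", p2m / p4k);		/* 512 */

	/* A single 4K guest write tracked at 1G granularity forces this many
	 * base pages to be re-sent during migration: */
	printf("1G granularity amplifies one 4K write %llux\n", p1g / p4k); /* 262144 */

	return 0;
}
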
> > 
> > OK, I see I was unclear.
> > 
> > IIUC at the moment KVM never uses huge pages if any part of the huge page is
> > tracked.
> 
> To be more precise - I think it's per-memslot.  Say, if the memslot is
> dirty tracked, then no huge pages are used on the host for that memslot
> (even if the guest uses huge pages over it).

Yea ... so does it make sense to make this implementation detail
leak through UAPI?
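
For reference, dirty logging today is requested per memslot through the
existing UAPI; a minimal userspace sketch (vm_fd and host_mem are assumed
to exist already, error handling omitted):

#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Minimal sketch: enable dirty logging on one memslot.  vm_fd comes from
 * KVM_CREATE_VM and host_mem is a page-aligned mapping backing guest RAM;
 * both are assumed to exist elsewhere. */
static int enable_dirty_logging(int vm_fd, void *host_mem, uint64_t size)
{
	struct kvm_userspace_memory_region region;

	memset(&region, 0, sizeof(region));
	region.slot = 0;
	region.flags = KVM_MEM_LOG_DIRTY_PAGES;	/* per-memslot dirty tracking */
	region.guest_phys_addr = 0;
	region.memory_size = size;
	region.userspace_addr = (uint64_t)(uintptr_t)host_mem;

	/* Once this is set, KVM stops using huge mappings for the slot and
	 * tracks writes at 4K granularity, as discussed above. */
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}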

> > But if all parts of the page are written to then huge page
> > is used.
> 
> I'm not sure about this... I think it's still at 4K granularity.
> 
> > 
> > In this situation the whole huge page is dirty and needs to be migrated.
> 
> Note that in QEMU we always migrate pages in 4K chunks for x86, IIUC
> (please refer to ram_save_host_page() in QEMU).
> 
> > 
> > > PS. That seems to be a separate topic from the dirty ring series
> > > after all, because we would need to change our policy first if we
> > > want to track dirty bits with huge pages; with that done, for the
> > > dirty ring we can start to leverage kvm_dirty_gfn.pad to store the
> > > page size, with another new KVM cap, when we really want to.
> > > 
> > > Thanks,
> > 
> > Seems like leaking implementation detail to UAPI to me.
> 
> I'd say it's not the only place where we make that assumption (please
> also refer to uffd_msg.pagefault.address).  IMHO it's not wrong per se,
> because interfaces can be extended, but I am open to extending
> kvm_dirty_gfn to cover a length/size or making the pad larger (as long
> as Paolo is fine with this).
> 
> Thanks,
> 
> -- 
> Peter Xu
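
For reference, the dirty ring entry that kvm_dirty_gfn.pad above refers
to looks roughly like this in the posted series; the page-size idea is
only the hypothetical extension discussed in the thread, not something
in the patches:

#include <linux/types.h>

/* Ring entry roughly as posted in the series: */
struct kvm_dirty_gfn {
	__u32 pad;	/* unused today; the idea floated above is to report a
			 * page size (e.g. a page order) here, behind a new
			 * KVM cap (purely hypothetical at this point) */
	__u32 slot;	/* address space id | slot id */
	__u64 offset;	/* gfn offset within the memslot */
};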
