linux-kernel - Re: [PATCH v3 00/21] KVM: Dirty ring interface


Message-ID: <20200109192318.GF36997@xz-x1>
Date:   Thu, 9 Jan 2020 14:23:18 -0500
From:   Peter Xu <peterx@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Alex Williamson <alex.williamson@...hat.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Christophe de Dinechin <dinechin@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Yan Zhao <yan.y.zhao@...el.com>,
        Jason Wang <jasowang@...hat.com>,
        Kevin Kevin <kevin.tian@...el.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        Kirti Wankhede <kwankhede@...dia.com>
Subject: Re: [PATCH v3 00/21] KVM: Dirty ring interface

On Thu, Jan 09, 2020 at 02:13:54PM -0500, Michael S. Tsirkin wrote:
> On Thu, Jan 09, 2020 at 12:58:08PM -0500, Peter Xu wrote:
> > On Thu, Jan 09, 2020 at 09:47:11AM -0700, Alex Williamson wrote:
> > > On Thu,  9 Jan 2020 09:57:08 -0500
> > > Peter Xu <peterx@...hat.com> wrote:
> > > 
> > > > Branch is here: https://github.com/xzpeter/linux/tree/kvm-dirty-ring
> > > > (based on kvm/queue)
> > > > 
> > > > Please refer to either the previous cover letters, or documentation
> > > > update in patch 12 for the big picture.  Previous posts:
> > > > 
> > > > V1: https://lore.kernel.org/kvm/20191129213505.18472-1-peterx@redhat.com
> > > > V2: https://lore.kernel.org/kvm/20191221014938.58831-1-peterx@redhat.com
> > > > 
> > > > The major change in V3 is that we dropped the whole waitqueue and the
> > > > global lock. With that, we have clean per-vcpu ring and no default
> > > > ring any more.  The two kvmgt refactoring patches were also included
> > > > to show the dependency of the works.
> > > 
> > > Hi Peter,
> > 
> > Hi, Alex,
> > 
> > > 
> > > Would you recommend this style of interface for vfio dirty page
> > > tracking as well?  This mechanism seems very tuned to sparse page
> > > dirtying, how well does it handle fully dirty, or even significantly
> > > dirty regions?
> > 
> > That's truly the reason why I think the dirty bitmap can still be used
> > and should be kept.  IIUC the dirty ring started from COLO, where (1)
> > the dirty rate is very low, and (2) sync happens frequently.  That's
> > perfect ground for the dirty ring.  However, it certainly does not mean
> > that the dirty ring can solve all the issues.  As you said, I believe
> > the fully dirty case is the other extreme, where the dirty bitmap could
> > perform better.
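
To put rough numbers on the sparse versus fully dirty trade-off, here is a
back-of-the-envelope sketch.  It only compares how many bytes each scheme
has to hand over per sync, so it says nothing about fault cost or locking.
The 16-byte ring entry size and the 4 GiB guest are assumptions made for
illustration, not values taken from this thread.

```python
PAGE_SIZE = 4096          # 4K tracking granularity
RING_ENTRY_BYTES = 16     # assumed size of one ring entry (not from this thread)


def bitmap_bytes(total_pages: int) -> int:
    """One bit per page, whether or not the page is dirty."""
    return (total_pages + 7) // 8


def ring_bytes(dirty_pages: int) -> int:
    """One fixed-size entry per dirty page."""
    return dirty_pages * RING_ENTRY_BYTES


def break_even_dirty_pages(total_pages: int) -> int:
    """Dirty-page count at which the ring carries as many bytes as the bitmap."""
    return bitmap_bytes(total_pages) // RING_ENTRY_BYTES


total = (4 << 30) // PAGE_SIZE          # a hypothetical 4 GiB guest
print(bitmap_bytes(total))              # bytes harvested per bitmap sync
print(break_even_dirty_pages(total))    # dirty pages where ring == bitmap
```

Under these assumptions the bitmap is a fixed 128 KiB per sync, and the ring
moves less data only while fewer than 8192 pages (1/128 of guest memory) are
dirty between syncs.  Past that point the bitmap is the more compact form.
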
> > 
> > > We also don't really have "active" dirty page tracking
> > > in vfio, we simply assume that if a page is pinned or otherwise mapped
> > > that it's dirty, so I think we'd constantly be trying to re-populate
> > > the dirty ring with pages that we've seen the user consume, which
> > > doesn't seem like a good fit versus a bitmap solution.  Thanks,
> > 
> > Right, so I confess I don't know whether dirty ring is the ideal
> > solution for vfio either.  Actually if we're tracking by page maps or
> > pinnings, then IMHO it also means that it could be more suitable to
> > use a modified version of dirty ring buffer (as you suggested in the
> > other thread), in that we can track dirty using (addr, len) range
> > rather than a single page address.  That could be hard for KVM because
> > in KVM the page will be mostly trapped in 4K granularity in page
> > faults, and it'll also be hard to merge contiguous entries with
> > previous ones because the userspace could be reading the entries (so
> > after we publish the previous 4K dirty page, we should not modify the
> > entry any more).
> 
> An easy way would be to keep a couple of entries around, not pushing
> them into the ring until later.  In fact deferring queue write until
> there's a bunch of data to be pushed is a very handy optimization.

I feel like I proposed a similar thing in the other thread. :-)
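
For reference, the "keep an entry around before publishing" idea can be
modelled roughly as below.  This is a toy userspace sketch, not kernel code
and not what the series implements: the class name, the single pending slot
and the (addr, len) tuple layout are all hypothetical.  The point it shows is
that merging only ever touches the unpublished entry, so anything the reader
can already see is never modified.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

PAGE_SIZE = 4096  # assumed 4K tracking granularity, as in the KVM case above


@dataclass
class CoalescingRing:
    """Toy model: hold one (addr, len) range back before publishing it."""
    published: List[Tuple[int, int]] = field(default_factory=list)
    pending: Optional[Tuple[int, int]] = None  # not yet visible to the reader

    def mark_dirty(self, addr: int) -> None:
        addr &= ~(PAGE_SIZE - 1)  # align down to the page boundary
        if self.pending is not None:
            start, length = self.pending
            if start <= addr < start + length:
                return  # page already covered by the pending range
            if addr == start + length:
                # Contiguous with the pending range: extend it in place.
                self.pending = (start, length + PAGE_SIZE)
                return
            self.flush()  # not contiguous: publish the old range first
        self.pending = (addr, PAGE_SIZE)

    def flush(self) -> None:
        """Publish the pending range; after this it is never modified."""
        if self.pending is not None:
            self.published.append(self.pending)
            self.pending = None


ring = CoalescingRing()
for fault_addr in (0x1000, 0x2000, 0x3000, 0x9000):
    ring.mark_dirty(fault_addr)
ring.flush()  # e.g. when userspace asks for a sync
print(ring.published)
```

The three contiguous 4K faults end up as one published range and the
isolated page as another.  The cost is that a dirty page may stay invisible
to userspace until the next flush, so a real design would need to define
when that flush is forced.
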

> 
> When building UAPIs it makes sense to try and keep them generic
> rather than tying them to a given implementation.
> 
> That's one of the reasons I called for using something
> resembling vring_packed_desc.

But again, I just want to make sure I don't over-engineer...

I'll wait for further feedback from others for this.

Thanks,

-- 
Peter Xu