lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 22 Apr 2020 14:51:55 -0400
From:   Peter Xu <peterx@...hat.com>
To:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc:     Kevin Tian <kevin.tian@...el.com>,
        "Michael S . Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Christophe de Dinechin <dinechin@...hat.com>,
        Yan Zhao <yan.y.zhao@...el.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>
Subject: Re: [PATCH v8 00/14] KVM: Dirty ring interface

Hi,

TL;DR: I'm thinking whether we should record pure GPA/GFN instead of (slot_id,
slot_offset) tuple for dirty pages in kvm dirty ring to unbind kvm_dirty_gfn
with memslots.

(A slightly longer version starts...)

The problem is that binding dirty tracking operations to KVM memslots is a
restriction that needs synchronization to memslot changes, which further needs
synchronization across all the vcpus because they're the consumers of memslots.
E.g., when we remove a memory slot, we need to flush all the dirty bits
correctly before we do the removal of the memslot.  That's actually an known
defect for QEMU/KVM [1] (I bet it could be a defect for many other
hypervisors...) right now with current dirty logging.  Meanwhile, even if we
fix it, that procedure is not scale at all, and error prone to dead locks.

Here memory removal is really an (still corner-cased but relatively) important
scenario to think about for dirty logging comparing to memory additions &
movings.  Because memory addition will always have no initial dirty page, and
we don't really move RAM a lot (or do we ever?!) for a general VM use case.

Then I went a step back to think about why we need these dirty bit information
after all if the memslot is going to be removed?

There're two cases:

  - When the memslot is going to be removed forever, then the dirty information
    is indeed meaningless and can be dropped, and,

  - When the memslot is going to be removed but quickly added back with changed
    size, then we need to keep those dirty bits because it's just a commmon way
    to e.g. punch an MMIO hole in an existing RAM region (here I'd confess I
    feel like using "slot_id" to identify memslot is really unfriendly syscall
    design for things like "hole punchings" in the RAM address space...
    However such "punch hold" operation is really needed even for a common
    guest for either system reboots or device hotplugs, etc.).

The real scenario we want to cover for dirty tracking is the 2nd one.

If we can track dirty using raw GPA, the 2nd scenario is solved itself.
Because we know we'll add those memslots back (though it might be with a
different slot ID), then the GPA value will still make sense, which means we
should be able to avoid any kind of synchronization for things like memory
removals, as long as the userspace is aware of that.

With that, when we fetch the dirty bits, we lookup the memslot dynamically,
drop bits if the memslot does not exist on that address (e.g., permanent
removals), and use whatever memslot is there for that guest physical address.
Though we for sure still need to handle memory move, that the userspace needs
to still take care of dirty bit flushing and sync for a memory move, however
that's merely not happening so nothing to take care about either.

Does this makes sense?  Comments greatly welcomed..

Thanks,

[1] https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg08361.html

-- 
Peter Xu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ