Message-ID: <1080174355.20804941.1508173474622.JavaMail.zimbra@redhat.com>
Date: Mon, 16 Oct 2017 13:04:34 -0400 (EDT)
From: Pankaj Gupta <pagupta@...hat.com>
To: Stefan Hajnoczi <stefanha@...il.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
qemu-devel@...gnu.org, linux-nvdimm@...1.01.org,
linux-mm@...ck.org, jack@...e.cz, stefanha@...hat.com,
dan j williams <dan.j.williams@...el.com>, riel@...hat.com,
haozhong zhang <haozhong.zhang@...el.com>, nilal@...hat.com,
kwolf@...hat.com, pbonzini@...hat.com,
ross zwisler <ross.zwisler@...el.com>, david@...hat.com,
xiaoguangrong eric <xiaoguangrong.eric@...il.com>
Subject: Re: [RFC 2/2] KVM: add virtio-pmem driver
>
> On Fri, Oct 13, 2017 at 06:48:15AM -0400, Pankaj Gupta wrote:
> > > On Thu, Oct 12, 2017 at 09:20:26PM +0530, Pankaj Gupta wrote:
> > > > +static blk_qc_t virtio_pmem_make_request(struct request_queue *q,
> > > > + struct bio *bio)
> > > > +{
> > > > + blk_status_t rc = 0;
> > > > + struct bio_vec bvec;
> > > > + struct bvec_iter iter;
> > > > + struct virtio_pmem *pmem = q->queuedata;
> > > > +
> > > > + if (bio->bi_opf & REQ_FLUSH)
> > > > + //todo host flush command
> > >
> > > This detail is critical to the device design. What is the plan?
> >
> > Yes, this is a good point.
> >
> > I was thinking of the guest sending a flush command to Qemu, which
> > would then do an fsync on the file fd.
>
> Previously there was discussion about fsyncing a specific file range
> instead of the whole file. This could perform better in cases where
> only a subset of dirty pages need to be flushed.
Yes, we had a discussion about this and decided to do an entire-device
flush rather than a range-level flush.
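On the Qemu side the flush handling would then essentially boil down to an
fsync of the backing file fd; roughly something like the sketch below (names
are illustrative only, not actual Qemu code):

#include <unistd.h>
#include <errno.h>

/* Illustrative host-side handler: when the guest sends a flush request,
 * sync the whole backing file and report the status back. */
static int handle_guest_flush(int backing_fd)
{
        if (fsync(backing_fd) < 0)      /* flushes data and metadata */
                return -errno;
        return 0;
}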
>
> One possibility is to design the virtio interface to communicate ranges
> but the emulation code simply fsyncs the fd for the time being. Later
> on, if the necessary kernel and userspace interfaces are added, we can
> make use of the interface.
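Something along these lines could work for communicating a range while the
emulation still fsyncs the whole fd for now (purely illustrative, not a
final layout):

#include <linux/types.h>

/* Illustrative only -- not a final ABI.  The guest fills in a byte range;
 * current emulation can ignore it and fsync the whole backing file, and a
 * later version can switch to a ranged sync. */
struct virtio_pmem_flush_req {
        __le64 start;   /* offset into the pmem range, in bytes */
        __le64 len;     /* length to flush; 0 could mean whole device */
};

struct virtio_pmem_flush_resp {
        __le32 ret;     /* 0 on success, negative errno otherwise */
};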
>
> > If we do an async flush and move the task to a wait queue until we
> > receive a flush-complete reply from the host, we can allow other tasks
> > to execute on the current cpu.
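To make this async flush idea a bit more concrete, the guest side could look
roughly like the sketch below (the struct virtio_pmem members used here are
assumptions for illustration, not the actual driver):

#include <linux/virtio.h>
#include <linux/scatterlist.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

/* Rough sketch only: queue a flush request to the host and sleep on a wait
 * queue until the completion interrupt wakes us, so other tasks can run on
 * this cpu in the meantime.  Assumes virtio_pmem has req_vq, lock,
 * flush_req, flush_wait, flush_done and flush_ret members. */
static int virtio_pmem_flush(struct virtio_pmem *pmem)
{
        struct scatterlist sg;
        int err;

        pmem->flush_done = false;
        sg_init_one(&sg, &pmem->flush_req, sizeof(pmem->flush_req));

        spin_lock_irq(&pmem->lock);
        err = virtqueue_add_outbuf(pmem->req_vq, &sg, 1, pmem, GFP_ATOMIC);
        if (!err)
                virtqueue_kick(pmem->req_vq);
        spin_unlock_irq(&pmem->lock);
        if (err)
                return err;

        /* the virtqueue callback sets flush_done and wakes us up */
        wait_event(pmem->flush_wait, pmem->flush_done);
        return pmem->flush_ret;
}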
> >
> > Do you have any suggestions, or is there anything I am not foreseeing here?
>
> My main thought about this patch series is whether pmem should be a
> virtio-blk feature bit instead of a whole new device. There is quite a
> bit of overlap between the two.
Exposing the existing virtio-blk device as a persistent memory range would,
at a high level, require the additional features below:
- Use of a persistent memory range, with an option to allocate the memmap
  array in the device itself.
- Block operations for DAX and the persistent memory range.
- Bifurcation at the filesystem level based on the type of virtio-blk
  device selected.
- Bifurcation of the flushing interface and of the communication channel
  between guest & host.
Yes, these features could be configured dynamically based on the type of
device added, but what if we have an m:n ratio of virtio-blk to virtio-pmem
devices, and what about the scale involved?
If I understand correctly, virtio-blk is a high-performance interface with
multiqueue support and additional host-side features like data-plane mode.
If we bloat it with additional functionality (even functionality we need)
and add locking for those features on both the guest and the host side,
won't we take a performance hit? Also, as the requirements of both
interfaces grow, won't it become harder to maintain? I would prefer simpler
interfaces with well-defined functionality, but yes, common code can be
shared and reused through well-defined wrappers.
>
> Stefan
>