Message-ID: <CAPcyv4hffSdoONfFohKZzfB2gywGYG9MmDxC0H9+5R53w+ROVQ@mail.gmail.com>
Date: Mon, 16 Oct 2017 08:58:37 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Stefan Hajnoczi <stefanha@...il.com>
Cc: Pankaj Gupta <pagupta@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
KVM list <kvm@...r.kernel.org>,
Qemu Developers <qemu-devel@...gnu.org>,
linux-nvdimm <linux-nvdimm@...1.01.org>,
Linux MM <linux-mm@...ck.org>, Jan Kara <jack@...e.cz>,
Stefan Hajnoczi <stefanha@...hat.com>,
Rik van Riel <riel@...hat.com>,
haozhong zhang <haozhong.zhang@...el.com>,
Nitesh Narayan Lal <nilal@...hat.com>,
Kevin Wolf <kwolf@...hat.com>,
Paolo Bonzini <pbonzini@...hat.com>,
ross zwisler <ross.zwisler@...el.com>,
David Hildenbrand <david@...hat.com>,
xiaoguangrong eric <xiaoguangrong.eric@...il.com>
Subject: Re: [RFC 2/2] KVM: add virtio-pmem driver
On Mon, Oct 16, 2017 at 7:47 AM, Stefan Hajnoczi <stefanha@...il.com> wrote:
> On Fri, Oct 13, 2017 at 06:48:15AM -0400, Pankaj Gupta wrote:
>> > On Thu, Oct 12, 2017 at 09:20:26PM +0530, Pankaj Gupta wrote:
>> > > +static blk_qc_t virtio_pmem_make_request(struct request_queue *q,
>> > > + struct bio *bio)
>> > > +{
>> > > + blk_status_t rc = 0;
>> > > + struct bio_vec bvec;
>> > > + struct bvec_iter iter;
>> > > + struct virtio_pmem *pmem = q->queuedata;
>> > > +
>> > > + if (bio->bi_opf & REQ_FLUSH)
>> > > + //todo host flush command
>> >
>> > This detail is critical to the device design. What is the plan?
>>
>> Yes, this is a good point.
>>
>> I was thinking of the guest sending a flush command to Qemu, which
>> will do an fsync on the file fd.
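
For illustration, a minimal sketch of that host-side handling, assuming
Qemu keeps the memory-backend file descriptor around (the function and
parameter names here are hypothetical, untested):

    #include <unistd.h>
    #include <errno.h>

    /* Qemu side: complete a guest flush request by syncing the whole
     * backing file to stable storage. */
    static int virtio_pmem_host_flush(int backing_fd)
    {
            if (fsync(backing_fd) < 0)
                    return -errno;
            return 0;
    }
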
>
> Previously there was discussion about fsyncing a specific file range
> instead of the whole file. This could perform better in cases where
> only a subset of dirty pages need to be flushed.
>
> One possibility is to design the virtio interface to communicate ranges
> but the emulation code simply fsyncs the fd for the time being. Later
> on, if the necessary kernel and userspace interfaces are added, we can
> make use of the interface.
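
For reference, the range-based variant would presumably map to
something like sync_file_range() on the host; a sketch only, and note
that sync_file_range() by itself does not flush metadata or the disk
write cache, which is part of why a plain fsync() is the safe default:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <errno.h>

    /* Qemu side (hypothetical): write back only the range the guest
     * asked about instead of fsync()ing the whole file. */
    static int virtio_pmem_host_flush_range(int fd, off_t off, off_t len)
    {
            unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
                                 SYNC_FILE_RANGE_WRITE |
                                 SYNC_FILE_RANGE_WAIT_AFTER;

            if (sync_file_range(fd, off, len, flags) < 0)
                    return -errno;
            return 0;
    }
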
Range-based flushing is not a natural storage cache management
mechanism. All that is typically available is a full write-cache-flush
mechanism, and the upper layers would need to be customized for
range-based flushing.
>> If we do an async flush and move the task to a wait queue till we
>> receive the flush complete reply from the host, we can allow other
>> tasks to execute on the current cpu.
>>
>> Any suggestions you have or anything I am not foreseeing here?
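
The async-flush idea quoted above could look roughly like the following
on the guest side, assuming a request virtqueue whose interrupt handler
calls complete() on the token it gets back (the request structure, the
->req_vq field, and the command code are hypothetical, untested):

    #include <linux/virtio.h>
    #include <linux/scatterlist.h>
    #include <linux/completion.h>
    #include <linux/slab.h>

    struct virtio_pmem_request {
            u32 cmd;                 /* hypothetical FLUSH command code */
            struct completion done;  /* completed from the vq callback */
    };

    static int virtio_pmem_flush(struct virtio_pmem *pmem)
    {
            struct virtio_pmem_request *req;
            struct scatterlist sg;
            int err;

            /* kmalloc rather than on-stack: the buffer is handed to the
             * virtqueue and must not live on a (possibly vmapped) stack */
            req = kmalloc(sizeof(*req), GFP_KERNEL);
            if (!req)
                    return -ENOMEM;

            req->cmd = 1;
            init_completion(&req->done);
            sg_init_one(&sg, &req->cmd, sizeof(req->cmd));

            /* pmem->req_vq: hypothetical request virtqueue */
            err = virtqueue_add_outbuf(pmem->req_vq, &sg, 1, req, GFP_KERNEL);
            if (!err) {
                    virtqueue_kick(pmem->req_vq);
                    /* sleep until the vq interrupt handler acks the flush
                     * with complete(&req->done); other tasks can run on
                     * this cpu in the meantime */
                    wait_for_completion(&req->done);
            }

            kfree(req);
            return err;
    }
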
>
> My main thought about this patch series is whether pmem should be a
> virtio-blk feature bit instead of a whole new device. There is quite a
> bit of overlap between the two.
I'd be open to that... there are already provisions in the pmem driver
for platforms where cpu caches are flushed on power-loss; a virtio
mode for this shared-memory case seems reasonable.
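
For reference, the provision I mean is roughly how pmem_attach_disk()
derives the block-layer cache settings from region properties rather
than hard-coding them; a virtio-negotiated "host handles persistence"
hint could plug into the same knobs. A very rough, untested sketch
(the helper name and the extra flag are hypothetical):

    #include <linux/blkdev.h>
    #include <linux/libnvdimm.h>

    static void pmem_setup_write_cache(struct request_queue *q,
                    struct nd_region *nd_region,
                    bool host_handles_persistence)
    {
            bool wbc = nvdimm_has_cache(nd_region) &&
                    !host_handles_persistence;
            int fua = nvdimm_has_flush(nd_region);

            blk_queue_write_cache(q, wbc, fua > 0);
    }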