[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190722105630.GC3035@work-vm>
Date: Mon, 22 Jul 2019 11:56:30 +0100
From: "Dr. David Alan Gilbert" <dgilbert@...hat.com>
To: Christian Borntraeger <borntraeger@...ibm.com>
Cc: Dan Williams <dan.j.williams@...el.com>,
Vivek Goyal <vgoyal@...hat.com>,
Halil Pasic <pasic@...ux.ibm.com>,
Collin Walling <walling@...ux.ibm.com>,
Cornelia Huck <cohuck@...hat.com>,
Sebastian Ott <sebott@...ux.ibm.com>,
KVM list <kvm@...r.kernel.org>,
Miklos Szeredi <miklos@...redi.hu>,
linux-nvdimm <linux-nvdimm@...ts.01.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Stefan Hajnoczi <stefanha@...hat.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Steven Whitehouse <swhiteho@...hat.com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device
* Christian Borntraeger (borntraeger@...ibm.com) wrote:
>
>
> On 18.07.19 16:30, Dan Williams wrote:
> > On Thu, Jul 18, 2019 at 6:15 AM Vivek Goyal <vgoyal@...hat.com> wrote:
> >>
> >> On Wed, Jul 17, 2019 at 07:27:25PM +0200, Halil Pasic wrote:
> >>> On Wed, 15 May 2019 15:27:03 -0400
> >>> Vivek Goyal <vgoyal@...hat.com> wrote:
> >>>
> >>>> From: Stefan Hajnoczi <stefanha@...hat.com>
> >>>>
> >>>> Setup a dax device.
> >>>>
> >>>> Use the shm capability to find the cache entry and map it.
> >>>>
> >>>> The DAX window is accessed by the fs/dax.c infrastructure and must have
> >>>> struct pages (at least on x86). Use devm_memremap_pages() to map the
> >>>> DAX window PCI BAR and allocate struct page.
> >>>>
> >>>
> >>> Sorry for being this late. I don't see any more recent version so I will
> >>> comment here.
> >>>
> >>> I'm trying to figure out how is this supposed to work on s390. My concern
> >>> is, that on s390 PCI memory needs to be accessed by special
> >>> instructions. This is taken care of by the stuff defined in
> >>> arch/s390/include/asm/io.h. E.g. we 'override' __raw_writew so it uses
> >>> the appropriate s390 instruction. However if the code does not use the
> >>> linux abstractions for accessing PCI memory, but assumes it can be
> >>> accessed like RAM, we have a problem.
> >>>
> >>> Looking at this patch, it seems to me, that we might end up with exactly
> >>> the case described. For example AFAICT copy_to_iter() (3) resolves to
> >>> the function in lib/iov_iter.c which does not seem to cater for s390
> >>> oddities.
> >>>
> >>> I didn't have the time to investigate this properly, and since virtio-fs
> >>> is virtual, we may be able to get around what is otherwise a
> >>> limitation on s390. My understanding of these areas is admittedly
> >>> shallow, and since I'm not sure I'll have much more time to
> >>> invest in the near future I decided to raise concern.
> >>>
> >>> Any opinions?
> >>
> >> Hi Halil,
> >>
> >> I don't understand s390 and how PCI works there as well. Is there any
> >> other transport we can use there to map IO memory directly and access
> >> using DAX?
> >>
> >> BTW, is DAX supported for s390.
> >>
> >> I am also hoping somebody who knows better can chip in. Till that time,
> >> we could still use virtio-fs on s390 without DAX.
> >
> > s390 has so-called "limited" dax support, see CONFIG_FS_DAX_LIMITED.
> > In practice that means that support for PTE_DEVMAP is missing which
> > means no get_user_pages() support for dax mappings. Effectively it's
> > only useful for execute-in-place as operations like fork() and ptrace
> > of dax mappings will fail.
>
>
> This is only true for the dcssblk device driver (drivers/s390/block/dcssblk.c
> and arch/s390/mm/extmem.c).
>
> For what its worth, the dcssblk looks to Linux like normal memory (just above the
> previously detected memory) that can be used like normal memory. In previous time
> we even had struct pages for this memory - this was removed long ago (when it was
> still xip) to reduce the memory footprint for large dcss blocks and small memory
> guests.
> Can the CONFIG_FS_DAX_LIMITED go away if we have struct pages for that memory?
>
> Now some observations:
> - dcssblk is z/VM only (not KVM)
> - Setting CONFIG_FS_DAX_LIMITED globally as a Kconfig option depending on wether
> a device driver is compiled in or not seems not flexible enough in case if you
> have device driver that does have struct pages and another one that doesn't
> - I do not see a reason why we should not be able to map anything from QEMU
> into the guest real memory via an additional KVM memory slot.
> We would need to handle that in the guest somehow (and not as a PCI bar),
> register this with struct pages etc.
> - we must then look how we can create the link between the guest memory and the
> virtio-fs driver. For virtio-ccw we might be able to add a new ccw command or
> whatever. Maybe we could also piggy-back on some memory hotplug work from David
> Hildenbrand (add cc).
>
> Regarding limitations on the platform:
> - while we do have PCI, the virtio devices are usually plugged via the ccw bus.
> That implies no PCI bars. I assume you use those PCI bars only to implicitely
> have the location of the shared memory
> Correct?
Right.
> - no real memory mapped I/O. Instead there are instructions that work on the mmio.
> As I understand things, this is of no concern regarding virtio-fs as you do not
> need mmio in the sense that a memory access of the guest to such an address
> triggers an exit. You just need the shared memory as a mean to have the data
> inside the guest. Any notification is done via normal virtqueue mechanisms
> Correct?
Yep.
>
> Adding Heiko, maybe he remembers some details of the dcssblk/xip history.
>
--
Dr. David Alan Gilbert / dgilbert@...hat.com / Manchester, UK
Powered by blists - more mailing lists