Message-ID: <DM4PR11MB5549EC882AA6076F3468274DCAEA9@DM4PR11MB5549.namprd11.prod.outlook.com>
Date: Wed, 28 Jul 2021 13:38:58 +0000
From: "Wang, Zhi A" <zhi.a.wang@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>,
Gerd Hoffmann <kraxel@...hat.com>,
"Greg KH" <gregkh@...uxfoundation.org>
CC: Christoph Hellwig <hch@....de>,
Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
"Vivi, Rodrigo" <rodrigo.vivi@...el.com>,
Zhenyu Wang <zhenyuw@...ux.intel.com>,
"intel-gfx@...ts.freedesktop.org" <intel-gfx@...ts.freedesktop.org>,
"intel-gvt-dev@...ts.freedesktop.org"
<intel-gvt-dev@...ts.freedesktop.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>
Subject: RE: refactor the i915 GVT support
Hi Jason:
I guess those APIs you were talking about are KVM-only. Other hypervisors, e.g. Xen and ACRN, cannot use the APIs you mentioned. Not sure if you have already noticed that VFIO is KVM-only right now.
GVT-g is designed for many hypervisors, not only KVM. In the design, we implemented an abstraction layer for different hypervisors. You can check the link in the previous email, which has an example of how the MPT module "xengt" supports GVT-g running under Xen.
For example, injecting an MSI in VFIO/KVM is done by signalling an eventfd, but in Xen we need to issue a hypercall from Dom0. The same goes for other operations, like querying mappings between GFN and HFN. Some GPU-related emulation logic might be implemented differently under different hypervisors because they might not provide exactly the APIs we want. That's the reason why these operations get a callback in the MPT (which is not perfect yet).
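To make the contrast concrete, here is a rough sketch of the two MSI paths (the struct, its fields, and the Xen-side helper are illustrative placeholders I made up for this email, not the exact GVT-g code):

#include <linux/eventfd.h>
#include <linux/errno.h>
#include <linux/types.h>

/* Illustrative vGPU state only; field names are not the real GVT-g ones. */
struct example_vgpu {
	struct eventfd_ctx *msi_trigger;	/* wired up by QEMU via a VFIO irqfd */
	int vm_id;
};

/* Placeholder for whatever Dom0 interface a Xen MPT module would wrap. */
int example_hypercall_inject_msi(int vm_id, u32 addr, u16 data);

/* KVM/VFIO path: injecting the MSI is just signalling the eventfd. */
static int example_kvm_inject_msi(struct example_vgpu *vgpu, u32 addr, u16 data)
{
	if (!vgpu->msi_trigger)
		return -EINVAL;
	eventfd_signal(vgpu->msi_trigger, 1);
	return 0;
}

/* Xen path: Dom0 has to go through a hypercall instead. */
static int example_xen_inject_msi(struct example_vgpu *vgpu, u32 addr, u16 data)
{
	return example_hypercall_inject_msi(vgpu->vm_id, addr, data);
}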
As you can see, to survive in this situation we have to rely on an abstraction layer, so that we avoid introducing code blocks like this in the core logic:
if (in_hypervisor_xen)
	issue hypercalls
else if (in_hypervisor_kvm)
	play with eventfds
else if (in_hypervisor_other)
	xxxx
Thus some of the APIs have to be wrapped in the MPT module in the GVT-g design.
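In code the shape is roughly the following (a simplified sketch of the idea only, not the real struct; the actual callbacks are the MPT ops you quoted below):

#include <linux/types.h>

/* Each hypervisor module fills in its own callbacks and registers them,
 * so the core GVT-g logic only ever calls through the table and never
 * branches on the hypervisor type. */
struct example_mpt_ops {
	int (*inject_msi)(unsigned long handle, u32 addr, u16 data);
	unsigned long (*gfn_to_mfn)(unsigned long handle, unsigned long gfn);
};

/* Core logic stays hypervisor-agnostic: */
static int example_core_inject_msi(const struct example_mpt_ops *ops,
				   unsigned long handle, u32 addr, u16 data)
{
	return ops->inject_msi(handle, addr, data);
}

A kvmgt module would register an eventfd-based inject_msi here, while an xengt module would register a hypercall-based one; the core code stays the same.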
Sadly, not all customers are motivated or allowed to get their hypervisor-specific modules into the kernel. We have a customer who runs GVT-g on their private hypervisor. In this case, they don't want to get their "xxxgt" MPT module into upstream, since their hypervisor isn't in the kernel yet. Also, we have customers who ported GVT-g to QNX, another widely used commercial hypervisor in the industry. They can't get the "qnxgt" MPT module into upstream since that isn't allowed.
We do understand the situation and are trying to figure out a solution that can fulfill the expectations of different people in the community as well as our customers.
Following Greg KH's comments, we are collecting the requirements for MPT modules from other open-source hypervisors in the kernel, e.g. ACRN, to see if they can get another MPT module into the kernel, which would be an example of how the MPT abstraction can benefit others. Also, we are evaluating the impact on our customers if we have to remove the MPT abstraction from the kernel because there is only one MPT module.
Thanks so much for the comments.
Thanks,
Zhi.
-----Original Message-----
From: Jason Gunthorpe <jgg@...dia.com>
Sent: Tuesday, July 27, 2021 3:12 PM
To: Gerd Hoffmann <kraxel@...hat.com>
Cc: Wang, Zhi A <zhi.a.wang@...el.com>; Christoph Hellwig <hch@....de>; Jani Nikula <jani.nikula@...ux.intel.com>; Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>; Vivi, Rodrigo <rodrigo.vivi@...el.com>; Zhenyu Wang <zhenyuw@...ux.intel.com>; intel-gfx@...ts.freedesktop.org; intel-gvt-dev@...ts.freedesktop.org; linux-kernel@...r.kernel.org; dri-devel@...ts.freedesktop.org
Subject: Re: refactor the i915 GVT support
On Thu, Jul 22, 2021 at 01:26:36PM +0200, Gerd Hoffmann wrote:
> Hi,
>
> > https://github.com/intel/gvt-linux/blob/topic/gvt-xengt/drivers/gpu/
> > drm/i915/gvt/xengt.c
>
> > But it's hard for some customers to contribute their own "hypervisor"
> > module to the upstream Linux kernel. I am thinking what would be a
> > better solution here? The MPT layer in the kernel helps a lot for
> > customers, but only one open-source "hypervisor" module is there in
> > the kernel. That can confuse people which don't know the story. One
> > thing I was thinking is to put a document about the background and
> > more description in the MPT headers. So it won't confuse more people.
>
> Getting the xengt module linked above merged into mainline would also
> nicely explain why there are hypervisor modules.
It would also be nice to explain why a GPU driver needs a hypervisor specific shim like this in the first place.
enum hypervisor_type type;
int (*host_init)(struct device *dev, void *gvt, const void *ops);
void (*host_exit)(struct device *dev, void *gvt);
int (*attach_vgpu)(void *vgpu, unsigned long *handle);
void (*detach_vgpu)(void *vgpu);
Doesn't vfio provide all this generically with notifiers?
int (*inject_msi)(unsigned long handle, u32 addr, u16 data);
Isn't this one just an eventfd?
unsigned long (*from_virt_to_mfn)(void *p);
int (*read_gpa)(unsigned long handle, unsigned long gpa, void *buf,
		unsigned long len);
int (*write_gpa)(unsigned long handle, unsigned long gpa, void *buf,
		 unsigned long len);
unsigned long (*gfn_to_mfn)(unsigned long handle, unsigned long gfn);
int (*dma_map_guest_page)(unsigned long handle, unsigned long gfn,
			  unsigned long size, dma_addr_t *dma_addr);
void (*dma_unmap_guest_page)(unsigned long handle, dma_addr_t dma_addr);
int (*dma_pin_guest_page)(unsigned long handle, dma_addr_t dma_addr);
int (*map_gfn_to_mfn)(unsigned long handle, unsigned long gfn,
		      unsigned long mfn, unsigned int nr, bool map);
bool (*is_valid_gfn)(unsigned long handle, unsigned long gfn);
Shouldn't the vfio page SW IOMMU do all of this generically?
int (*enable_page_track)(unsigned long handle, u64 gfn);
int (*disable_page_track)(unsigned long handle, u64 gfn);
int (*set_trap_area)(unsigned long handle, u64 start, u64 end,
		     bool map);
int (*set_opregion)(void *vgpu);
int (*set_edid)(void *vgpu, int port_num);
edid depends on hypervisor??
int (*get_vfio_device)(void *vgpu);
void (*put_vfio_device)(void *vgpu);
Jason