Message-ID: <875z6dik1a.fsf@nanos.tec.linutronix.de>
Date:   Tue, 10 Nov 2020 11:27:29 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "Raj\, Ashok" <ashok.raj@...el.com>
Cc:     Jason Gunthorpe <jgg@...dia.com>,
        Dan Williams <dan.j.williams@...el.com>,
        "Tian\, Kevin" <kevin.tian@...el.com>,
        "Jiang\, Dave" <dave.jiang@...el.com>,
        Bjorn Helgaas <helgaas@...nel.org>,
        "vkoul\@kernel.org" <vkoul@...nel.org>,
        "Dey\, Megha" <megha.dey@...el.com>,
        "maz\@kernel.org" <maz@...nel.org>,
        "bhelgaas\@google.com" <bhelgaas@...gle.com>,
        "alex.williamson\@redhat.com" <alex.williamson@...hat.com>,
        "Pan\, Jacob jun" <jacob.jun.pan@...el.com>,
        "Liu\, Yi L" <yi.l.liu@...el.com>,
        "Lu\, Baolu" <baolu.lu@...el.com>,
        "Kumar\, Sanjay K" <sanjay.k.kumar@...el.com>,
        "Luck\, Tony" <tony.luck@...el.com>,
        "kwankhede\@nvidia.com" <kwankhede@...dia.com>,
        "eric.auger\@redhat.com" <eric.auger@...hat.com>,
        "parav\@mellanox.com" <parav@...lanox.com>,
        "rafael\@kernel.org" <rafael@...nel.org>,
        "netanelg\@mellanox.com" <netanelg@...lanox.com>,
        "shahafs\@mellanox.com" <shahafs@...lanox.com>,
        "yan.y.zhao\@linux.intel.com" <yan.y.zhao@...ux.intel.com>,
        "pbonzini\@redhat.com" <pbonzini@...hat.com>,
        "Ortiz\, Samuel" <samuel.ortiz@...el.com>,
        "Hossain\, Mona" <mona.hossain@...el.com>,
        "dmaengine\@vger.kernel.org" <dmaengine@...r.kernel.org>,
        "linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-pci\@vger.kernel.org" <linux-pci@...r.kernel.org>,
        "kvm\@vger.kernel.org" <kvm@...r.kernel.org>,
        Ashok Raj <ashok.raj@...el.com>
Subject: Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection

Ashok,

On Mon, Nov 09 2020 at 21:14, Ashok Raj wrote:
> On Mon, Nov 09, 2020 at 11:42:29PM +0100, Thomas Gleixner wrote:
>> On Mon, Nov 09 2020 at 13:30, Jason Gunthorpe wrote:
> Approach to IMS is more of a phased approach. 
>
> #1 Allow physical device to scale beyond limits of PCIe MSIx
>    Follows current methodology for guest interrupt programming and
>    evolutionary changes rather than drastic.

Trapping MSI[X] writes is there because it allows handing a device to an
unmodified guest OS and handles the case where the MSI[X] entry storage
cannot be mapped exclusively to the guest.

But aside from this, it's not required if the storage can be mapped
exclusively, the guest is hypervisor aware and can get a host composed
message via a hypercall. That works for physical functions and SRIOV,
but not for SIOV.

> #2 Long term we should work together on enabling IMS in guest which
>    requires changes in both HW and SW eco-system.
>
> For #1, the immediate need is to find a way to limit guest from using IMS
> due to current limitations. We have couple options.
>
> a) CPUID based method to disallow IMS when running in a guest OS. Limiting
>    use to existing virtual MSIx to guest devices. (Both you/Jason alluded)
> b) We can extend DMAR table to have a flag for opt-out. So in real platform
>    this flag is clear and in guest VMM will ensure vDMAR will have this flag
>    set. Along the lines as Jason alluded, platform level and via ACPI
>    methods. We have similar use for x2apic_optout today.
>
> Think a) is probably more generic.

But incomplete, as I explained before. If the VMM does not set the
hypervisor bit in CPUID, then the guest OS assumes it is running on bare
metal. It needs more than just relying on CPUID.
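
A minimal sketch of why the CPUID check alone falls short, written as a
plain model rather than kernel code. The hypervisor-present bit is
CPUID leaf 1, ECX bit 31; everything else here (the function names and
the platform_optout flag, which stands in for something like the
proposed DMAR opt-out in option b) is a hypothetical illustration:

```python
# Illustrative model only; these names are hypothetical, not kernel APIs.

HYPERVISOR_BIT = 1 << 31  # CPUID leaf 1, ECX bit 31: "hypervisor present"


def cpuid_reports_hypervisor(leaf1_ecx: int) -> bool:
    """Return True if the CPUID leaf 1 ECX value has the hypervisor bit set."""
    return bool(leaf1_ecx & HYPERVISOR_BIT)


def ims_allowed(leaf1_ecx: int, platform_optout: bool) -> bool:
    """Allow IMS only if neither signal indicates a guest.

    platform_optout models a second, platform-level signal (assumed here
    to be set by the VMM in the guest's virtual platform tables).
    """
    if cpuid_reports_hypervisor(leaf1_ecx):
        return False
    if platform_optout:
        return False
    return True


if __name__ == "__main__":
    # A VMM that hides the hypervisor bit: CPUID alone would say bare metal,
    # so only the second signal keeps the guest from using IMS.
    print(ims_allowed(leaf1_ecx=0, platform_optout=True))
```

With only the first check, a guest under a VMM that clears the
hypervisor bit is indistinguishable from bare metal, which is the gap
described above.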

Aside from that, neither Jason nor I said that IMS cannot be supported
in a guest. PF and VF IMS can and has to be supported. SIOV is a
different story due to the PASID requirement which obviously needs to be
managed host side and needs HW changes.

> From SW improvements:
>
> - Hypercall to retrieve addr/data from host

You need to have that even for the non SIOV case in order to hand in a
full device which has the IMS storage in queue memory.

> Devices such as idxd that do not have these entries on page-boundaries for
> isolation to permit direct programming from GuestOS will continue to use
> trap-emulate as used today.

That's a restriction of that particular hardware.

> Until then, IMS will be restricted to host VMM only, and we can use the
> methods above to prevent IMS in guest and continue to use the legacy
> virtual MSIx.

SIOV IMS.

But as things stand now not even PF/VF pass through is possible. This
might not be an issue for IDXD, but it's an issue in general and this
wants to be thought of _now_ before we put a lot of infrastructure
into place which then needs to be ripped apart again.

>> The current specification puts massive restrictions on IMS storage which
>> are _not_ allowing to optimize it in a device specific manner as
>> demonstrated in this discussion.
>
> IMS doesn't restrict this optimization, but to allow it requires more
> OS support as you had mentioned.

Right, IMS per se does not put a restriction on it.

The specification and the HW limitations on the remapping unit put that
restriction into place.

OS support is an obvious requirement, but OS support cannot make
the restrictions of HW go away magically.

But again, we need to think about the path forward _now_.

Just slapping some 'works for IDXD' solution into place can severely
restrict the options for going beyond these limitations simply because
we have to support that 'works for IDXD' thing forever.

Thanks,

        tglx
