linux-kernel - Re: [PATCH v1 0/3] vfio/nvgrace-gpu: Enable grace blackwell boards

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <20241007151635.49d8bc30.alex.williamson@redhat.com>
Date: Mon, 7 Oct 2024 15:16:35 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Ankit Agrawal <ankita@...dia.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, Yishai Hadas <yishaih@...dia.com>,
 "shameerali.kolothum.thodi@...wei.com"
 <shameerali.kolothum.thodi@...wei.com>, "kevin.tian@...el.com"
 <kevin.tian@...el.com>, Zhi Wang <zhiw@...dia.com>, Aniket Agashe
 <aniketa@...dia.com>, Neo Jia <cjia@...dia.com>, Kirti Wankhede
 <kwankhede@...dia.com>, "Tarun Gupta (SW-GPU)" <targupta@...dia.com>,
 Vikram Sethi <vsethi@...dia.com>, Andy Currid <acurrid@...dia.com>,
 Alistair Popple <apopple@...dia.com>, John Hubbard <jhubbard@...dia.com>,
 Dan Williams <danw@...dia.com>, "Anuj Aggarwal (SW-GPU)"
 <anuaggarwal@...dia.com>, Matt Ochs <mochs@...dia.com>,
 "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1 0/3] vfio/nvgrace-gpu: Enable grace blackwell boards

On Mon, 7 Oct 2024 16:37:12 +0000
Ankit Agrawal <ankita@...dia.com> wrote:

> >>
> >> NVIDIA's recently introduced Grace Blackwell (GB) Superchip is a
> >> continuation of the Grace Hopper (GH) superchip. It provides the CPU
> >> and GPU cache coherent access to each other's memory over an
> >> internal proprietary chip-to-chip (C2C) cache coherent interconnect.
> >> The in-tree nvgrace-gpu driver manages the GH devices. The intention
> >> is to extend that support to the new Grace Blackwell boards.
> >
> > Where do we stand on QEMU enablement of GH, or the GB support here?
> > IIRC, the nvgrace-gpu variant driver was initially proposed with QEMU
> > being the means through which the community could make use of this
> > driver, but there seem to be a number of pieces missing for that
> > support.  Thanks,
> > 
> > Alex  
> 
> Hi Alex, the QEMU enablement changes for GH are already in QEMU 9.0.
> This is the Generic Initiator change that got merged:
> https://lore.kernel.org/all/20240308145525.10886-1-ankita@nvidia.com/
> 
> The missing pieces are actually in KVM/the kernel, viz.:
> 1. KVM needs to map the device memory as Normal. The KVM patch was
> proposed here; it needs a refresh to address the review suggestions:
> https://lore.kernel.org/all/20230907181459.18145-2-ankita@nvidia.com/
> 2. The ECC handling series for the GPU device memory that is mapped with
> remap_pfn_range(): https://lore.kernel.org/all/20231123003513.24292-1-ankita@nvidia.com/
> 
> With those changes, GH would be functional with QEMU 9.0.
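Why item 1 matters can be sketched with a simplified model of how Arm combines stage 1 (guest page tables) and stage 2 (KVM) memory attributes when FEAT_S2FWB is not in use: the more restrictive type wins. So if KVM maps the GPU memory as Device at stage 2, the guest cannot obtain cacheable Normal accesses regardless of its own mappings. This is a minimal sketch under stated assumptions: it ignores shareability and separate inner/outer cacheability, and the enum and function names are hypothetical, not kernel identifiers.

```python
from enum import IntEnum


class MemType(IntEnum):
    """Hypothetical, simplified Arm memory types, ordered from most to
    least restrictive so that min() selects the more restrictive one."""
    DEVICE_nGnRnE = 0
    DEVICE_nGnRE = 1
    DEVICE_nGRE = 2
    DEVICE_GRE = 3
    NORMAL_NC = 4  # Normal, non-cacheable
    NORMAL_WT = 5  # Normal, write-through cacheable
    NORMAL_WB = 6  # Normal, write-back cacheable


def combine_s1_s2(stage1: MemType, stage2: MemType) -> MemType:
    """Effective memory type in this simplified model (no FEAT_S2FWB):
    any Device type beats any Normal type, the stricter Device type wins
    among Device types, and the less cacheable type wins among Normal types."""
    return min(stage1, stage2)


# Guest wants cacheable memory, but KVM mapped it as Device at stage 2.
print(combine_s1_s2(MemType.NORMAL_WB, MemType.DEVICE_nGnRE).name)
```

With a Normal write-back stage-2 mapping, a guest Normal write-back mapping stays cacheable, which is the behavior the coherent C2C memory needs.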

Sure, unless we note that those series were posted a year ago, which
makes it much harder to claim that we're actively enabling upstream
testing for this driver that we're now trying to extend to new
hardware.  Thanks,

Alex

> We discovered a separate QEMU issue while verifying Grace Blackwell:
> the 512G of highmem proved insufficient here:
> https://github.com/qemu/qemu/blob/v9.0.0/hw/arm/virt.c#L211
> We are planning to float a proposal for the fix.
> 
> Thanks
> Ankit Agrawal
>
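The 512G shortfall can be illustrated with BAR arithmetic. This is a minimal sketch, assuming the device memory is presented to the guest as 64-bit PCI BARs placed in the fixed-size high MMIO window referenced above. PCI BARs are power-of-two sized and naturally aligned, so each region consumes its size rounded up to the next power of two. The region sizes below are hypothetical placeholders, not actual Grace Blackwell figures, and the largest-first placement is a simplified model, not QEMU's or the guest's actual BAR assignment.

```python
GiB = 1 << 30
HIGH_MMIO_WINDOW = 512 * GiB  # window size cited in the thread


def roundup_pow_of_two(n: int) -> int:
    """Smallest power of two >= n (same semantics as the kernel helper)."""
    if n <= 0:
        raise ValueError("size must be positive")
    return 1 << (n - 1).bit_length()


def fits_in_window(region_sizes, window_size):
    """Place naturally aligned power-of-two BARs, largest first.

    Returns (fits, end_offset), where end_offset is the first byte past
    the last placed BAR.
    """
    bars = sorted((roundup_pow_of_two(s) for s in region_sizes), reverse=True)
    offset = 0
    for bar in bars:
        # Natural alignment: each BAR starts at a multiple of its own size.
        offset = (offset + bar - 1) // bar * bar
        offset += bar
    return offset <= window_size, offset


# Hypothetical sizes for illustration only.
small = [96 * GiB]                       # rounds up to one 128 GiB BAR
large = [200 * GiB, 200 * GiB, 1 * GiB]  # 256 + 256 + 1 GiB of BAR space
print(fits_in_window(small, HIGH_MMIO_WINDOW))
print(fits_in_window(large, HIGH_MMIO_WINDOW))
```

Because of the power-of-two rounding, regions totalling well under 512 GiB of real memory can still overflow the window, and in practice other devices' BARs share the same window.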