lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 4 Jun 2019 17:13:03 +0530
From:   Abhishek Sahu <abhsahu@...dia.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
CC:     <linux-kernel@...r.kernel.org>, Lukas Wunner <lukas@...ner.de>,
        <linux-pci@...r.kernel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>
Subject: Re: [PATCH 2/2] PCI: Create device link for NVIDIA GPU



On 6/3/2019 10:52 PM, Bjorn Helgaas wrote:
> [+cc Rafael, just FYI]
> 
> On Mon, Jun 03, 2019 at 01:30:51PM +0530, Abhishek Sahu wrote:
>> On 6/1/2019 2:09 AM, Bjorn Helgaas wrote:
>>> On Fri, May 31, 2019 at 10:31:09AM +0530, Abhishek Sahu wrote:
>>>> NVIDIA Turing GPUs include hardware support for USB Type-C and
>>>> VirtualLink. It helps in delivering the power, display, and data
>>>> required to power VR headsets through a single USB Type-C connector.
>>>> The Turing GPU is a multi-function PCI device has the following
>>>> four functions:
>>>>
>>>> 	- VGA display controller (Function 0)
>>>> 	- Audio controller (Function 1)
>>>> 	- USB xHCI Host controller (Function 2)
>>>> 	- USB Type-C USCI controller (Function 3)
>>>>
>>>> The function 0 is tightly coupled with other functions in the
>>>> hardware. When function 0 goes in runtime suspended state,
>>>> then it will do power gating for most of the hardware blocks.
>>>> Some of these hardware blocks are used by other functions which
>>>> leads to functional failure. So if any of these functions (1/2/3)
>>>> are active, then function 0 should also be in active state.
>>>
>>>> 'commit 07f4f97d7b4b ("vga_switcheroo: Use device link for
>>>> HDA controller")' creates the device link from function 1 to
>>>> function 0. A similar kind of device link needs to be created
>>>> between function 0 and functions 2 and 3 for NVIDIA Turing GPU.
>>>
>>> I can't point to language that addresses this, but this sounds like a
>>> case of the GPU not conforming to the PCI spec.  The general
>>> assumption is that the OS should be able to discover everything it
>>> needs to do power management directly from the architected PCI config
>>> space.
>>
>>  The GPU is following PCIe spec but following is the implementation
>>  from HW side
> 
> Unless you can find spec language that talks about D-state
> dependencies between functions, I claim this is not following the
> PCIe spec.  For example, PCIe r5.0, sec 1.4, says "the PCI/PCIe
> hardware/software model includes architectural constructs necessary to
> discover, configure, and use a Function, without needing Function-
> specific knowledge." Sec 5.1 says "D states are associated with a
> particular Function" and "PM provides ... a mechanism to identify
> power management capabilities of a given Function [and] the ability to
> transition a Function into a certain power management state."
> 

 Thanks Bjorn. Here in case of GPU's these functions are not
 completely independent so it is not following PCIe spec in
 that aspect.

> If there *is* something about dependencies between functions in the
> spec, we should improve the generic PCI core to pay attention to that,
> and then we wouldn't need this quirk.
> 
> If the spec doesn't provide a way to discover them, these dependencies
> are exceptions from the spec, and we have to handle them as hardware
> defects, using quirks like this.  That's fine, but let's not pretend
> that this is a conforming device and that adding quirks is the
> expected process.  Just call a spade a spade and say we're working
> around a defect in this particular device.
> 

 Yes. I am agree with that we need to be very careful in
 adding quirks like this. I will communicate the same to
 HW team so they can explore other options to handle this
 in HW design side for future chips.

> I think the best path forward would be to add this quirk for the
> existing device, and then pursue a spec change to add something like
> a new PCIe capability to describe the dependencies.  Then we could
> enhance the PCI core once and power management for future devices
> would "Just Work" without having to add quirks.
> 

 Yes. It will be long term process. If other HW has similar
 requirement then it would be good to have this.

 Regards,
 Abhishek

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ