[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <40d45c78-d365-f3b4-43bb-c3b5dd6a11ec@amd.com>
Date: Sun, 25 Sep 2022 20:03:27 -0700
From: Sonal Santan <sonal.santan@....com>
To: Frank Rowand <frowand.list@...il.com>, Tom Rix <trix@...hat.com>,
Lizhi Hou <lizhi.hou@....com>, linux-pci@...r.kernel.org,
devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
robh@...nel.org, helgaas@...nel.org
Cc: clement.leger@...tlin.com, max.zhen@....com, larry.liu@....com,
brian.xu@....com, stefano.stabellini@...inx.com
Subject: Re: [PATCH RFC 0/2] Generate device tree node for pci devices
On 9/19/22 20:12, Frank Rowand wrote:
> On 9/17/22 13:36, Tom Rix wrote:
>> Frank,
>>
>> On 9/16/22 7:23 PM, Frank Rowand wrote:
>>> On 9/13/22 16:02, Lizhi Hou wrote:
>>>> On 9/13/22 10:41, Frank Rowand wrote:
>>>>> On 9/13/22 12:10, Lizhi Hou wrote:
>>>>>> On 9/13/22 00:00, Frank Rowand wrote:
>>>>>>> On 8/29/22 16:43, Lizhi Hou wrote:
>>>>>>>> This patch series introduces OF overlay support for PCI devices which
>>>>>>>> primarily addresses two use cases. First, it provides a data driven method
>>>>>>>> to describe hardware peripherals that are present in a PCI endpoint and
>>>>>>>> hence can be accessed by the PCI host. An example device is Xilinx/AMD
>>>>>>>> Alveo PCIe accelerators. Second, it allows reuse of a OF compatible
>>>>>>>> driver -- often used in SoC platforms -- in a PCI host based system. An
>>>>>>>> example device is Microchip LAN9662 Ethernet Controller.
>>>>>>>>
>>>>>>>> This patch series consolidates previous efforts to define such an
>>>>>>>> infrastructure:
>>>>>>>> https://lore.kernel.org/lkml/20220305052304.726050-1-lizhi.hou@xilinx.com/
>>>>>>>> https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/
>>>>>>>>
>>>>>>>> Normally, the PCI core discovers PCI devices and their BARs using the
>>>>>>>> PCI enumeration process. However, the process does not provide a way to
>>>>>>>> discover the hardware peripherals that are present in a PCI device, and
>>>>>>>> which can be accessed through the PCI BARs. Also, the enumeration process
>>>>>>>> does not provide a way to associate MSI-X vectors of a PCI device with the
>>>>>>>> hardware peripherals that are present in the device. PCI device drivers
>>>>>>>> often use header files to describe the hardware peripherals and their
>>>>>>>> resources as there is no standard data driven way to do so. This patch
>>>>>>>> series proposes to use flattened device tree blob to describe the
>>>>>>>> peripherals in a data driven way. Based on previous discussion, using
>>>>>>>> device tree overlay is the best way to unflatten the blob and populate
>>>>>>>> platform devices. To use device tree overlay, there are three obvious
>>>>>>>> problems that need to be resolved.
>>>>>>>>
>>>>>>>> First, we need to create a base tree for non-DT system such as x86_64. A
>>>>>>>> patch series has been submitted for this:
>>>>>>>> https://lore.kernel.org/lkml/20220624034327.2542112-1-frowand.list@gmail.com/
>>>>>>>> https://lore.kernel.org/lkml/20220216050056.311496-1-lizhi.hou@xilinx.com/
>>>>>>>>
>>>>>>>> Second, a device tree node corresponding to the PCI endpoint is required
>>>>>>>> for overlaying the flattened device tree blob for that PCI endpoint.
>>>>>>>> Because PCI is a self-discoverable bus, a device tree node is usually not
>>>>>>>> created for PCI devices. This series adds support to generate a device
>>>>>>>> tree node for a PCI device which advertises itself using PCI quirks
>>>>>>>> infrastructure.
>>>>>>>>
>>>>>>>> Third, we need to generate device tree nodes for PCI bridges since a child
>>>>>>>> PCI endpoint may choose to have a device tree node created.
>>>>>>>>
>>>>>>>> This patch series is made up of two patches.
>>>>>>>>
>>>>>>>> The first patch is adding OF interface to allocate an OF node. It is copied
>>>>>>>> from:
>>>>>>>> https://lore.kernel.org/lkml/20220620104123.341054-5-clement.leger@bootlin.com/
>>>>>>>>
>>>>>>>> The second patch introduces a kernel option, CONFIG_PCI_OF. When the option
>>>>>>>> is turned on, the kernel will generate device tree nodes for all PCI
>>>>>>>> bridges unconditionally. The patch also shows how to use the PCI quirks
>>>>>>>> infrastructure, DECLARE_PCI_FIXUP_FINAL to generate a device tree node for
>>>>>>>> a device. Specifically, the patch generates a device tree node for Xilinx
>>>>>>>> Alveo U50 PCIe accelerator device. The generated device tree nodes do not
>>>>>>>> have any property. Future patches will add the necessary properties.
>>>>>>>>
>>>>>>>> Clément Léger (1):
>>>>>>>> of: dynamic: add of_node_alloc()
>>>>>>>>
>>>>>>>> Lizhi Hou (1):
>>>>>>>> pci: create device tree node for selected devices
>>>>>>>>
>>>>>>>> drivers/of/dynamic.c | 50 +++++++++++++----
>>>>>>>> drivers/pci/Kconfig | 11 ++++
>>>>>>>> drivers/pci/bus.c | 2 +
>>>>>>>> drivers/pci/msi/irqdomain.c | 6 +-
>>>>>>>> drivers/pci/of.c | 106 ++++++++++++++++++++++++++++++++++++
>>>>>>>> drivers/pci/pci-driver.c | 3 +-
>>>>>>>> drivers/pci/pci.h | 16 ++++++
>>>>>>>> drivers/pci/quirks.c | 11 ++++
>>>>>>>> drivers/pci/remove.c | 1 +
>>>>>>>> include/linux/of.h | 7 +++
>>>>>>>> 10 files changed, 200 insertions(+), 13 deletions(-)
>>>>>>>>
>>>>>>> The patch description leaves out the most important piece of information.
>>>>>>>
>>>>>>> The device located at the PCI endpoint is implemented via FPGA
>>>>>>> - which is programmed after Linux boots (or somewhere late in the boot process)
>>>>>>> - (A) and thus can not be described by static data available pre-boot because
>>>>>>> it is dynamic (and the FPGA program will often change while the Linux
>>>>>>> kernel is already booted
>>>>>>> - (B) can be described by static data available pre-boot because the FPGA
>>>>>>> program will always be the same for this device on this system
>>>>>>>
>>>>>>> I am not positive what part of what I wrote above is correct and would appreciate
>>>>>>> some confirmation of what is correct or incorrect.
>>>>>> There are 2 series devices rely on this patch:
>>>>>>
>>>>>> 1) Xilinx Alveo Accelerator cards (FPGA based device)
>>>>>>
>>>>>> 2) lan9662 PCIe card
>>>>>>
>>>>>> please see: https://lore.kernel.org/lkml/20220427094502.456111-1-clement.leger@bootlin.com/
>>>>> Thanks. Please include this information in future versions of the patch series.
>>>>>
>>>>> For device 2 I have strongly recommended using pre-boot apply of the overlay to the base
>>>>> device tree. I realize that this suggestion is only a partial solution if one wants to
>>>>> use hotplug to change system configuration (as opposed to using hotplug only to replace
>>>>> an existing device (eg a broken device) with another instance of the same device). I
>>>>> also realize that this increased the system administration overhead. On the other hand
>>>>> an overlay based solution is likely to be fragile and possibly flaky.
>>>> Can you clarify the pre-boot apply approach? How will it work for PCI devices?
>>>>>> For Xilinx Alveo device, it is (A). The FPGA partitions can be programmed dynamically after boot.
>>>>> I looked at the Xilinx Alveo web page, and there are a variety of types of Alveo cards
>>>>> available. So the answer to my next question may vary by type of card.
>>>>>
>>>>> Is it expected that the fpga program on a given card will change frequently (eg multiple
>>>>> times per day), where the changed program results in a new device that would require a
>>>>> different hardware description in the device tree?
>>>> Different images may be loaded to a FPGA partition several times a
>>>> day. The PCI topology (Device IDs, BARs, MSIx, etc) does not change.
>>>> New IPs may appear (and old IPs may disappear) on the BARs when a new
>>>> image is loaded. We would like to use flattened device tree to
>>>> describe the IPs on the BARs.
>>> That was kind of a non-answer. I know that images _may_ change at
>>> some frequency. I was trying to get a sense of whether the images
>>> were _likely_ to be changing on a frequent basis for these types
>>> of boards, or whether frequent image changes are likely to be a
>>> rare edge use case.
>>>
>>> If there is a good design for the 99.999% use case that does not
>>> support the 0.001% use case then it may be better to not create
>>> an inferior design that also supports the 0.001% use case.
>>>
>>> I hope that gives a better idea of the reason why I was asking the
>>> question and how the answer could impact design and implementation
>>> decisions.
>>>
>>> As a point of reference, some other fpga users have indicated a
>>> desire to change images many times per second. The current driver
>>> and overlay architecture did not seem to me to be a good match to
>>> that use case (depending on the definition of "many").
>>
>> I would rather we cover 99.999% now.
>>
>> My understanding is that the subdevices are flexible but fairly
>> static and the frequency Lizhi mentions would cover development
>> uses.
>>
>> In production I would expect the image to change about once a year
>> with the same order of magnitude as firmware.
>
> Thanks for this info, it helps a lot.
>
>>
>> Can you point me to a reference of a user case with high frequency
>> images changing that also depends on pci io device changing?
>
> I actually don't have references to any previous PCI devices that are
> based on FPGAs, let alone with a high frequency of images changing.
>
> The Alveo devices are the first such devices that have come to my
> attention. Note that this is a technology space that I do not
> follow, so my lack of awareness does not mean much.
>
> I do not remember the specific discussion that was asserting or
> desiring a high frequency of image changes for an FPGA. The
> current overlay architecture and overall device tree architecture
> would not handle this well and/or robustly because (off the top of
> my head, hopefully I'm getting this correct) the live system device
> tree does not directly contain all of the associated data - some of
> it is contained in the unflattened device tree (FDT) that remains in
> memory after unflattening, both in the case of the base system device
> tree and overlay device trees. Some of the device tree data APIs return
> pointers to this data in the FDT. And the API does not provide reference
> counting for the data (just reference counting for nodes - and these
> reference counts are know to be frequently incorrect).
>
Thanks for pointing out the limitations of the current overlay
architecture. Can a careful orchestration of overlay creation and tear
down by each driver address the limitation? I did see another user,
drivers/pci/hotplug/pnv_php.c, which seems to be using the overlay
infrastructure in this manner.
What is your suggestion to move forward?
-Sonal
> In general I have very little visibility into the FPGA space so I go
> out of my way to notify them before making changes to the overlay
> implementation, API, etc; listen carefully to their input; and give
> them lots of opportunity to test any resulting changes.
>
> -Frank
>
>>
>> Tom
>>
>>> -Frank
>>>
>>>> Thanks,
>>>>
>>>> Lizhi
>>>>
>>>>> Or is the fpga program expected to change on an infrequent basis (eg monthly, quarterly,
>>>>> annually), in the same way as device firmware and operating systems are updated on a regular
>>>>> basis for bug fixes and new functionality?
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Lzhi
>>>>>>
>>>>>>> -Frank
>>
>
Powered by blists - more mailing lists