[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200716213120.GA648781@bjorn-Precision-5520>
Date: Thu, 16 Jul 2020 16:31:20 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Yicong Yang <yangyicong@...ilicon.com>
Cc: linux-kernel@...r.kernel.org, alexander.shishkin@...ux.intel.com,
linux-pci@...r.kernel.org, linuxarm@...wei.com
Subject: Re: [RFC PATCH] hwtracing: Add HiSilicon PCIe Tune and Trace device
driver
On Thu, Jul 16, 2020 at 05:06:19PM +0800, Yicong Yang wrote:
> On 2020/7/11 7:09, Bjorn Helgaas wrote:
> > On Sat, Jun 13, 2020 at 05:32:13PM +0800, Yicong Yang wrote:
> >> HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex
> >> integrated Endpoint(RCiEP) device, providing the capability
> >> to dynamically monitor and tune the PCIe traffic parameters(tune),
> >> and trace the TLP headers to the memory(trace).
> >>
> >> Add the driver for the device to enable its functions. The driver
> >> will create debugfs directory for each PTT device, and users can
> >> operate the device through the files under its directory.
> >> +Tune
> >> +====
> >> +
> >> +PTT tune is designed for monitoring and adjusting PCIe link parameters(events).
> >> +Currently we support events 4 classes. The scope of the events
> >> +covers the PCIe core with which the PTT device belongs to.
> > All of these look like things that have the potential to break the
> > PCIe protocol and cause errors like timeouts, receiver overflows, etc.
> > That's OK for a debug/analysis situation, but it should taint the
> > kernel somehow because I don't want to debug problems like that if
> > they're caused by users tweaking things.
> >
> > That might even be a reason *not* to merge the tune side of this. I
> > can see how it might be useful for you internally, but it's not
> > obvious to me how it will benefit other users. Maybe that part should
> > be an out-of-tree module?
>
> All the tuning values are not accurate, but abstracted to several
> _levels_ of each events. The levels are delicately designed to
> guarantee by the hardware that they are always valid and will not
> break the PCIe link. The possible level values exposed to the users
> is tested and safe and other values will not be accepted.
>
> The final tuning events is not settled and we'll not exposed the
> events which will may lead to the link broken. Furthermore, maybe we
> could default disable the tune events' level adjustment and make
> them readonly. The user can enable the full tune function by a
> module parameters or in the BIOS, and a warning message will be
> displayed.
>
> The tune part is beneficial for the users and not only for our
> internal use. We intends to provide a way to tune the link
> depending on the downstream components and link configuration. For
> example, users can tune the data path QoS level to get better
> performance according to the link width is x8 or x16, or according
> to the endpoints' class is a network card or a nvme disk. It will
> make our controller adapt to different condition with high
> performance, so we hope this feature to be merged.
OK. This driver itself is outside my area, so I guess merging it is
up to Alexander.
Do you have any measurements of performance improvements? I think it
would be good to have real numbers showing that this is useful.
You mentioned a warning message, so I assume you'll add some kind of
dmesg logging when tuning happens?
Is this protected so it's only usable by root or other appropriate
privileged user?
> >> + * The PTT can designate function for trace.
> >> + * Add the root port's subordinates in the list as we
> >> + * can specify certain function.
> >> + */
> >> + child_bus = tpdev->subordinate;
> >> + list_for_each_entry(tpdev, &child_bus->devices, bus_list) {
> > *This* looks like a potential problem with hotplug. How do you deal
> > with devices being added/removed after this loop?
>
> Yes. I have considered the add/remove situation but not intend to address it
> in this RFC and assume the topology is static after probing.
> I will manage the situation in next version.
What happens if a device is added or removed after boot? If the only
limitation is that you can't tune or trace a hot-added device, that's
fine. (I mean, it's really *not* fine because it's a poor user
experience, but at least it's just a usability issue, not a crash.)
But if hot-adding or hot-removing a device can cause an oops or a
crash or something, *that* is definitely a problem.
Bjorn
Powered by blists - more mailing lists