Message-ID: <cc8e0ec8fd7c4222a8ef858d3682b18b@ausx13mps321.AMER.DELL.COM>
Date: Mon, 30 Jul 2018 23:26:02 +0000
From: <Alex_Gagniuc@...lteam.com>
To: <talgi@...lanox.com>, <mr.nuke.me@...il.com>,
<jakub.kicinski@...ronome.com>
Cc: <linux-pci@...r.kernel.org>, <bhelgaas@...gle.com>,
<keith.busch@...el.com>, <Austin.Bolen@...l.com>,
<Shyam.Iyer@...l.com>, <jeffrey.t.kirsher@...el.com>,
<ariel.elior@...ium.com>, <michael.chan@...adcom.com>,
<ganeshgr@...lsio.com>, <tariqt@...lanox.com>, <airlied@...il.com>,
<alexander.deucher@....com>, <mike.marciniszyn@...el.com>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5] PCI: Check for PCIe downtraining conditions
On 07/24/2018 08:40 AM, Tal Gilboa wrote:
> On 7/24/2018 2:59 AM, Alex G. wrote:
>>
>>
>> On 07/23/2018 05:14 PM, Jakub Kicinski wrote:
>>> On Tue, 24 Jul 2018 00:52:22 +0300, Tal Gilboa wrote:
>>>> On 7/24/2018 12:01 AM, Jakub Kicinski wrote:
>>>>> On Mon, 23 Jul 2018 15:03:38 -0500, Alexandru Gagniuc wrote:
>>>>>> PCIe downtraining happens when both the device and PCIe port are
>>>>>> capable of a larger bus width or higher speed than negotiated.
>>>>>> Downtraining might be indicative of other problems in the system,
>>>>>> and identifying this from userspace is neither intuitive nor
>>>>>> straightforward.
>>>>>>
>>>>>> The easiest way to detect this is with pcie_print_link_status(),
>>>>>> since the bottleneck is usually the link that is downtrained. It's not
>>>>>> a perfect solution, but it works extremely well in most cases.
>>>>>>
>>>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@...il.com>
>>>>>> ---
>>>>>>
>>>>>> For the sake of review, I've created a __pcie_print_link_status()
>>>>>> which takes a 'verbose' argument. If we agree to go this route and
>>>>>> update the users of pcie_print_link_status(), I can split this up
>>>>>> into two patches.
>>>>>>
>>>>>> I prefer just printing this information in the core functions, and
>>>>>> letting drivers not have to worry about it. There seems to be
>>>>>> strong support for not going that route, though, so here it goes:
>>>>>
>>>>> FWIW the networking drivers print PCIe BW because sometimes the
>>>>> network bandwidth is simply over-provisioned on multi-port cards,
>>>>> e.g. an 80Gbps card on an x8 link.
>>>>>
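(For reference, the over-provisioning math checks out: a healthy Gen3 x8
link carries 8 GT/s * 8 lanes * 128b/130b encoding, or roughly 63 Gb/s,
so an 80 Gb/s card can't reach line rate even without any downtraining.)
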
>>>>> Sorry to bike shed, but currently the networking cards print the
>>>>> info during probe. Would it make sense to move your message closer
>>>>> to probe time, rather than when the device is added? If the driver
>>>>> structure is available, we could also consider adding a boolean to
>>>>> struct pci_driver to indicate whether the driver wants the verbose
>>>>> message. This way we avoid duplicated prints.
>>>>>
>>>>> I have no objection to the current patch, it LGTM. Just a thought.
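
(For concreteness, I read that suggestion as something like the sketch
below; the field name and the exact hook point are made up:)

	/* Hypothetical new member, somewhere in struct pci_driver: */
	bool verbose_link_status;	/* driver wants the probe-time print */

	/* In the PCI core, once a driver has bound to the device: */
	if (dev->driver && dev->driver->verbose_link_status)
		__pcie_print_link_status(dev, true);
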
>>>>
>>>> I don't see the reason for having two functions. What's the problem with
>>>> adding the verbose argument to the original function?
>>>
>>> IMHO it's reasonable to keep the default behavior at what 90% of
>>> users want by means of a wrapper. The non-verbose output is already
>>> provided by the core for all devices.
>>>
>>> What do you think of my proposal above, Tal? That would make the
>>> extra wrapper unnecessary, since the verbose parameter would be part
>>> of the driver structure, and it would avoid the duplicated output.
>>
>> I see how it might make sense to add another member to the driver
>> struct, but is it worth the extra learning curve? It seems like
>> something with the potential to confuse new driver developers, while
>> offering only a marginal benefit.
>> Although, if that's what people want...
>
> I prefer the wrapper function. Looking at struct pci_driver, it would
> seem strange for it to hold a field controlling verbosity (IMO).
> This is a very (very) specific field in a very general struct.
If people are okay with the wrapper, then I'm not going to update the patch.
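
For reference, the split from v5 looks roughly like this (simplified;
the bandwidth helpers and message formatting are the ones already in
drivers/pci):

void __pcie_print_link_status(struct pci_dev *dev, bool verbose)
{
	enum pcie_link_width width, width_cap;
	enum pci_bus_speed speed, speed_cap;
	struct pci_dev *limiting_dev = NULL;
	u32 bw_avail, bw_cap;

	bw_cap = pcie_bandwidth_capable(dev, &speed_cap, &width_cap);
	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed,
					    &width);

	if (bw_avail >= bw_cap) {
		/* Link runs at full capability; only log when asked. */
		if (verbose)
			pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
				 bw_cap / 1000, bw_cap % 1000,
				 PCIE_SPEED2STR(speed_cap), width_cap);
	} else {
		/* Downtrained: always report, naming the limiting link. */
		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
			 bw_avail / 1000, bw_avail % 1000,
			 PCIE_SPEED2STR(speed), width,
			 pci_name(limiting_dev ? limiting_dev : dev),
			 bw_cap / 1000, bw_cap % 1000,
			 PCIE_SPEED2STR(speed_cap), width_cap);
	}
}

/*
 * The old entry point becomes a thin wrapper, so existing callers
 * keep their current (verbose) behavior:
 */
void pcie_print_link_status(struct pci_dev *dev)
{
	__pcie_print_link_status(dev, true);
}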
Alex