lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 8 Oct 2018 21:09:46 +0000
From:   <Alex_Gagniuc@...lteam.com>
To:     <helgaas@...nel.org>
Cc:     <mr.nuke.me@...il.com>, <linux-pci@...r.kernel.org>,
        <bhelgaas@...gle.com>, <keith.busch@...el.com>,
        <Austin.Bolen@...l.com>, <Shyam.Iyer@...l.com>,
        <ariel.elior@...ium.com>, <everest-linux-l2@...ium.com>,
        <davem@...emloft.net>, <michael.chan@...adcom.com>,
        <ganeshgr@...lsio.com>, <jeffrey.t.kirsher@...el.com>,
        <tariqt@...lanox.com>, <saeedm@...lanox.com>, <leon@...nel.org>,
        <jakub.kicinski@...ronome.com>, <dirk.vandermerwe@...ronome.com>,
        <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <intel-wired-lan@...ts.osuosl.org>, <linux-rdma@...r.kernel.org>,
        <oss-drivers@...ronome.com>, <stephen@...workplumber.org>,
        <mj@....cz>, <alex.williamson@...hat.com>
Subject: Re: [PATCH 1/9] PCI: sysfs: Export available PCIe bandwidth

On 10/04/2018 03:45 PM, Bjorn Helgaas wrote:
> 
> [EXTERNAL EMAIL]
> Please report any suspicious attachments, links, or requests for sensitive information.
> 
> 
> [+cc Alex (VC mentioned below)]
> 
> On Wed, Oct 03, 2018 at 10:00:19PM +0000, Alex_Gagniuc@...lteam.com wrote:
>> On 10/03/2018 04:31 PM, Bjorn Helgaas wrote:
>>> On Mon, Sep 03, 2018 at 01:02:28PM -0500, Alexandru Gagniuc wrote:
>>>> For certain bandwidth-critical devices (e.g. multi-port network cards)
>>>> it is useful to know the available bandwidth to the root complex. This
>>>> information is only available via the system log, which doesn't
>>>> account for link degradation after probing.
>>>>
>>>> With a sysfs attribute, we can computes the bandwidth on-demand, and
>>>> will detect degraded links.
>>>>
>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@...il.com>
>>>> ---
>>>>    drivers/pci/pci-sysfs.c | 13 +++++++++++++
>>>>    1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>>>> index 9ecfe13157c0..6658e927b1f5 100644
>>>> --- a/drivers/pci/pci-sysfs.c
>>>> +++ b/drivers/pci/pci-sysfs.c
>>>> @@ -218,6 +218,18 @@ static ssize_t current_link_width_show(struct device *dev,
>>>>    }
>>>>    static DEVICE_ATTR_RO(current_link_width);
>>>>    
>>>> +static ssize_t available_bandwidth_show(struct device *dev,
>>>> +				       struct device_attribute *attr, char *buf)
>>>> +{
>>>> +	struct pci_dev *pci_dev = to_pci_dev(dev);
>>>> +	u32 bw_avail;
>>>> +
>>>> +	bw_avail = pcie_bandwidth_available(pci_dev, NULL, NULL, NULL);
>>>> +
>>>> +	return sprintf(buf, "%u.%03u Gb/s\n", bw_avail / 1000, bw_avail % 1000);
>>>> +}
>>>> +static DEVICE_ATTR_RO(available_bandwidth);
>>>
>>> Help me understand this.  We already have these sysfs attributes:
>>>
>>>     max_link_speed          # eg, 16 GT/s
>>>     max_link_width          # eg, 8
>>>     current_link_speed      # eg, 16 GT/s
>>>     current_link_width      # eg, 8
>>>
>>> so I think the raw materials are already exposed.
>>>
>>> The benefits I see for this new file are that
>>>
>>>     - pcie_bandwidth_available() does the work of traversing up the
>>>       tree, doing the computations (link width * speed, reduced by
>>>       encoding overhead), and finding the minimum, and
>>>
>>>     - it re-traverses the path every time we look at it, while the
>>>       boot-time check is a one-time event.
>>>
>>> In principle this could all be done in user space with the attributes
>>> that are already exported.  There's some precedent for things like
>>> this in lspci, e.g., "NUMA node" [1], and lspci might even be a more
>>> user-friendly place for users to look for this, as opposed to
>>> searching through sysfs.
>>
>> Parsing the endpoint to root port bandwidth is, in principle, possible
>> from userspace. It's just that in practice it's very clumsy to do, and,
>> as you pointed out, not that reliable.
> 
> I don't remember the reliability issue.  Was that in a previous
> conversation?  AFAICT, using current_link_speed and current_link_width
> should be as reliable as pcie_bandwidth_available() because they're
> both based on reading PCI_EXP_LNKSTA.
> 
> This patch exposes only the available bandwidth, not the limiting
> device, link speed, or link width.  Maybe that extra information isn't
> important in this context.  Of course, it's easy to derive using
> current_link_speed and current_link_width, but if we do that, there's
> really no benefit to adding a new file.
> 
> Linux doesn't really support Virtual Channels (VC) yet, and I don't
> know much about it, but it seems like Quality-of-Service features like
> VC could affect this idea of available bandwidth.
> 
> Since we already expose the necessary information, and we might throw
> additional things like VC into the mix, I'm a little hesitant about
> adding things to sysfs because they're very difficult to change later.

I understand your concerns with VC and crazy PCIe outliers. 
'available_bandwidth' is meant to mean physical bandwidth, rather than 
"what comcast gives you". I see now the possibility for confusion.
My motivation is to save drivers from the hassle of having to call 
pcie_print_link_status(), and prevent the rare duplicate syslog messages.

I prefer re-using the kernel logic over doing it over again from 
userspace, but I won't insist over your doubts.

Alex

>> I understand it's not information that all users would jump in the air
>> to know. However, it was important enough for certain use cases, that
>> the kernel already has a very reliable way to calculate it.
>>
>> It seems to me that the most elegant way is to let the kernel tell us,
>> since the kernel already has this facility. To quote one of the texts
>> under Documentation/, it is an elegant way to "avoid reinventing kernel
>> wheels in userspace".
>>
>> Alex
>>
>>> [1] https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/commit/?id=90ec4a6d0ae8
>>>
>>>>    static ssize_t secondary_bus_number_show(struct device *dev,
>>>>    					 struct device_attribute *attr,
>>>>    					 char *buf)
>>>> @@ -786,6 +798,7 @@ static struct attribute *pcie_dev_attrs[] = {
>>>>    	&dev_attr_current_link_width.attr,
>>>>    	&dev_attr_max_link_width.attr,
>>>>    	&dev_attr_max_link_speed.attr,
>>>> +	&dev_attr_available_bandwidth.attr,
>>>>    	NULL,
>>>>    };
>>>>    
>>>> -- 
>>>> 2.17.1
>>>>
>>>
>>
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ