lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <952362c1-6886-5293-bee4-267d87a79cba@mellanox.com>
Date:   Mon, 6 Aug 2018 16:01:25 +0300
From:   Eran Ben Elisha <eranbe@...lanox.com>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc:     David Miller <davem@...emloft.net>, saeedm@...lanox.com,
        netdev@...r.kernel.org, jiri@...lanox.com,
        alexander.duyck@...il.com, helgaas@...nel.org
Subject: Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates
 2018-07-31

>>
>> Hi Dave,
>> I would like to re-state that this feature was not meant to be a generic
>> one. This feature was added in order to resolve a HW bug which exist in
>> a small portion of our devices.
> 
> Would you mind describing the HW bug in more detail?  To a outside
> reviewer it really looks like you're adding a feature.  What are you
> working around?  Is the lack of full AQM on the PCIe side of the chip
> considered a bug?

In multiple function environment, there is an issue with buffer 
allocation per function which may lead to starvation. There is an HW WA 
for mitigate this starvation by identifying this state and apply early 
drop/mark.

> 
>> Those params will be used only on those current HWs and won't be in
>> use for our future devices.
> 
> I'm glad that is your plan today, however, customers may get used to
> the simple interface you're adding now.  This means the API you are
> adding is effectively becoming an API other drivers may need to
> implement to keep compatibility with someone's proprietary
> orchestration.

This issue was refactored, thus no need to have this WA at all in future 
NICs. So I don't believe we will end up in the situation you are describing.
It is less likely that other vendors will be facing the same issue and 
will have to support such param. it was burn out of a bug and not as a 
feature which other may follow.

> 
>> During the discussions, several alternatives where offered to be used by
>> various members of the community. These alternatives includes TC and
>> enhancements to PCI configuration tools.
>>
>> Regarding the TC, from my perspective, this is not an option as:
>> 1) The HW mechanism handles multiple functions and therefore cannot be
>> configured on as a regular TC
> 
> Could you elaborate?  What are the multiple functions?  You seem to be
> adding a knob to enable ECN marking and a knob for choosing between
> some predefined slopes.

PSB, The sloped are dynamic and enabled in a dynamic way.
Indeed, we are adding a very specific knob for very non standard 
specific issue which can be used in addition to standard ECN marking.

> 
> In what way would your solution not behave like a RED offload?

Existing Algo (RED, PIE, etc) are static, configurable. Our HW WA is 
dynamic (dynamic slope), adjusted and auto enabled.

> 
> With TC offload you'd also get a well-defined set of statistics, I
> presume right now you're planning on adding a set of ethtool -S
> counters?
> 
>> 2) No PF + representors modeling can be applied here, this is a
>> MultiHost environment where one host is not aware to the other hosts,
>> and each is running on its own pci/driver. It is a device working mode
>> configuration.
> 
> Yes, the multihost part makes it less pleasant.  But this is a problem
> we have to tackle separately, at some point.  It's not a center of
> attention here.

Agree, however the multihost part makes it non-transparent if we chose a 
solution which is not based on direct vendor configuration. This will 
lead to a bad user experience.

> 
>> 3) The current HW W/A is very limited, maybe it has a similar algorithm
>> as WRED, but is being used for much simpler different use case (pci bus
>> congestion).
> 
> No one is requesting full RED offload here..  if someone sets the
> parameters you can't support you simply won't offload them.  And ignore
> the parameters which only make sense in software terms.  Look at the
> docs for mlxsw:
> 
> https://github.com/Mellanox/mlxsw/wiki/Queues-Management#offloading-red
> 
> It says "not offloaded" in a number of places.
> 
>> It cannot be compared to a standard TC capability (RED/WRED), and
>> defining it as a offload fully controlled by the user will be a big
>> misuse.
> 
> It's generally preferable to implement a subset of exiting well defined
> API than create vendor knobs, hence hardly a misuse.

As written above, this is not the case here.

> 
>> (for example, drop rate cannot be configured)
> 
> I don't know what "configuring drop rate" means in case of RED..
> 
>> regarding the PCI config tools, there was a consensus that such tool is
>> not acceptable as it is not a part of the PCI spec.
> 
> As I said, this has nothing to do with PCI being the transport.  The
> port you're running over could be serial, SPI or anything else.  You
> have congestion on a port of a device, that's a networking problem.
> 
>> Since module param/sysfs/debugfs/etc are no longer acceptable, and
>> current drivers still desired with a way to do some configurations to
>> the device/driver which cannot used standard Linux tool or by other
>> vendors, devlink params was developed (under the assumption that this
>> tool will be helpful for those needs, and those only).
>>
>>  From my perspective, Devlink is the tool to configure the device for
>> handling such unexpected bugs, i.e "PCIe buffer congestion handling
>> workaround".
> 
> Hm.  Are you calling it a bug because you had to work around silicon
> limitation in firmware?  Hm.  I'm very intrigued by the framing :)
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ