[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1576e986-6e04-63f3-3569-a105493929a6@mellanox.com>
Date: Tue, 22 May 2018 20:01:21 -0500
From: Huy Nguyen <huyn@...lanox.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>,
Saeed Mahameed <saeedm@...lanox.com>,
"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
Jiri Pirko <jiri@...nulli.us>,
Or Gerlitz <gerlitz.or@...il.com>,
Parav Pandit <parav@...lanox.com>
Subject: Re: [net-next 1/6] net/dcb: Add dcbnl buffer attribute
Dear Jakub, PSB.
On 5/22/2018 1:32 PM, Jakub Kicinski wrote:
> On Tue, 22 May 2018 10:36:17 -0500, Huy Nguyen wrote:
>> On 5/22/2018 12:20 AM, Jakub Kicinski wrote:
>>> On Mon, 21 May 2018 14:04:57 -0700, Saeed Mahameed wrote:
>>>> From: Huy Nguyen <huyn@...lanox.com>
>>>>
>>>> In this patch, we add dcbnl buffer attribute to allow user
>>>> change the NIC's buffer configuration such as priority
>>>> to buffer mapping and buffer size of individual buffer.
>>>>
>>>> This attribute combined with pfc attribute allows advance user to
>>>> fine tune the qos setting for specific priority queue. For example,
>>>> user can give dedicated buffer for one or more prirorities or user
>>>> can give large buffer to certain priorities.
>>>>
>>>> We present an use case scenario where dcbnl buffer attribute configured
>>>> by advance user helps reduce the latency of messages of different sizes.
>>>>
>>>> Scenarios description:
>>>> On ConnectX-5, we run latency sensitive traffic with
>>>> small/medium message sizes ranging from 64B to 256KB and bandwidth sensitive
>>>> traffic with large messages sizes 512KB and 1MB. We group small, medium,
>>>> and large message sizes to their own pfc enables priorities as follow.
>>>> Priorities 1 & 2 (64B, 256B and 1KB)
>>>> Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
>>>> Priorities 5 & 6 (512KB and 1MB)
>>>>
>>>> By default, ConnectX-5 maps all pfc enabled priorities to a single
>>>> lossless fixed buffer size of 50% of total available buffer space. The
>>>> other 50% is assigned to lossy buffer. Using dcbnl buffer attribute,
>>>> we create three equal size lossless buffers. Each buffer has 25% of total
>>>> available buffer space. Thus, the lossy buffer size reduces to 25%. Priority
>>>> to lossless buffer mappings are set as follow.
>>>> Priorities 1 & 2 on lossless buffer #1
>>>> Priorities 3 & 4 on lossless buffer #2
>>>> Priorities 5 & 6 on lossless buffer #3
>>>>
>>>> We observe improvements in latency for small and medium message sizes
>>>> as follows. Please note that the large message sizes bandwidth performance is
>>>> reduced but the total bandwidth remains the same.
>>>> 256B message size (42 % latency reduction)
>>>> 4K message size (21% latency reduction)
>>>> 64K message size (16% latency reduction)
>>>>
>>>> Signed-off-by: Huy Nguyen <huyn@...lanox.com>
>>>> Signed-off-by: Saeed Mahameed <saeedm@...lanox.com>
>>> On a cursory look this bares a lot of resemblance to devlink shared
>>> buffer configuration ABI. Did you look into using that?
>>>
>>> Just to be clear devlink shared buffer ABIs don't require representors
>>> and "switchdev mode".
>>> .
>> [HQN] Dear Jakub, there are several reasons that devlink shared buffer
>> ABI cannot be used:
>> 1. The devlink shared buffer ABI is written based on the switch cli
>> which you can find out more
>> from this link https://community.mellanox.com/docs/DOC-2558.
> Devlink API accommodates requirements of simpler (SwitchX2?) and more
> advanced schemes (present in Spectrum). The simpler/basic static
> threshold configurations is exactly what you are doing here, AFAIU.
[HQN] Devlink API is tailored specifically for switch. We don't
configure threshold configuration
explicitly. It is done via PFC. Once PFC is enabled on priority,
threshold is setup based on our
proprietary formula that were tested rigorously for performance.
>> 2. The dcbnl interfaces have been used for QoS settings.
> QoS settings != shared buffer configuration.
[HQN] I think we have different definition about "shared buffer". Please
refer to this below switch cli link.
It explained in detail what is the "shared buffer" in switch means.
Our NIC does not have "shared buffer" supported.
https://community.mellanox.com/docs/DOC-2591
>
>> In NIC, the buffer configuration are tied to priority (ETS PFC).
> Some customers use DCB, a lot (most?) of them don't. I don't think the
> "this is a logical extension of a commonly used API" really stands here.
[HQN] DCBNL are being actively used. The whole point of this patch
is to tie buffer configuration with IEEE's priority and is IEEE's PFC
configuration.
Ambitious future is to have the switch configure the NIC's buffer size
and buffer mapping
via TLV packet and this DCBNL interface. But we won't go too far here.
>
>> The buffer configuration are not tied to port like switch.
> It's tied to a port and TCs, you just have one port but still have 8
> TCs exactly like a switch...
[HQN] No. Our buffer ties to priority not to TCs.
>> 3. Shared buffer, alpha, threshold are switch specific terms.
> IDK how talking about alpha is relevant, it's just one threshold type
> the API supports. As far as shared buffer and threshold I don't know
> if these are switch terms (or how "switch" differs from "NIC" at that
> level) - I personally find carving shared buffer into pools very
> intuitive.
[HQN] Yes, I understand your point too. The NIC's buffer shares some
characteristics with the switch's
buffer settings. But this DCB buffer setting is to improve the
performance and work together with the
PFC setting. We would like to keep all the qos setting under DCB Netlink
as they are designed
to be this way.
>
> Could you give examples of commands/configs one can use with your new
> ABI?
[HQN] The plan is to add the support in lldptool once the kernel code is
accepted. To test the kernel code,
I am using small python scripts that works on top of the netlink library.
It will be like this format which is similar to other options in lldptool
priority2buffer: 0,2,5,7,1,2,3,6 maps priorities 0,1,2,3,4,5,6,7 to
buffer 0,2,5,7,1,2,3,6
buffer_size: 87296,87296,0,87296,0,0,0,0 set receive buffer size
for buffer 0,1,2,3,4,5,6,7 respectively
> How does one query the total size of the buffer to be carved?
[HQN] This is not necessary. If the total size is too big, error will be
return via DCB netlink interface.
>
Powered by blists - more mailing lists