netdev - Re: [PATCH 7/8] net: support tx_ring per UP in HW based QoS mechanism

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 14 Mar 2012 12:09:30 +0200
From:	Amir Vadai <amirv@...lanox.com>
To:	John Fastabend <john.r.fastabend@...el.com>
CC:	"David S. Miller" <davem@...emloft.net>, <netdev@...r.kernel.org>,
	Roland Dreier <roland@...estorage.com>,
	Oren Duer <oren@...lanox.com>,
	Amir Vadai <amirv@...lanox.co.il>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Eilon Greenstein <eilong@...adcom.com>
Subject: Re: [PATCH 7/8] net: support tx_ring per UP in HW based QoS mechanism

On 03/13/2012 08:23 PM, John Fastabend wrote:
> On 3/13/2012 10:22 AM, Amir Vadai wrote:
>> From: Amir Vadai<amirv@...lanox.co.il>
>>
>> The Current HW based QoS mechanism which was introduced in commit 4f57c087de9
>> "net: implement mechanism for HW based QOS" is in orientation to ETS traffic
>> class. This patch introduces an approach which allow to use this mechanism also
>> with hardware who has queues per user priority (UP). After the change,
>> __skb_tx_hash() will direct a flow to a tx ring from a range of tx rings. This
>> range is defined by the caller function by the specific HW. If TC based queues,
>> the range is by TC number and for UP based queues, the range is by UP.
>>
> ETS is one specific use case for mqprio it can easily be used with other
> hardware transmission selection algorithms 802.1Q std based or otherwise.
>
> The mapping is really just an skb->priority to group of queues (qoffset and
> qcount). I happened to call the queue grouping a traffic class because that
> aligns with 802.1Q but it _really_ is just a queue grouping.
This is good for untagged traffic, but for tagged traffic I can see 2 
problems with this approach:
1. mqprio mapping could contradict egress map or 8021Qaz mapping (UP <=> 
TC). This could be solved (not very elegantly) by forcing the mappings 
to be synced.
2. egress map is per vlan, and mqprio mapping is one global mapping.
>
> In your case what would it mean to change the map and num_tc see 'tc':
>
> [root@...dev1-dcblab netperf]# tc qdisc add dev eth3 root mqprio help
> Usage: ... mqprio [num_tc NUMBER] [map P0 P1 ...]
>                    [queues count1@...set1 count2@...set2 ...] [hw 1|0]
>
> For example setting 'num_tc 8 map 0 1 2 3 0 1 2 3' looks like it might
> not work correctly. Would you end up with an skb tagged with priority
> 4,5,6,7 being sent out UP queues 0,1,2,3? My quess is that won't work
> with PFC or your ETS correctly.
I don't see a problem here. For example, skb tagged with priority 5 is 
mapped to UP 1. And sent through one of the tx rings of UP 1. All the 
rings of UP 1 share the same transmission queue (schedule queue) which 
is controlled by PFC and ETS by the HW. What is the problem here?
>
> In the canonical iSCSI case we put iscsid in a net_prio cgroup to get the
> priority set then use the priority to steer the skb to the correct queue
> groupings. In your case I think you can just fail any num_tc != 8 and keep
> the dflt map 1:1 then this should work. What did I miss? It looks like you
> already fail the num_tc != 8 case so why do we need this change?
>
> At most maybe we need a flag to indicate the mqprio map can't be changed in
> some cases.
What you suggest is that the priority in net_prio cgroup will be the 
User Priority, and not just the skb priority?
And also, for tagged traffic, how could it be forced to be synced with 
egress map?
>
>
>> CC: John Fastabend<john.r.fastabend@...el.com>
>> CC: Jeff Kirsher<jeffrey.t.kirsher@...el.com>
>> CC: Eilon Greenstein<eilong@...adcom.com>
>> Signed-off-by: Amir Vadai<amirv@...lanox.co.il>
>> ---
>>   drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |   11 ++++++++++-
>>   drivers/net/ethernet/mellanox/mlx4/en_tx.c      |    9 +++------
>>   include/linux/netdevice.h                       |   12 +++++++++++-
>>   include/linux/skbuff.h                          |    3 ++-
>>   net/core/dev.c                                  |   10 +---------
>>   5 files changed, 27 insertions(+), 18 deletions(-)
>>
> [...]
>
>>
>>   void bnx2x_set_num_queues(struct bnx2x *bp)
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> index 7a49830..d0d96e3 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> @@ -570,18 +570,15 @@ static void build_inline_wqe(struct mlx4_en_tx_desc *tx_desc, struct sk_buff *sk
>>
>>   u16 mlx4_en_select_queue(struct net_device *dev, struct sk_buff *skb)
>>   {
>> -	struct mlx4_en_priv *priv = netdev_priv(dev);
>> -	int up = -1;
>> +	int up = 0;
>>
>>   	if (vlan_tx_tag_present(skb))
>>   		up = (vlan_tx_tag_get(skb)>>  13);
> I was trying to avoid logic like this in select_queue().
Why?
>
> Can we get the same behavior by keeping the egress map and mqprio
> map in sync?
As I said above, if we force egress map to be synced to mqprio mapping, 
we loose it's power - mqprio is global, and egress map is per vlan.
>
>>   	else if (dev->num_tc)
>>   		up = netdev_get_prio_tc_map(dev, skb->priority);
>>
>> -	if (up>= 0)
>> -		return MLX4_EN_NUM_TX_RINGS + up;
>> -
>> -	return __skb_tx_hash(dev, skb, MLX4_EN_NUM_TX_RINGS);
>> +	return __skb_tx_hash(dev, skb, MLX4_EN_NUM_TX_RINGS,
>> +			MLX4_EN_NUM_TX_RINGS + up, 1);
>>   }
>>
> Thanks,
> John
Thanks,
Amir
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html