Message-Id: <1336287910-12010-1-git-send-email-amirv@mellanox.com>
Date: Sun, 6 May 2012 10:05:08 +0300
From: Amir Vadai <amirv@...lanox.com>
To: "David S. Miller" <davem@...emloft.net>
Cc: netdev@...r.kernel.org,
John Fastabend <john.r.fastabend@...el.com>,
Oren Duer <oren@...lanox.com>,
Liran Liss <liranl@...lanox.com>,
Amir Vadai <amirv@...lanox.com>
Subject: [PATCH net-next 0/2] extend sch_mqprio to distribute traffic not only by ETS TC
This series revives the discussion started in the thread "net:
support tx_ring per UP in HW based QoS mechanism" (see
http://marc.info/?t=133165957200004&r=1&w=2). The major issue to be addressed
is how the sk_prio <=> TC mapping should be done, for both tagged and untagged
traffic.
What follows is a staged description covering the background, the problem,
the current situation, the suggested change and its implementation.
Background
----------
Egress traffic has 3 layers of management to configure QoS attributes:
* Application - sets sk_prio
    * setsockopt() - application may set sk_prio using SO_PRIORITY or IP_TOS
* Host admin - sets sk_prio <=> UP
    * net_prio cgroup
    * Egress map for tagged traffic
* Net admin - sets UP <=> TC + TC QoS attributes
    * lldpad
Commit 4f57c087de9 "net: implement mechanism for HW based QOS" introduced a
mechanism for lower layer devices to steer traffic using skb->priority to tx
queues.
Problem
-------
How should the sk_prio <=> TC mapping be done, for both tagged and untagged
traffic?
Current situation
-----------------
* The network priority cgroup infrastructure (commit 5bc1421e) introduced an
  implicit assumption that sk_prio == UP.
* The tc tool is used to map UP <=> TC for both tagged and untagged traffic.
* The egress map and lldptool are ignored when the tc tool is being used.
* HW queue is per TC.
Suggestion
----------
* sk_prio is an attribute controlled by the Application or cgroup,
  as it used to be for tagged traffic.
* The tc tool is used by the Host admin and sets sk_prio <=> UP for untagged
  traffic. The rest of the chain is UP <=> TC, mapped by the Net admin (using
  DCBx netlink).
  To keep backward compatibility, the tc tool will have an option for a
  compatibility mode, in which the old sk_prio <=> TC mapping is kept.
* Depending on HW, queue selection is by UP or by TC.
Implementation
--------------
Extended mqprio hw attribute:
* Bit 1: is queue offset/count owned by HW
* Bits 2-7: HW queueing type.
    * 0 - by ETS TC
    * 1 - by UP
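
A rough sketch of how such an attribute could be packed and unpacked. It
assumes that "bit 1" means the least significant bit of an 8-bit field. The
macro, enum and function names are made up for this example and are not
taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: "bit 1" is the least significant bit (hypothetical names). */
#define HW_OWNED_MASK 0x01u /* bit 1: queue offset/count owned by HW */
#define HW_PG_SHIFT   1     /* bits 2-7 start one bit up */
#define HW_PG_MASK    0x3fu /* six bits of HW queueing type */

enum pg_type {
    PG_TYPE_ETS_TC = 0, /* queues grouped by ETS TC */
    PG_TYPE_UP     = 1, /* queues grouped by UP */
};

/* Non-zero when the queue offset/count are owned by HW. */
static int hw_owned(uint8_t hw)
{
    return (hw & HW_OWNED_MASK) != 0;
}

/* Extract the HW queueing type from bits 2-7. */
static unsigned int hw_pg_type(uint8_t hw)
{
    return (hw >> HW_PG_SHIFT) & HW_PG_MASK;
}

/* Build the attribute from its two parts. */
static uint8_t hw_encode(int owned, unsigned int pg_type)
{
    assert(pg_type <= HW_PG_MASK); /* type must fit in six bits */
    return (uint8_t)((owned ? HW_OWNED_MASK : 0u) | (pg_type << HW_PG_SHIFT));
}
```

Under this assumed layout, a plain hw value of 1 decodes as "owned by HW,
type ETS TC".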
__skb_tx_hash() is now aware of the HW queueing type (pg_type). When pg_type
is ETS TC, traffic is distributed as before: tagged and untagged
packets are distributed by netdev_get_prio_tc_map. When pg_type is UP, tagged
and untagged packets are distributed by UP (taken from the egress map for
tagged traffic, or from netdev_get_prio_tc_map for untagged).
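
The selection rule above can be modelled in userspace roughly as follows.
This is a simplified sketch, not the code from the patches: struct
model_dev, pick_group() and pick_txq() are hypothetical, the egress map is
reduced to a 16-entry table, and the queue inside a group is chosen with a
plain modulo rather than whatever scaling the kernel uses.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIO   16 /* priorities covered by the model's maps */
#define NUM_GROUPS 8  /* queue groups (one per TC or per UP) */

enum pg_type { PG_TYPE_ETS_TC = 0, PG_TYPE_UP = 1 };

struct txq_range {
    uint16_t offset; /* first tx queue of the group */
    uint16_t count;  /* number of tx queues in the group */
};

/* Hypothetical stand-in for the relevant net_device state. */
struct model_dev {
    enum pg_type pg_type;
    uint8_t prio_map[NUM_PRIO];   /* models netdev_get_prio_tc_map() */
    uint8_t egress_map[NUM_PRIO]; /* models the egress map: sk_prio -> UP */
    struct txq_range group[NUM_GROUPS];
};

/* Pick the queue group for a packet with the given priority. */
static unsigned int pick_group(const struct model_dev *dev, uint32_t prio,
                               int tagged)
{
    uint32_t idx = prio & (NUM_PRIO - 1);

    /* pg_type UP and tagged: the group is the UP from the egress map. */
    if (dev->pg_type == PG_TYPE_UP && tagged)
        return dev->egress_map[idx];
    /* ETS TC, or untagged traffic in UP mode: use the prio map. */
    return dev->prio_map[idx];
}

/* Spread flows over the queues of the selected group. */
static uint16_t pick_txq(const struct model_dev *dev, uint32_t prio,
                         int tagged, uint32_t hash)
{
    unsigned int g = pick_group(dev, prio, tagged);

    assert(g < NUM_GROUPS && dev->group[g].count > 0);
    return (uint16_t)(dev->group[g].offset + hash % dev->group[g].count);
}
```

In this model the only difference between the two modes is which table
supplies the group index for tagged packets.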
Amir Vadai (2):
net_sched/mqprio: add support for different pgroup types
net/mlx4_en: num cores tx rings for every UP
drivers/net/ethernet/mellanox/mlx4/en_main.c | 6 ++-
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 42 ++++++++++++++++++-----
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 12 -------
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 9 ++---
include/linux/netdevice.h | 27 +++++++++++++++
include/linux/pkt_sched.h | 3 +-
net/core/dev.c | 12 +++++--
net/sched/sch_mqprio.c | 11 +++++-
8 files changed, 88 insertions(+), 34 deletions(-)
--
1.7.8.2