Message-ID: <4EBAEC4A.6040102@intel.com>
Date:	Wed, 09 Nov 2011 13:10:34 -0800
From:	John Fastabend <john.r.fastabend@...el.com>
To:	Dave Taht <dave.taht@...il.com>
CC:	Neil Horman <nhorman@...driver.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"Love, Robert W" <robert.w.love@...el.com>,
	"David S. Miller" <davem@...emloft.net>
Subject: Re: net: Add network priority cgroup

On 11/9/2011 12:27 PM, Dave Taht wrote:
> On Wed, Nov 9, 2011 at 8:57 PM, Neil Horman <nhorman@...driver.com> wrote:
>>
>> Data Center Bridging environments are currently somewhat limited in their
>> ability to provide a general mechanism for controlling traffic priority.
> 
> 
> 
>>
>> Specifically they are unable to administratively control the priority at which
>> various types of network traffic are sent.
>>
>> Currently, the only ways to set the priority of a network buffer are:
>>
>> 1) Through the use of the SO_PRIORITY socket option
>> 2) By using low level hooks, like a tc action
>>
> 2), above is a little vague.
> 
> There are dozens of ways to control the relative priorities of network
> streams in addition to priority, notably diffserv, various forms of
> fair queuing, and active queue management techniques like RED, Blue,
> etc.
> 

Maybe dozens of ways to control traffic using various combinations of
qdiscs but I think for classification we have a small set of reasonably
defined mechanisms.

 - tc filter/action
 - netfilter infrastructure, e.g. the CLASSIFY target (iptables/ebtables)
 - socket options SO_PRIORITY and IP_TOS

By the way, setting the TOS bits also sets sk->priority. What other
classification mechanisms did I miss?
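The note that setting the TOS bits also sets sk->priority can be made concrete. The sketch below restates, as we read it, the lookup table the kernel consults on setsockopt(IP_TOS) (ip_tos2prio[] in net/ipv4/route.c). The PRIO_* names and tos_to_priority() are illustrative stand-ins, the values are copied from the TC_PRIO_* constants in <linux/pkt_sched.h>, and the table should be checked against your own kernel tree.

```c
#include <stdint.h>

/* Priority classes, restated from the TC_PRIO_* constants in
 * <linux/pkt_sched.h> so the sketch compiles without kernel headers. */
enum {
    PRIO_BESTEFFORT       = 0,
    PRIO_BULK             = 2,
    PRIO_INTERACTIVE_BULK = 4,
    PRIO_INTERACTIVE      = 6,
};

/* Model of ip_tos2prio[]: index = (tos & 0x1E) >> 1, i.e. the four
 * legacy TOS bits. Each odd ("minimize cost") slot shares the class
 * of its even neighbour. Row comments give the TOS values per slot. */
static const uint8_t tos2prio[16] = {
    PRIO_BESTEFFORT,       PRIO_BESTEFFORT,       /* 0x00, 0x02 */
    PRIO_BESTEFFORT,       PRIO_BESTEFFORT,       /* 0x04, 0x06 */
    PRIO_BULK,             PRIO_BULK,             /* 0x08, 0x0A */
    PRIO_BULK,             PRIO_BULK,             /* 0x0C, 0x0E */
    PRIO_INTERACTIVE,      PRIO_INTERACTIVE,      /* 0x10, 0x12 */
    PRIO_INTERACTIVE,      PRIO_INTERACTIVE,      /* 0x14, 0x16 */
    PRIO_INTERACTIVE_BULK, PRIO_INTERACTIVE_BULK, /* 0x18, 0x1A */
    PRIO_INTERACTIVE_BULK, PRIO_INTERACTIVE_BULK, /* 0x1C, 0x1E */
};

/* Priority a socket would end up with after setsockopt(IP_TOS, tos).
 * The ECN bits (0x03 region) and DSCP bits above 0x1E are masked off. */
static int tos_to_priority(uint8_t tos)
{
    return tos2prio[(tos & 0x1E) >> 1];
}
```

For example, IPTOS_LOWDELAY (0x10) lands on the interactive class (6), while IPTOS_THROUGHPUT (0x08) lands on bulk (2).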

> The priority field within the Linux skb is used for multiple purposes
> - in addition to SO_PRIORITY it is also used for queue selection
> within tc for a variety of queuing disciplines. Certain bands are
> reserved for VLAN and wireless queueing (these features are rarely
> used).
> 
> Twiddling with it on one level or creating a controller for it can and
> will still be messed up by attempts to sanely use it elsewhere in the
> stack.
> 

The skb->priority field is used by some qdiscs and also by VLAN egress_maps.

Without knowing the wireless situation, it seems you can either not manage
priority over wireless links if this is a problem, or perhaps we can clean
up the wireless queueing and integrate it with the appropriate qdisc.

Could the wireless skb->priority usage be tied into mqprio?

>>
>> (1) is difficult from an administrative perspective because it requires the
>> application to be coded to not just assume the default priority is sufficient,
>> and to expose an administrative interface to allow priority adjustment.  Such
>> a solution is not scalable in a DCB environment.
>>
> 
> Nor any other complex environment. Or even a simple one.
> 
>>
>> (2) is also difficult, as it requires constant administrative oversight of
>> applications so as to build appropriate rules to match traffic belonging to
> 
> Yes, your description of option 2, as simplified above, is difficult.
> 
> However, certain algorithms are intended to improve fairness between
> flows without requiring as much oversight and classification.
> 
> Even so, when RED or a newer queue management algorithm such as
> QFQ or DRR is applied, classes of traffic exist that benefit from more
> specialized diffserv or diffserv-like behavior.
> 
> Moreover, the evidence for something more complex than simple
> priority management in server environments is compelling at this
> point.
> 
>> various classes, so that priority can be appropriately set. It is further
>> limiting when DCB-enabled hardware is in use, because tc rules are
>> only run after a root qdisc has been selected (DCB-enabled hardware may reserve
>> hw queues for various traffic classes and needs the priority to be set prior to
>> selecting the root qdisc).
>>
> 
> Multiple applications (somewhat) rightly set priorities according to
> their view of the world.
> 
> Background traffic and immediate traffic often set the appropriate
> diffserv bits, other traffic can do the same, and at least a few apps
> also set the priority field in the hope that it will do some good,
> and perhaps more should.

These patches do not overwrite existing priorities. So applications
that manage the priority can continue to do this.

> 
> 
>>
>> I've discussed various solutions with John Fastabend, and we saw a cgroup as
>> being a good general solution to this problem.  The network priority cgroup
> 
> Not if you are wanting to apply queue management further down the stack!
> 

I don't follow. Are you saying that you have queue management that the
QoS layer is unaware of? If so, any qdisc or priority mechanism is going to
interfere with 'further down the stack'.

>>
>> allows for a per-interface priority map to be built per cgroup.  Any traffic
>> originating from an application in a cgroup, that does not explicitly set its
>> priority with SO_PRIORITY will have its priority assigned to the value
>> designated for that group on that interface.
> 
>> This allows a user space daemon,
>> when conducting LLDP negotiation with a DCB enabled peer to create a cgroup
>> based on the APP_TLV value received and administratively assign applications to
>> that priority using the existing cgroup utility infrastructure.
> 
> I would like it if the many uses of the priority field were reduced to
> one use per semantic grouping.
> 
> You are adding a controller to something that is already
> ill-controlled and ill-defined, heavily overloaded, and both under- and
> over-used, to be managed in userspace by code to be designed later, and
> then re-mapped once it exits a VM into another host or hardware queue
> management system which may or may not share similar assumptions.
> 

I don't think it's ill-defined or ill-controlled. The priority can be
set by well-defined mechanisms. We provide another mechanism to set
the priority without having to modify existing applications, and a
mechanism for administrators/tools to set it dynamically.
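The per-cgroup, per-interface priority map described in the patch can be modeled in a few lines. This is a hypothetical sketch, not the patch's code: struct prio_cgroup, MAX_IFINDEX and effective_priority() are invented names, and treating a non-zero socket priority as "explicitly set" is an assumption of the sketch that mirrors the rule that existing priorities are not overwritten.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_IFINDEX 16  /* hypothetical fixed table size for the sketch */

/* Hypothetical per-cgroup priority map: one entry per interface index,
 * where 0 means "no priority configured for this interface". */
struct prio_cgroup {
    uint32_t ifprio[MAX_IFINDEX];
};

/* Priority an outgoing packet would carry.
 *   sk_prio: priority already on the socket (SO_PRIORITY, IP_TOS, ...);
 *            assumed explicit when non-zero, so it is never overwritten.
 *   cg:      cgroup of the sending task, or NULL if none.
 *   ifindex: egress interface. */
static uint32_t effective_priority(uint32_t sk_prio,
                                   const struct prio_cgroup *cg,
                                   unsigned int ifindex)
{
    if (sk_prio != 0)
        return sk_prio;              /* application-managed: keep it */
    if (cg == NULL || ifindex >= MAX_IFINDEX)
        return 0;                    /* no map: keep the default priority */
    return cg->ifprio[ifindex];      /* administrator-assigned priority */
}
```

A daemon reacting to an LLDP APP_TLV would, in this model, only need to fill in ifprio[] for the relevant cgroup; applications themselves stay unmodified.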

Overloaded, perhaps; the egress_map is a bit of an overloading of this,
but it has existed for a long time.

IMHO hardware queue management systems should be integrated into the
qdisc layer if possible. DCB-enabled hardware had similar problems
trying to do hardware queue management without involving the OS, and
had to add hacks into select_queue() or hard-code traffic types
into the base drivers to work around this. 'mqprio' and dev support
for traffic classes were my attempt at a generic mechanism to expose this
to the OS.


> Don't get me wrong, I LIKE the controller idea, but think the priority
> field needs to be un-overloaded first to avoid ill-effects elsewhere
> in the users of the down-stream subsystems.
> 

But doesn't this help the down-stream subsystems as well? The priority
will eventually be pushed down the stack.

>> Tested by John and myself, with good results
> 
> With what?
> 

I tested this with mqprio using the net_prio cgroups to set the priority
and using mqprio to bind hardware queue sets to each priority. Then
I used netperf, ping, and the cg* tools to test I/O.
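The test setup above, where mqprio binds hardware queue sets to each priority, amounts to a two-step lookup: priority to traffic class, then traffic class to a range of TX queues. A simplified model, assuming a configuration like the tc-mqprio(8) example `tc qdisc add dev eth0 root mqprio num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 1`; the struct and function names are hypothetical, and the kernel spreads flows within the range with reciprocal_scale() rather than the plain modulo used here.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PRIO 16  /* priority is masked to 4 bits for the tc lookup */

struct tc_txq {
    unsigned int offset;  /* first TX queue of the traffic class */
    unsigned int count;   /* number of TX queues in the class */
};

/* Hypothetical snapshot of an mqprio configuration. */
struct mqprio_cfg {
    uint8_t prio_tc_map[NUM_PRIO];  /* priority -> traffic class */
    struct tc_txq tc_to_txq[8];     /* traffic class -> queue range */
};

/* Pick a TX queue for a packet: look up the traffic class for its
 * priority, then spread flows across that class's queue range. */
static unsigned int pick_txq(const struct mqprio_cfg *cfg,
                             uint32_t skb_priority, uint32_t flow_hash)
{
    uint8_t tc = cfg->prio_tc_map[skb_priority & (NUM_PRIO - 1)];
    const struct tc_txq *r = &cfg->tc_to_txq[tc];

    assert(r->count > 0);  /* every mapped class needs at least one queue */
    return r->offset + flow_hash % r->count;
}
```

With that map, priority 3 always lands on queue 0, priority 2 on queue 1, and everything else on queues 2 or 3 depending on the flow hash. This is why the priority must be set before the queue is chosen, which is the ordering problem Neil describes for DCB hardware.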

As a side note, I expect you could also use this in conjunction with
the VLAN egress_map to push applications onto 802.1Q priorities.
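Combining the two would look roughly like this: the cgroup sets skb->priority, and a VLAN egress map such as the one created by `ip link add link eth0 name eth0.100 type vlan id 100 egress-qos-map 4:5 6:7` turns that priority into the 802.1Q PCP bits. The sketch below models only the lookup; struct egress_map_entry and vlan_tci() are hypothetical names, and unmapped priorities falling back to PCP 0 reflects our understanding of the default behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* One egress-qos-map entry: skb->priority -> 802.1Q PCP (0..7). */
struct egress_map_entry {
    uint32_t skb_priority;
    uint8_t  pcp;
};

/* Build the 802.1Q tag control information for a frame:
 * PCP in bits 15..13, DEI (bit 12) left clear, VLAN ID in bits 11..0.
 * Priorities without a map entry get PCP 0. */
static uint16_t vlan_tci(const struct egress_map_entry *map, size_t n,
                         uint32_t skb_priority, uint16_t vlan_id)
{
    uint8_t pcp = 0;

    for (size_t i = 0; i < n; i++) {
        if (map[i].skb_priority == skb_priority) {
            pcp = map[i].pcp & 0x7;
            break;
        }
    }
    return (uint16_t)((pcp << 13) | (vlan_id & 0x0FFF));
}
```

In this model, an application placed in a cgroup that maps eth0 traffic to priority 4 would leave the VLAN interface tagged with PCP 5.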

>> Signed-off-by: Neil Horman <nhorman@...driver.com>
>> CC: John Fastabend <john.r.fastabend@...el.com>
>> CC: Robert Love <robert.w.love@...el.com>
>> CC: "David S. Miller" <davem@...emloft.net>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> --
> Dave Täht
> SKYPE: davetaht
> 
> http://www.bufferbloat.net
