Date:	Wed, 9 Nov 2011 16:09:43 -0500
From:	Neil Horman <nhorman@...driver.com>
To:	Dave Taht <dave.taht@...il.com>
Cc:	netdev@...r.kernel.org,
	John Fastabend <john.r.fastabend@...el.com>,
	Robert Love <robert.w.love@...el.com>,
	"David S. Miller" <davem@...emloft.net>
Subject: Re: net: Add network priority cgroup

On Wed, Nov 09, 2011 at 09:27:08PM +0100, Dave Taht wrote:
> On Wed, Nov 9, 2011 at 8:57 PM, Neil Horman <nhorman@...driver.com> wrote:
> >
> > Data Center Bridging environments are currently somewhat limited in their
> > ability to provide a general mechanism for controlling traffic priority.
>
> >
> > Specifically they are unable to administratively control the priority at which
> > various types of network traffic are sent.
> >
> > Currently, the only ways to set the priority of a network buffer are:
> >
> > 1) Through the use of the SO_PRIORITY socket option
> > 2) By using low level hooks, like a tc action
> >
> 2) above is a little vague.
> 
> There are dozens of ways to control the relative priorities of network
> streams in addition to priority, notably diffserv, various forms of
> fair queuing, and active queue management techniques like RED, Blue,
> etc.
> 
I'm referring explicitly to skb->priority here.  Sorry if I wasn't clear.

> The priority field within the Linux skb is used for multiple purposes
> - in addition to SO_PRIORITY it is also used for queue selection
> within tc for a variety of queuing disciplines. Certain bands are
> reserved for vlan and wireless queueing (these features are rarely
> used).
> 
Yes.

> Twiddling with it on one level or creating a controller for it can and
> will still be messed up by attempts to sanely use it elsewhere in the
> stack.
> 
Why?  It's not like it can't already be twiddled with via SO_PRIORITY.  This does
exactly the same thing, it just lets us do it via an administrative interface
rather than a programmatic one.  I don't disagree that the use of skb->priority
is complex, but this doesn't add any complexity that isn't already there.  It
just gives us a general way to assign priorities for those that know how to use
it consistently, in a way that doesn't require application modification.  That's
something that DCB needs.

> >
> > (1) is difficult from an administrative perspective because it requires that
> > the application be coded not to assume that the default priority is sufficient,
> > and that it expose an administrative interface to allow priority adjustment.
> > Such a solution is not scalable in a DCB environment.
> >
> 
> Nor any other complex environment. Or even a simple one.
Yes.

> 
> >
> > (2) is also difficult, as it requires constant administrative oversight of
> > applications so as to build appropriate rules to match traffic belonging to
> 
> Yes, your description of option 2, as simplified above, is difficult.
> 
> However certain algorithms are intended to improve fairness between
> flows that do not require as much oversight and classification.
> 
Yes, but DCB is orthogonal to software traffic control.  It's hardware queueing
based on the priority value of an skb.  As such, when a DCB enabled multiqueue
adapter selects the output queues in dev_pick_tx, it needs to have the
skb->priority value set properly.  Since we don't run any of the tc filters or
classifiers until after that's complete, we can't use those to adjust the skb
priority, as the root qdisc is already selected.

> However, even when RED or a newer queue management algorithm such as
> QFQ or DRR is applied, classes of traffic exist that benefit from more
> specialized diffserv or diffserv-like behavior.
> 
I understand, but again, DCB is orthogonal to that.  DCB is a hardware based
solution that steers traffic to various output queues in the NIC based on the
skb->priority value.  Take a look at ixgbe_select_queue for an example.

> However, the evidence for something more complex in server
> environments than simple priority management is compelling at this
> point.
> 
> > various classes, so that priority can be appropriately set. It is further
> > limiting when DCB enabled hardware is in use, due to the fact that tc rules are
> > only run after a root qdisc has been selected (DCB enabled hardware may reserve
> > hw queues for various traffic classes and needs the priority to be set prior to
> > selecting the root qdisc)
> >
> 
> Multiple applications (somewhat) rightly set priorities according to
> their view of the world.
> 
> background traffic and immediate traffic often set the appropriate
> diffserv bits, other traffic can do the same, and at least a few apps
> set the priority field also in the hope that that will do some good,
> and perhaps more should.
> 
Agreed, and this patch respects that.  It only sets the priority of an skb that
doesn't already have its priority set.  See skb_update_prio.
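
To illustrate that rule outside the kernel, here is a small user-space model.
It is a sketch, not the patch code: the struct and function names are
hypothetical, and it assumes that a priority of 0 means "not set" and that the
per-interface map is a plain array indexed by a per-cgroup index.

```c
#include <stddef.h>

/* Hypothetical stand-in for the fields of an skb that matter here. */
struct model_packet {
    unsigned int priority;    /* 0 is treated as "not set" in this model */
    unsigned int cgroup_idx;  /* index of the sending task's cgroup */
};

/* Hypothetical per-interface map: cgroup index -> priority. */
struct model_prio_map {
    size_t len;
    const unsigned int *prio;
};

/* Assign the cgroup's priority only if nothing set one already, so a
 * value chosen via SO_PRIORITY is left untouched. */
void model_update_prio(struct model_packet *pkt,
                       const struct model_prio_map *map)
{
    if (pkt->priority != 0)
        return;               /* application already chose a priority */
    if (map == NULL || pkt->cgroup_idx >= map->len)
        return;               /* no map, or cgroup not covered by it */
    pkt->priority = map->prio[pkt->cgroup_idx];
}
```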

> 
> >
> > I've discussed various solutions with John Fastabend, and we saw a cgroup as
> > being a good general solution to this problem.  The network priority cgroup
> 
> Not if you are wanting to apply queue management further down the stack!
> 
I'm not saying you can use the two together! I understand that this solution
interferes with the use of skb->priority in various queuing disciplines (just
like a program using SO_PRIORITY would), but the way those disciplines work is
incompatible with DCB at the moment.  You wouldn't use them all at the same
time.  I'd be happy to add some documentation to my patch to reflect that if you
like.

> >
> > allows for a per-interface priority map to be built per cgroup.  Any traffic
> > originating from an application in a cgroup, that does not explicitly set its
> > priority with SO_PRIORITY will have its priority assigned to the value
> > designated for that group on that interface.
> 
> > This allows a user space daemon,
> > when conducting LLDP negotiation with a DCB enabled peer, to create a cgroup
> > based on the APP_TLV value received and administratively assign applications to
> > that priority using the existing cgroup utility infrastructure.
> 
> I would like it if the many uses of the priority field were reduced to
> one use per semantic grouping.
> 
> You are adding a controller to something that is already
> ill-controlled and ill-defined, overly overloaded and both under and
> over used, to be managed in userspace by code to be designed later, and
> then re-mapped once it exits a vm into another host or hardware queue
> management system which may or may not share similar assumptions.
> 
> Don't get me wrong, I LIKE the controller idea, but think the priority
> field needs to be un-overloaded first to avoid ill-effects elsewhere
> in the users of the down-stream subsystems.
> 
We can certainly discuss the idea of separating the various semantic uses of
skb->priority out, but I don't think this patch is the place to do it. The
DCB use case for priority already exists (it specifically uses the prio_tc_map
as indexed by skb->priority in __skb_tx_hash).  I'm just adding a means of
controlling it more easily and reliably. 
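
As a rough picture of that existing use, the following user-space model shows
a priority being mapped to a traffic class and the class to a range of tx
queues.  It is a sketch under assumptions, not the kernel's __skb_tx_hash: all
names are hypothetical, the 16-entry map size is assumed, and spreading flows
with a plain modulo is a simplification of whatever hashing the kernel does.

```c
#include <stdint.h>

#define MODEL_PRIO_MASK 15u   /* assumed: map has 16 priority slots */
#define MODEL_MAX_TC    16

/* Contiguous range of tx queues reserved for one traffic class. */
struct model_txq_range {
    uint16_t offset;
    uint16_t count;
};

/* Hypothetical stand-in for the per-device state involved. */
struct model_dev {
    uint8_t num_tc;                              /* 0 means no classes */
    uint8_t prio_tc_map[MODEL_PRIO_MASK + 1];    /* priority -> class */
    struct model_txq_range tc_to_txq[MODEL_MAX_TC];
    uint16_t num_tx_queues;
};

/* Pick a tx queue: the priority selects the class, the class selects the
 * queue range, and the flow hash spreads flows within that range. */
uint16_t model_pick_queue(const struct model_dev *dev,
                          uint32_t priority, uint32_t flow_hash)
{
    uint16_t offset = 0;
    uint16_t count = dev->num_tx_queues;

    if (dev->num_tc) {
        uint8_t tc = dev->prio_tc_map[priority & MODEL_PRIO_MASK];

        offset = dev->tc_to_txq[tc].offset;
        count = dev->tc_to_txq[tc].count;
    }
    if (count == 0)
        return offset;        /* avoid dividing by zero on an empty range */
    return (uint16_t)(offset + flow_hash % count);
}
```

The model shows why the priority has to be correct before this step runs:
a packet whose priority is still at its default lands in whichever class that
default maps to, regardless of any classification done later.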

> > Tested by John and myself, with good results
> 
> With what?
> 
What else?  An ixgbe adapter and ping.  I created a test netprio cgroup, assigned a
priority value to it, and did a cgexec -g net_prio:test ping www.yahoo.com
with a printk in the ixgbe tx method to validate that the proper queue mapping
was selected.

Neil

> > Signed-off-by: Neil Horman <nhorman@...driver.com>
> > CC: John Fastabend <john.r.fastabend@...el.com>
> > CC: Robert Love <robert.w.love@...el.com>
> > CC: "David S. Miller" <davem@...emloft.net>
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@...r.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> --
> Dave Täht
> SKYPE: davetaht
> 
> http://www.bufferbloat.net
> 
