Message-ID: <CAA93jw7tQtBGFMuDiK_pRUqAdfBVqrwbMegR8gh-8KgVb19PWg@mail.gmail.com>
Date: Mon, 14 Nov 2011 13:32:04 +0100
From: Dave Taht <dave.taht@...il.com>
To: Neil Horman <nhorman@...driver.com>
Cc: netdev@...r.kernel.org,
John Fastabend <john.r.fastabend@...el.com>,
Robert Love <robert.w.love@...el.com>,
"David S. Miller" <davem@...emloft.net>
Subject: Re: net: Add network priority cgroup
On Mon, Nov 14, 2011 at 12:47 PM, Neil Horman <nhorman@...driver.com> wrote:
> On Wed, Nov 09, 2011 at 02:57:33PM -0500, Neil Horman wrote:
>> Data Center Bridging environments are currently somewhat limited in their
>> ability to provide a general mechanism for controlling traffic priority.
>> Specifically they are unable to administratively control the priority at which
>> various types of network traffic are sent.
>>
>> Currently, the only ways to set the priority of a network buffer are:
>>
>> 1) Through the use of the SO_PRIORITY socket option
>> 2) By using low level hooks, like a tc action
>>
>> (1) is difficult from an administrative perspective because it requires that the
>> application be coded not to simply assume the default priority is sufficient,
>> and that it expose an administrative interface to allow priority adjustment. Such
>> a solution is not scalable in a DCB environment.
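
(For illustration only, not part of the patch: a minimal userspace sketch of
what (1) asks of the application, i.e. raising its own traffic priority with
SO_PRIORITY. The priority value is arbitrary and error handling is trimmed.)

    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int prio = 5;   /* becomes skb->priority for traffic on this socket */
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        /* values 0-6 can be set without CAP_NET_ADMIN */
        if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_PRIORITY,
                                 &prio, sizeof(prio)) < 0) {
            perror("SO_PRIORITY");
            return 1;
        }
        /* connect()/send() as usual from here on */
        close(fd);
        return 0;
    }
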
>>
>> (2) is also difficult, as it requires constant administrative oversight of
>> applications in order to build appropriate rules that match traffic belonging to
>> the various classes, so that priority can be set appropriately. It is further
>> limited when DCB-enabled hardware is in use, because tc rules are
>> only run after a root qdisc has been selected (DCB-enabled hardware may reserve
>> hw queues for various traffic classes and needs the priority to be set prior to
>> selecting the root qdisc).
>>
>>
>> I've discussed various solutions with John Fastabend, and we saw a cgroup as
>> being a good general solution to this problem. The network priority cgroup
>> allows for a per-interface priority map to be built per cgroup. Any traffic
>> originating from an application in a cgroup that does not explicitly set its
>> priority with SO_PRIORITY will have its priority assigned to the value
>> designated for that group on that interface. This allows a user space daemon,
>> when conducting LLDP negotiation with a DCB-enabled peer, to create a cgroup
>> based on the APP_TLV value received and administratively assign applications to
>> that priority using the existing cgroup utility infrastructure.
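
(Again purely illustrative: roughly what such a daemon's administrative step
might look like. The mount point and attribute names below - net_prio.ifpriomap,
tasks - are my assumptions about the interface, not taken from the patch itself.)

    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    static int write_str(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");

        if (!f)
            return -1;
        fprintf(f, "%s\n", val);
        return fclose(f);
    }

    int main(void)
    {
        /* e.g. an APP_TLV for iSCSI negotiated to priority 4 on eth0 */
        mkdir("/sys/fs/cgroup/net_prio/iscsi", 0755);
        write_str("/sys/fs/cgroup/net_prio/iscsi/net_prio.ifpriomap", "eth0 4");
        /* move an already-running task (pid 1234, hypothetical) into the group */
        write_str("/sys/fs/cgroup/net_prio/iscsi/tasks", "1234");
        return 0;
    }
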
>>
>> Tested by John and myself, with good results
>>
>> Signed-off-by: Neil Horman <nhorman@...driver.com>
>> CC: John Fastabend <john.r.fastabend@...el.com>
>> CC: Robert Love <robert.w.love@...el.com>
>> CC: "David S. Miller" <davem@...emloft.net>
>
> Bump, any other thoughts here? Dave T. has some reasonable thoughts regarding
> the use of skb->priority, but IMO they really seem orthogonal to the purpose of
> this change. Any other reviews would be welcome.
Well, in part I've been playing catch-up, in the hope that lldp and
openlldp and/or this dcb netlink layer that I don't know anything
about (pointers please?) could somehow help resolve the semantic
mess that skb->priority has become in the first place.
I liked what was described here.
"What if we did at least carve out the DCB functionality away from
skb->priority? Since, AIUI, we're only concerning ourselves with
locally generated traffic here, we're talking
about skbs that have a socket attached to them. Instead of indexing
the prio_tc_map with skb->priority, we could index it with
skb->dev->priomap[skb->sk->prioidx] (as provided by this patch). The cgroup
then could be, instead of a strict priority cgroup, a queue_selector cgroup (or
something more appropriately named), and we don't have to touch skb->priority at
all. I'd really rather not start down that road until I've got more opinions and
consensus on that, but it seems like a pretty good solution, one that would
allow hardware queue selection in systems that use things like DCB to co-exist
with software queueing features."
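
(To make the indexing difference concrete, here is a standalone toy model of
the two lookups. The struct layouts are simplified stand-ins for the real
kernel types, with field names taken from the quote above; nothing here is
from the patch.)

    #include <stdio.h>

    #define MAX_PRIO 16

    struct toy_sock {
        unsigned int prioidx;               /* index handed out by the cgroup */
    };

    struct toy_netdev {
        unsigned int priomap[MAX_PRIO];     /* cgroup index -> priority, per device */
        unsigned int prio_tc_map[MAX_PRIO]; /* priority -> hardware traffic class */
    };

    struct toy_skb {
        unsigned int priority;              /* whatever SO_PRIORITY set, if anything */
        struct toy_sock *sk;
        struct toy_netdev *dev;
    };

    /* current behaviour: hardware traffic class follows skb->priority */
    static unsigned int tc_from_priority(const struct toy_skb *skb)
    {
        return skb->dev->prio_tc_map[skb->priority % MAX_PRIO];
    }

    /* suggested alternative: follow the cgroup's per-device map instead,
     * leaving skb->priority alone for software queueing decisions */
    static unsigned int tc_from_cgroup(const struct toy_skb *skb)
    {
        unsigned int prio = skb->dev->priomap[skb->sk->prioidx % MAX_PRIO];

        return skb->dev->prio_tc_map[prio % MAX_PRIO];
    }

    int main(void)
    {
        struct toy_netdev dev = { .priomap = { [1] = 4 }, .prio_tc_map = { [4] = 2 } };
        struct toy_sock sk = { .prioidx = 1 };
        struct toy_skb skb = { .priority = 0, .sk = &sk, .dev = &dev };

        printf("tc via skb->priority: %u, tc via cgroup map: %u\n",
               tc_from_priority(&skb), tc_from_cgroup(&skb));
        return 0;
    }
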
The piece that still kind of bothered me about the original proposal
(and perhaps this one) was that setting SO_PRIORITY in an app means
'give my packets more mojo'.
Taking unprioritized packets and assigning them, and *them only*, to a
hardware queue struck me as possibly deprioritizing the 'more mojo
wanted' packets in the app(s), as those would end up in some other,
possibly overloaded, hardware queue.
So a cgroup that moves all of an application's packets into a given
hardware queue, where they then get scheduled normally according to
skb->priority and friends (software queue, default of pfifo_fast,
etc.), seems to make some sense to me. (I wouldn't mind if we had
abstractions for software queues too - something like 'I need a
software queue with these properties, find me a place for it on the
hardware' - but I'm dreaming.)
One open question is where packets generated by other subsystems (arp,
dns, etc.) end up if you are using a cgroup for the app.
So to rephrase your original description from this:
>> Any traffic originating from an application in a cgroup that does not explicitly set its
>> priority with SO_PRIORITY will have its priority assigned to the value
>> designated for that group on that interface. This allows a user space daemon,
>> when conducting LLDP negotiation with a DCB-enabled peer, to create a cgroup
>> based on the APP_TLV value received and administratively assign applications to
>> that priority using the existing cgroup utility infrastructure.
To this:
"Any traffic originating from an application in a cgroup, will have
its hardware queue assigned to the value designated for that group on
that interface. This allows a user space daemon, when conducting LLDP
negotiation with a DCB enabled peer to create a cgroup based on the
APP_TLV value received and administratively assign applications to
that hardware queue using the existing cgroup utility infrastructure."
Assuming we're on the same page here, what the heck is APP_TLV?
> John, Robert, if you're supportive of these changes, some Acks would be
> appreciated.
>
>
> Regards
> Neil
>
--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net