Message-ID: <DBFB1B45AF80394ABD1C807E9F28D157077D7D8E83@BLRX7MCDC203.AMER.DELL.COM>
Date: Mon, 14 Nov 2011 12:02:59 -0800
From: <Shyam_Iyer@...l.com>
To: <nhorman@...driver.com>
CC: <dave.taht@...il.com>, <netdev@...r.kernel.org>,
<john.r.fastabend@...el.com>, <robert.w.love@...el.com>,
<davem@...emloft.net>
Subject: RE: net: Add network priority cgroup
> -----Original Message-----
> From: netdev-owner@...r.kernel.org [mailto:netdev-
> owner@...r.kernel.org] On Behalf Of Neil Horman
> Sent: Monday, November 14, 2011 12:24 PM
> Subject: Re: net: Add network priority cgroup
> > >
> > > On Mon, Nov 14, 2011 at 01:32:04PM +0100, Dave Taht wrote:
> > > > On Mon, Nov 14, 2011 at 12:47 PM, Neil Horman
> <nhorman@...driver.com>
> > > wrote:
> > > > > On Wed, Nov 09, 2011 at 02:57:33PM -0500, Neil Horman wrote:
> > > > >> Data Center Bridging environments are currently somewhat
> limited
> > > in their
> > > > >> ability to provide a general mechanism for controlling traffic
> > > priority.
> > > > >> Specifically they are unable to administratively control the
> > > priority at which
> > > > >> various types of network traffic are sent.
> > > > >>
> > > > >> Currently, the only ways to set the priority of a network
> buffer
> > > are:
> > > > >>
> > > > >> 1) Through the use of the SO_PRIORITY socket option
> > > > >> 2) By using low level hooks, like a tc action
> > > > >>
> > > > >> (1) is difficult from an administrative perspective because it
> > > requires that the
> > > > >> application be coded to not just assume the default
> priority is
> > > sufficient,
> > > > >> and must expose an administrative interface to allow priority
> > > adjustment. Such
> > > > >> a solution is not scalable in a DCB environment
> > > > >>
> > > > >> (2) is also difficult, as it requires constant administrative
> > > oversight of
> > > > >> applications so as to build appropriate rules to match traffic
> > > belonging to
> > > > >> various classes, so that priority can be appropriately set. It
> is
> > > further
> > > > >> limiting when DCB enabled hardware is in use, due to the fact
> that
> > > tc rules are
> > > > >> only run after a root qdisc has been selected (DCB enabled
> > > hardware may reserve
> > > > >> hw queues for various traffic classes and needs the priority
> to be
> > > set prior to
> > > > >> selecting the root qdisc)
> > > > >>
> > > > >>
> > > > >> I've discussed various solutions with John Fastabend, and we
> saw a
> > > cgroup as
> > > > >> being a good general solution to this problem. The network
> > > priority cgroup
> > > > >> allows for a per-interface priority map to be built per
> cgroup.
> > > Any traffic
> > > > >> originating from an application in a cgroup, that does not
> > > explicitly set its
> > > > >> priority with SO_PRIORITY will have its priority assigned to
> the
> > > value
> > > > >> designated for that group on that interface. This allows a
> user
> > > space daemon,
> > > > >> when conducting LLDP negotiation with a DCB enabled peer to
> create
> > > a cgroup
> > > > >> based on the APP_TLV value received and administratively
> assign
> > > applications to
> > > > >> that priority using the existing cgroup utility
> infrastructure.
> > > > >>
> > > > >> Tested by John and myself, with good results
> > > > >>
> > > > >> Signed-off-by: Neil Horman <nhorman@...driver.com>
> > > > >> CC: John Fastabend <john.r.fastabend@...el.com>
> > > > >> CC: Robert Love <robert.w.love@...el.com>
> > > > >> CC: "David S. Miller" <davem@...emloft.net>
> > > > >> --
> > > > >> To unsubscribe from this list: send the line "unsubscribe
> netdev"
> > > in
> > > > >> the body of a message to majordomo@...r.kernel.org
> > > > >> More majordomo info at http://vger.kernel.org/majordomo-
> info.html
> > > > >>
> > > > >
> > > > > Bump, any other thoughts here? Dave T. has some reasonable
> > > thoughts regarding
> > > > > the use of skb->priority, but IMO they really seem orthogonal
> to
> > > the purpose of
> > > > > this change. Any other reviews would be welcome.
> > > >
> > > > Well, in part I've been playing catchup in the hope that lldp and
> > > > openlldp and/or this dcb netlink layer that I don't know anything
> > > > about (pointers please?) could help somehow to resolve the
> semantic
> > > > mess skb->priority has become in the first place.
> > > >
> > > > I liked what was described here.
> > > >
> > > > "What if we did at least carve out the DCB functionality away
> from
> > > > skb->priority? Since, AIUI, we're only concerning ourselves with
> > > > locally generated traffic here, we're talking
> > > > about skbs that have a socket attached to them. We could,
> instead of
> > > indexing
> > > > the prio_tc_map with skb->priority, we could index it with
> > > > skb->dev->priomap[skb->sk->prioidx] (as provided by this patch).
> The
> > > cgroup
> > > > then could be, instead of a strict priority cgroup, a
> queue_selector
> > > cgroup (or
> > > > something more appropriately named), and we don't have to touch
> skb-
> > > >priority at
> > > > all. I'd really rather not start down that road until I got more
> > > opinions and
> > > > consensus on that, but it seems like a pretty good solution, one
> that
> > > would
> > > > allow hardware queue selection in systems that use things like
> DCB to
> > > co-exist
> > > > with software queueing features."
> > > >
> > > I was initially ok with this, but the more I think about it, the
> more I
> > > think
> > > it's just not needed (see further down in this email for my
> reasoning).
> > > John,
> > > Rob, do you have any thoughts here?
> > >
> > > > The piece that still kind of bothered me about the original
> proposal
> > > > (and perhaps this one) was that setting SO_PRIORITY in an app
> means
> > > > 'give my packets more mojo'.
> > > >
> > > > Taking something that took unprioritized packets and assigned
> them
> > > and
> > > > *them only* to a hardware queue struck me as possibly
> deprioritizing
> > > > the 'more mojo wanted' packets in the app(s), as they would end
> up in
> > > > some other, possibly overloaded, hardware queue.
> > > >
> > > I don't really see what you mean by this at all. Taking packets
> with
> > > no
> > > priority and assigning them a priority doesn't really have an
> effect on
> > > pre-prioritized packets. Or rather it shouldn't. You can
> certainly
> > > create a
> > > problem by having apps prioritized according to conflicting
> semantic
> > > rules, but
> > > that strikes me as administrative error. Garbage in...Garbage out.
> > >
> > > > So a cgroup that moves all of the packets from an application
> into a
> > > > given hardware queue, and then gets scheduled normally according
> to
> > > > skb->priority and friends (software queue, default of pfifo_fast,
> > > > etc), seems to make some sense to me. (I wouldn't mind if we had
> > > > abstractions for software queues, too, like, I need a software
> queue
> > > > with these properties, find me a place for it on the hardware -
> but
> > > > I'm dreaming)
> > > >
> > > > One open question is where do packets generated from other
> subsystems
> > > > end up, if you are using a cgroup for the app? arp, dns, etc?
> > > >
> > > The overriding rule is the association of an skb to a socket. If a
> > > transmitted
> > > frame has skb->sk set in dev_queue_xmit, then we interrogate its
> > > priority index
> > > as set when we passed through the sendmsg code at the top of the
> stack.
> > > Otherwise its behavior is unchanged from its current standpoint.
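A minimal sketch of the rule described above, as I read it. This is a user-space model, not the patch itself: the struct and function names (model_sock, model_dev, model_skb, model_update_prio) and the table size are hypothetical stand-ins for struct sock, struct net_device and struct sk_buff, and I am assuming that a priority of 0 means "not explicitly set".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PRIOIDX 8 /* hypothetical size of the per-device map */

/* Hypothetical stand-ins for the kernel structures discussed in the thread. */
struct model_sock { uint32_t prioidx; };                    /* set at sendmsg time */
struct model_dev  { uint32_t priomap[MODEL_MAX_PRIOIDX]; }; /* per-interface map */
struct model_skb {
    struct model_sock *sk;   /* NULL for frames with no socket attached */
    struct model_dev  *dev;
    uint32_t priority;       /* assumption: 0 means "not explicitly set" */
};

/* Models the check made on the transmit path before queue selection. */
static void model_update_prio(struct model_skb *skb)
{
    assert(skb != NULL && skb->dev != NULL);

    if (skb->sk == NULL)
        return; /* no socket: behavior unchanged */
    if (skb->priority != 0)
        return; /* priority already set, e.g. via SO_PRIORITY */
    if (skb->sk->prioidx >= MODEL_MAX_PRIOIDX)
        return; /* index outside the map: leave the skb alone */

    skb->priority = skb->dev->priomap[skb->sk->prioidx];
}
```

With a map of {0, 5} and a socket whose prioidx is 1, an unprioritized skb leaves this function with priority 5, while an skb without a socket or with a priority already set is not modified.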
> > >
> > > > So to rephrase your original description from this:
> > > >
> > > > >> Any traffic originating from an application in a cgroup, that
> does
> > > not explicitly set its
> > > > >> priority with SO_PRIORITY will have its priority assigned to
> the
> > > value
> > > > >> designated for that group on that interface. This allows a
> user
> > > space daemon,
> > > > >> when conducting LLDP negotiation with a DCB enabled peer to
> create
> > > a cgroup
> > > > >> based on the APP_TLV value received and administratively
> assign
> > > applications to
> > > > >> that priority using the existing cgroup utility
> infrastructure.
> > > > > John, Robert, if you're supportive of these changes, some Acks
> > > would be
> > > > > appreciated.
> > > >
> > > > To this:
> > > >
> > > > "Any traffic originating from an application in a cgroup, will
> have
> > > > its hardware queue assigned to the value designated for that
> group
> > > on
> > > > that interface. This allows a user space daemon, when conducting
> > > LLDP
> > > > negotiation with a DCB enabled peer to create a cgroup based on
> the
> > > > APP_TLV value received and administratively assign applications
> to
> > > > that hardware queue using the existing cgroup utility
> > > infrastructure."
> > > >
> > > As above, I'm split brained about this. I'm ok with the idea of
> making
> > > this a
> > > queue selection cgroup, and separating it from priority, but at the
> > > same time,
> > > in the context of DCB, we really are assigning priority here, so it
> > > seems a bit
> > > false to do something that is not priority. I also like the fact
> that
> > > it
> > > provides administrative control in a way that netfilter and tc
> don't
> > > really
> > > enable.
> > >
> > > > Assuming we're on the same page here, what the heck is APP_TLV?
> > > >
> > > LLDP does layer 2 discovery with peer networking devices. It does
> so
> > > using sets
> > > of Type/length/value tuples. The types carry various bits of
> > > information, such
> > > as which priority groups are available on the network. The APP tlv
> > > conveys
> > > application or feature specific information. For instance, there
> is an
> > > ISCSI
> > > app tlv that tells the host that "on the interface you received
> this
> > > tlv, iscsi
> > > traffic must be sent at priority X". The idea being that, on
> receipt
> > > of this
> > > tlv, the DCB daemon can create an ISCSI network priority cgroup
> > > instance, and
> > > augment the cgroup rules file such that, when the user space iscsi
> > > daemon is
> > > started, its traffic automatically transmits at the appropriate
> > > priority.
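To make the APP TLV step concrete, here is a rough sketch of what a DCB daemon might do with one received entry. Everything here is an assumption on my part rather than something taken from the patch: the 3-byte entry layout (priority in the top 3 bits of the first byte, selector in the low 3 bits, then a big-endian 16-bit protocol ID), the helper names, and the "<ifname> <priority>" line format for the cgroup's per-interface priority map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One decoded application priority entry (hypothetical type). */
struct app_entry {
    uint8_t  priority;    /* 0..7 */
    uint8_t  selector;    /* how protocol_id is interpreted */
    uint16_t protocol_id; /* e.g. a TCP port such as 3260 for iSCSI */
};

/* Assumed layout: [prio:3 | reserved:2 | sel:3] [proto hi] [proto lo]. */
static struct app_entry decode_app_entry(const uint8_t raw[3])
{
    struct app_entry e;

    assert(raw != NULL);
    e.priority    = (uint8_t)((raw[0] >> 5) & 0x07);
    e.selector    = (uint8_t)(raw[0] & 0x07);
    e.protocol_id = (uint16_t)(((uint16_t)raw[1] << 8) | raw[2]);
    return e;
}

/*
 * Formats the line a daemon could write into the cgroup's priority map
 * for one interface. Returns the snprintf() result.
 */
static int format_ifpriomap_line(char *buf, size_t len, const char *ifname,
                                 const struct app_entry *e)
{
    assert(buf != NULL && ifname != NULL && e != NULL);
    return snprintf(buf, len, "%s %u", ifname, (unsigned)e->priority);
}
```

Under those assumptions, an entry of 0x82 0x0C 0xBC received on eth0 decodes to priority 4 for protocol ID 3260, and the daemon would write "eth0 4" for the iSCSI cgroup.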
> >
> > Love this !
> >
> > I guess if this is integrated to libvirt via libcgroups VMs could be
> assigned a network priority..
> >
> As the patch stands currently, absolutely. Just drop a qemu process
> into the
> appropriate cgroup, and (assuming you're using a tun/tap type device),
> all the
> traffic from that vm will get assigned the corresponding priority. You
> can do
> the same thing with classification using net_cls already. I did a
> video of it
> here:
> http://www.youtube.com/watch?v=KX5QV4LId_c
Yes. But I guess the present implementation also allows the iSCSI daemon to be placed in an iSCSI network priority cgroup.
This is nicer, since we then don't have to manage a network cgroup plus a disk cgroup for the iSCSI use case.
Now, since this acts on skb->priority, I suspect this could become a per-session priority cgroup.
Is that the case, or does iscsid form just one cgroup?
> Neil
> > http://linuxplumbersconf.org/2010/ocw/proposals/843
> >
> >
> >
>