Message-ID: <20110416115534.GA2085@neilslaptop.think-freely.org>
Date: Sat, 16 Apr 2011 07:55:34 -0400
From: Neil Horman <nhorman@...driver.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Stephen Hemminger <stephen.hemminger@...tta.com>,
netdev@...r.kernel.org, davem@...emloft.net,
Dimitris Michailidis <dm@...lsio.com>,
Thomas Gleixner <tglx@...utronix.de>,
David Howells <dhowells@...hat.com>,
Tom Herbert <therbert@...gle.com>,
Ben Hutchings <bhutchings@...arflare.com>
Subject: Re: [PATCH 2/3] net: Add net device irq siloing feature
On Sat, Apr 16, 2011 at 08:21:37AM +0200, Eric Dumazet wrote:
> On Friday, 15 April 2011 at 21:52 -0700, Stephen Hemminger wrote:
> > > On Fri, Apr 15, 2011 at 11:49:03PM +0100, Ben Hutchings wrote:
> > > > On Fri, 2011-04-15 at 16:17 -0400, Neil Horman wrote:
> > > > > Using the irq affinity infrastructure, we can now allow net
> > > > > devices to call request_irq using a new wrapper function
> > > > > (request_net_irq), which attaches a common affinity_update
> > > > > handler to each requested irq. This affinity update mechanism
> > > > > correlates each tracked irq with the flow(s) that said irq
> > > > > processes most frequently. The highest-traffic flow is noted,
> > > > > marked and exported to user space via the affinity_hint proc
> > > > > file for each irq. In this way, utilities like irqbalance can
> > > > > determine which cpu is receiving the most data from each rx
> > > > > queue on a given NIC, and set irq affinity accordingly.
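(To make the mechanism above concrete: the wrapper is shaped roughly
like the sketch below. This is illustrative only; net_irq_track() is a
stand-in name for the common accounting hook, not the actual patch
code.)

	/* Register the driver's handler as usual, then hook the irq
	 * into a common flow-accounting mechanism so its busiest flow
	 * can be tracked and exported via the affinity_hint proc file.
	 */
	static inline int request_net_irq(unsigned int irq,
					  irq_handler_t handler,
					  unsigned long flags,
					  const char *name, void *dev)
	{
		int ret = request_irq(irq, handler, flags, name, dev);

		if (ret)
			return ret;

		net_irq_track(irq, dev); /* hypothetical accounting hook */
		return 0;
	}
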
> > > > [...]
> > > >
> > > > Is irqbalance expected to poll the affinity hints? How often?
> > > >
> > > Yes, it's done just that for quite some time. Intel added that
> > > ability at the same time they added the affinity_hint proc file.
> > > Irqbalance polls the affinity_hint file at the same time it
> > > rebalances all irqs (every 10 seconds). If the affinity_hint is
> > > non-zero, irqbalance just copies it to smp_affinity for the same
> > > irq. Up until now that's been just about dead code, because only
> > > ixgbe sets affinity_hint. That's why I added the affinity_alg
> > > file, so irqbalance could do something more intelligent than just
> > > a blind copy. With the patch that I referenced, I added code to
> > > irqbalance to allow it to perform different balancing methods
> > > based on the output of affinity_alg.
> > > Neil
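(In userspace terms, the blind copy described above amounts to the
following; the helper is only a sketch, but /proc/irq/N/affinity_hint
and /proc/irq/N/smp_affinity are the real proc files.)

	#include <stdio.h>
	#include <string.h>

	/* Copy a non-zero affinity_hint into smp_affinity for one irq,
	 * the way irqbalance does on each 10-second rebalance pass.
	 * Error handling is trimmed for brevity.
	 */
	static void apply_affinity_hint(int irq)
	{
		char path[64], hint[64];
		FILE *f;

		snprintf(path, sizeof(path),
			 "/proc/irq/%d/affinity_hint", irq);
		f = fopen(path, "r");
		if (!f)
			return;
		if (!fgets(hint, sizeof(hint), f)) {
			fclose(f);
			return;
		}
		fclose(f);

		/* An all-zero mask means "no hint"; leave it alone. */
		if (strspn(hint, "0,\n") == strlen(hint))
			return;

		snprintf(path, sizeof(path),
			 "/proc/irq/%d/smp_affinity", irq);
		f = fopen(path, "w");
		if (f) {
			fputs(hint, f);	/* the blind copy */
			fclose(f);
		}
	}
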
> >
> > I hate the way more and more interfaces are becoming device driver
> > specific. It makes it impossible to build sane management infrastructure
> > and causes lots of customer and service complaints.
> >
>
> For me, the whole problem is the paradigm that we adapt the IRQ to the
> cpu where applications _were_ running in the last few seconds, while
> the process scheduler might make other choices, i.e. migrate the task
> to the cpu where the IRQ was happening (the cpu calling the wakeups).
>
> We can add logic to each layer and still not get perfect behavior.
>
> Some kind of cooperation is needed.
>
> Irqbalance, for example, is of no use in the case of a network flood
> on your machine, because we stay in NAPI mode for several minutes on a
> single cpu. We'll need to add special logic to the NAPI loop to force
> an exit and reschedule the IRQ (so that another cpu can take it).
>
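The kind of escape hatch you describe might look roughly like this in a
driver's poll routine (sketch only: the example_* names and POLL_LIMIT
are made up; napi_complete() is the real exit API):

	/* Leave polled mode either when the ring is drained, or after
	 * POLL_LIMIT consecutive full-budget passes, so the irq can
	 * fire again and be rebalanced to another cpu.
	 */
	static int example_poll(struct napi_struct *napi, int budget)
	{
		struct example_priv *priv =
			container_of(napi, struct example_priv, napi);
		int work = example_rx_clean(priv, budget); /* rx work */

		if (work < budget) {
			/* Ring drained: the normal NAPI exit path. */
			priv->consecutive_polls = 0;
			napi_complete(napi);
			example_enable_irq(priv);
			return work;
		}

		if (++priv->consecutive_polls >= POLL_LIMIT) {
			/* We've monopolized this cpu long enough;
			 * force an exit, claiming less than the full
			 * budget so the core does not re-poll us
			 * immediately.
			 */
			priv->consecutive_polls = 0;
			napi_complete(napi);
			example_enable_irq(priv);
			return budget - 1;
		}

		return budget;	/* stay in polled mode */
	}
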
Beyond that, would you consider an approach whereby, instead of updating
irq affinity to match the process that consumes data from a given irq,
we bias the scheduler so that processes which consume data from a given
irq are not moved away from the core/l2 cache being fed by that flow? Do
you have a suggestion for how best to communicate that to the scheduler?
It would seem that interrogating the RFS table from the scheduler might
not be well received.
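For concreteness, here is roughly what I mean by interrogating the RFS
table: the sock flow table already records the cpu on which a flow's
consumer last ran, so a (purely hypothetical) scheduler hook could pull
a placement hint out of it. rps_sock_flow_table and RPS_NO_CPU are the
existing RFS bits; the function itself is only a sketch.

	/* Return the cpu last recorded for this flow hash, or -1 if
	 * no consumer has been seen for it.
	 */
	static int rfs_preferred_cpu(u32 flow_hash)
	{
		struct rps_sock_flow_table *tbl;
		int cpu = -1;

		rcu_read_lock();
		tbl = rcu_dereference(rps_sock_flow_table);
		if (tbl) {
			u16 ent = tbl->ents[flow_hash & tbl->mask];

			if (ent != RPS_NO_CPU)
				cpu = ent;
		}
		rcu_read_unlock();

		return cpu;
	}
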
Best
Neil