Message-ID: <1302934897.2792.6.camel@edumazet-laptop>
Date: Sat, 16 Apr 2011 08:21:37 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Stephen Hemminger <stephen.hemminger@...tta.com>
Cc: Neil Horman <nhorman@...driver.com>, netdev@...r.kernel.org,
davem@...emloft.net, Dimitris Michailidis <dm@...lsio.com>,
Thomas Gleixner <tglx@...utronix.de>,
David Howells <dhowells@...hat.com>,
Tom Herbert <therbert@...gle.com>,
Ben Hutchings <bhutchings@...arflare.com>
Subject: Re: [PATCH 2/3] net: Add net device irq siloing feature
On Friday, April 15, 2011 at 21:52 -0700, Stephen Hemminger wrote:
> > On Fri, Apr 15, 2011 at 11:49:03PM +0100, Ben Hutchings wrote:
> > > On Fri, 2011-04-15 at 16:17 -0400, Neil Horman wrote:
> > > > Using the irq affinity infrastructure, we can now allow net devices
> > > > to call request_irq using a new wrapper function (request_net_irq),
> > > > which will attach a common affinity_update handler to each requested
> > > > irq. This affinity update mechanism correlates each tracked irq to
> > > > the flow(s) that said irq processes most frequently. The highest
> > > > traffic flow is noted, marked and exported to user space via the
> > > > affinity_hint proc file for each irq. In this way, utilities like
> > > > irqbalance are able to determine which cpu is receiving the most
> > > > data from each rx queue on a given NIC, and set irq affinity
> > > > accordingly.
> > > [...]
> > >
> > > Is irqbalance expected to poll the affinity hints? How often?
> > >
> > Yes, it's done just that for quite some time. Intel added that ability
> > at the same time they added the affinity_hint proc file. Irqbalance
> > polls the affinity_hint file at the same time it rebalances all irqs
> > (every 10 seconds). If the affinity_hint is non-zero, irqbalance just
> > copies it to smp_affinity for the same irq. Up until now that's been
> > just about dead code, because only ixgbe sets affinity_hint. That's why
> > I added the affinity_alg file, so irqbalance could do something more
> > intelligent than just a blind copy. With the patch that I referenced, I
> > added code to irqbalance to allow it to perform different balancing
> > methods based on the output of affinity_alg.
> > Neil
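The "blind copy" behavior described above — read each irq's affinity_hint and, if non-zero, write it to smp_affinity — can be sketched in shell. This is an illustrative sketch, not irqbalance's actual code; the function name and the root-directory parameter are made up for testability, while on a real system the files live under /proc/irq/<n>/ and writing them requires root.

```shell
# Hedged sketch of the affinity_hint handling described above: for every
# irq whose driver published a non-zero affinity_hint, blindly copy that
# mask into smp_affinity. The function name and irq_root parameter are
# illustrative; the real files live under /proc/irq/<n>/.
copy_affinity_hints() {
    irq_root="$1"                       # e.g. /proc/irq
    for hint in "$irq_root"/*/affinity_hint; do
        [ -r "$hint" ] || continue
        mask=$(cat "$hint")
        # An all-zero mask means the driver published no hint; skip it.
        case "$mask" in
            *[1-9a-fA-F]*) cat "$hint" > "$(dirname "$hint")/smp_affinity" ;;
        esac
    done
}
```

On a live system this would be invoked as `copy_affinity_hints /proc/irq` (as root); irqbalance effectively repeats this on its 10-second rebalance tick.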
>
> I hate the way more and more interfaces are becoming device driver
> specific. It makes it impossible to build sane management infrastructure
> and causes lots of customer and service complaints.
>
For me, the whole problem is the paradigm of steering the IRQ to the cpu
where applications _were_ running in the last few seconds, while the
process scheduler might make other choices, i.e. migrate the task to the
cpu where the IRQ was happening (the cpu issuing the wakeups).
We can add logic to each layer and still not get perfect behavior.
Some kind of cooperation is needed.
Irqbalance, for example, is of no use in the case of a network flood
hitting your machine, because we stay in NAPI mode for several minutes
on a single cpu. We'll need to add special logic to the NAPI loop to
force an exit and reschedule the IRQ (so that another cpu can take it).