Message-ID: <20100429111047.031eeff9@nehalam>
Date: Thu, 29 Apr 2010 11:10:47 -0700
From: Stephen Hemminger <shemminger@...tta.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
Thomas Gleixner <tglx@...utronix.de>
Cc: Andi Kleen <ak@...goyle.fritz.box>, netdev@...r.kernel.org,
Andi Kleen <andi@...stfloor.org>
Subject: OFT - reserving CPU's for networking
> On Thursday, 29 April 2010 at 19:42 +0200, Andi Kleen wrote:
> > > Andi, what do you think of this one ?
> > > Don't we have a function to send an IPI to an individual cpu instead ?
> >
> > That's what this function already does. You only set a single CPU
> > in the target mask, right?
> >
> > IPIs are unfortunately always a bit slow. Nehalem-EX systems have X2APIC
> > which is a bit faster for this, but that's not available in the lower
> > end Nehalems. But even then it's not exactly fast.
> >
> > I don't think the IPI primitive can be optimized much. It's not a cheap
> > operation.
> >
> > If it's a problem do it less often and batch IPIs.
> >
> > It's essentially the same problem that interrupt mitigation or NAPI
> > solves for NICs. I guess we just need a suitable mitigation mechanism.
> >
> > Of course that would move more work to the sending CPU again, but
> > perhaps there's no alternative. I guess you could make it cheaper by
> > minimizing access to packet data.
> >
> > -Andi
>
> Well, IPIs are already batched, and the rate is auto-adaptive.
>
> After various changes, things seem to be going better; maybe there is
> something related to cache line thrashing.
>
> I 'solved' it by using idle=poll, but you might take a look at the
> clockevents_notify (acpi_idle_enter_bm) abuse of a shared and highly
> contended spinlock...
>
> 23.52% init [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> |
> --- _raw_spin_lock_irqsave
> |
> |--94.74%-- clockevents_notify
> | lapic_timer_state_broadcast
> | acpi_idle_enter_bm
> | cpuidle_idle_call
> | cpu_idle
> | start_secondary
> |
> |--4.10%-- tick_broadcast_oneshot_control
> | tick_notify
> | notifier_call_chain
> | __raw_notifier_call_chain
> | raw_notifier_call_chain
> | clockevents_do_notify
> | clockevents_notify
> | lapic_timer_state_broadcast
> | acpi_idle_enter_bm
> | cpuidle_idle_call
> | cpu_idle
> | start_secondary
> |
>
I keep getting asked about taking some cores away from the clock and
scheduler so they can be reserved just for network processing. Seeing this
kind of stuff makes me wonder whether that might not be a half-bad idea.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html