Message-ID: <1272563772.2222.301.camel@edumazet-laptop>
Date:	Thu, 29 Apr 2010 19:56:12 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Andi Kleen <ak@...goyle.fritz.box>
Cc:	hadi@...erus.ca, Changli Gao <xiaosuo@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	Tom Herbert <therbert@...gle.com>,
	Stephen Hemminger <shemminger@...tta.com>,
	netdev@...r.kernel.org, Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH v6] net: batch skb dequeueing from softnet
 input_pkt_queue

On Thursday, 29 April 2010 at 19:42 +0200, Andi Kleen wrote:
> > Andi, what do you think of this one?
> > Don't we have a function to send an IPI to an individual CPU instead?
> 
> That's what this function already does. You only set a single CPU 
> in the target mask, right?
> 
> IPIs are unfortunately always a bit slow. Nehalem-EX systems have X2APIC
> which is a bit faster for this, but that's not available in the lower
> end Nehalems. But even then it's not exactly fast.
> 
> I don't think the IPI primitive can be optimized much. It's not a cheap 
> operation.
> 
> If it's a problem do it less often and batch IPIs.
> 
> It's essentially the same problem that interrupt mitigation or NAPI 
> solve for NICs. I guess we just need a suitable mitigation mechanism.
> 
> Of course that would move more work to the sending CPU again, but 
> perhaps there's no alternative. I guess you could make it cheaper by
> minimizing access to packet data.
> 
> -Andi

Well, IPIs are already batched, and the rate is auto-adaptive.
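
To illustrate what "batched and auto-adaptive" can mean here, below is a
minimal user-space sketch, not the actual kernel code. It assumes the
sender only kicks the target CPU when the backlog goes from empty to
non-empty, so a busy target receives fewer IPIs per packet. The names
struct backlog, backlog_enqueue and backlog_drain are hypothetical and
exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 64

/* Hypothetical stand-in for a per-CPU input queue (not softnet_data). */
struct backlog {
	int pkts[QUEUE_CAP];
	size_t len;
	unsigned long ipis_sent;	/* times the target CPU was "kicked" */
};

/*
 * Enqueue one packet on the target CPU's backlog.
 * Returns true when an IPI would be needed, i.e. only when the queue
 * was empty before this packet. Later packets ride on the same IPI.
 */
static bool backlog_enqueue(struct backlog *q, int pkt)
{
	bool was_empty = (q->len == 0);

	assert(q->len < QUEUE_CAP);	/* sketch: no drop handling */
	q->pkts[q->len++] = pkt;
	if (was_empty)
		q->ipis_sent++;		/* assumption: one IPI per batch */
	return was_empty;
}

/* The target CPU drains everything queued so far in one batch. */
static size_t backlog_drain(struct backlog *q)
{
	size_t n = q->len;

	q->len = 0;
	return n;
}
```

With this scheme, enqueueing three packets before the target drains costs
one IPI, not three; the slower the consumer, the larger the batch.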

After various changes, things seem to be going better; maybe there is
something related to cache line thrashing.

I 'solved' it by using idle=poll, but you might take a look at how
clockevents_notify (called from acpi_idle_enter_bm) abuses a shared and
highly contended spinlock...

    23.52%            init  [kernel.kallsyms]             [k] _raw_spin_lock_irqsave
                      |
                      --- _raw_spin_lock_irqsave
                         |          
                         |--94.74%-- clockevents_notify
                         |          lapic_timer_state_broadcast
                         |          acpi_idle_enter_bm
                         |          cpuidle_idle_call
                         |          cpu_idle
                         |          start_secondary
                         |          
                         |--4.10%-- tick_broadcast_oneshot_control
                         |          tick_notify
                         |          notifier_call_chain
                         |          __raw_notifier_call_chain
                         |          raw_notifier_call_chain
                         |          clockevents_do_notify
                         |          clockevents_notify
                         |          lapic_timer_state_broadcast
                         |          acpi_idle_enter_bm
                         |          cpuidle_idle_call
                         |          cpu_idle
                         |          start_secondary
                         |          
