Message-Id: <1251472963.3705.26.camel@raz>
Date: Fri, 28 Aug 2009 18:22:43 +0300
From: raz ben yehuda <raziebe@...il.com>
To: Rik van Riel <riel@...hat.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Chris Friesen <cfriesen@...tel.com>,
Andrew Morton <akpm@...ux-foundation.org>, mingo@...e.hu,
peterz@...radead.org, maximlevitsky@...il.com, efault@....de,
wiseman@...s.biu.ac.il, linux-kernel@...r.kernel.org,
linux-rt-users@...r.kernel.org
Subject: Re: RFC: THE OFFLINE SCHEDULER
On Fri, 2009-08-28 at 09:25 -0400, Rik van Riel wrote:
> raz ben yehuda wrote:
>
> > yes. latency is a crucial property.
>
> In the case of network packets, wouldn't you get a lower
> latency by transmitting the packet from the CPU that
> knows the packet should be transmitted, instead of sending
> an IPI to another CPU and waiting for that CPU to do the
> work?
Hello Rik
If I understand you correctly, you are asking whether I hand the 1.5K
packets themselves to an offline CPU?
If so, that is not what I do, because you are quite right: it would not
make any sense.
I do not pass packets to an offline CPU; I pass assignments. An
assignment is a buffer, roughly 1MB in size, together with some context
describing what to do with it (much like aio). Also, the offline
processor holds the network interface as its own: no two offline
processors transmit over a single interface (I modified the bonding
driver to work with offline processors for that). I am aware of
per-processor network queues, but benchmarks showed this approach was
better (I do not have those benchmarks at hand now).
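
To give a rough idea of the shape of the handoff, here is an
illustrative sketch in plain user-space C (the names
offsched_assignment and assign_ring are made up for the example and are
not the offsched code): the online CPU pushes assignments into a
single-producer/single-consumer ring and the offline CPU polls it, so
no lock, IPI or wakeup is involved.

/* Illustrative only: a lock-free SPSC ring handing "assignments"
 * (buffer + context) from an online CPU to the offline CPU. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ASSIGN_RING_SIZE 64			/* power of two */

struct offsched_assignment {
	void   *buf;				/* ~1MB payload to transmit */
	size_t  len;				/* valid bytes in buf */
	void   *ctx;				/* aio-like completion context */
};

struct assign_ring {
	struct offsched_assignment slots[ASSIGN_RING_SIZE];
	_Atomic unsigned int head;		/* written by the online CPU */
	_Atomic unsigned int tail;		/* written by the offline CPU */
};

/* Producer side: runs on an online CPU, never blocks, never sends an IPI. */
static bool assign_push(struct assign_ring *r,
			const struct offsched_assignment *a)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == ASSIGN_RING_SIZE)
		return false;			/* ring full, caller retries */

	r->slots[head & (ASSIGN_RING_SIZE - 1)] = *a;
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: polled in the offline CPU's transmit loop. */
static bool assign_pop(struct assign_ring *r, struct offsched_assignment *out)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail == head)
		return false;			/* nothing queued */

	*out = r->slots[tail & (ASSIGN_RING_SIZE - 1)];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

The offline CPU simply spins on assign_pop() and drives the interface
it owns with whatever comes out.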
Also, these engines never release any sk_buffs back to the operating
system; the packets are reused over and over to avoid the latency of
memory allocation and the cache misses that come with it.
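
As a rough illustration of that reuse (kernel-style C with 2009-era
fields; the pool and its size are hypothetical, this is not the
offsched source): the skbs are allocated once into a small pool, and
every transmit takes an extra reference, so the kfree_skb() issued
after transmission never returns the buffer to the slab.

#include <linux/errno.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

#define TX_POOL_SIZE	256
#define TX_BUF_LEN	1536

static struct sk_buff *tx_pool[TX_POOL_SIZE];

static int tx_pool_init(struct net_device *dev)
{
	int i;

	for (i = 0; i < TX_POOL_SIZE; i++) {
		/* the pool reference keeps users at 1, so the skb is never
		 * handed back to the slab while the pool holds it */
		tx_pool[i] = netdev_alloc_skb(dev, TX_BUF_LEN);
		if (!tx_pool[i])
			return -ENOMEM;
	}
	return 0;
}

/* Take a pooled skb for the next transmit.  skb_get() raises users to 2,
 * so the kfree_skb() after transmission only drops it back to 1 and the
 * buffer stays ours.  NULL means the previous transmit of this slot is
 * still in flight. */
static struct sk_buff *tx_pool_next(int i)
{
	struct sk_buff *skb = tx_pool[i];

	/* users is an atomic_t in the 2009-era struct sk_buff */
	if (atomic_read(&skb->users) != 1)
		return NULL;

	return skb_get(skb);
}

In practice the data and tail pointers also have to be rewound before
the payload is refilled; I left that out of the sketch.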
Also, in some cases I disabled the transmit interrupts and released
packets in an offline context (--skb->users was still greater than 0,
so not a real release); I learned this from the chelsio driver. This
took even more load off the operating system. It proved better on large
1Gbps arrays, and in some variants of the code I was able to remove the
atomic_inc/atomic_dec pairs entirely; atomic operations cost a lot.
With MSI cards I did not find it useful; in the example I showed, I use
MSI and the system is almost idle.
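
For the non-MSI case, a rough sketch of the polled-completion side
looks like this (again with made-up names; tx_desc_done() stands in for
the driver-specific test of the hardware descriptor, which in reality
depends on the NIC):

#include <linux/skbuff.h>
#include <linux/types.h>

#define TX_RING_SIZE 256

bool tx_desc_done(unsigned int idx);		/* hypothetical HW check */

struct tx_ring_sw {
	struct sk_buff *skb[TX_RING_SIZE];	/* skb posted on each descriptor */
	unsigned int	clean_idx;		/* next descriptor to reclaim */
	unsigned int	in_flight;		/* touched only by the offline CPU */
};

/* With the TX interrupt masked, only the offline CPU walks the ring, so
 * plain counters are enough and no atomic_inc/atomic_dec is needed for
 * the bookkeeping. */
static void offline_tx_clean(struct tx_ring_sw *ring)
{
	while (ring->in_flight) {
		unsigned int i = ring->clean_idx % TX_RING_SIZE;

		if (!tx_desc_done(i))
			break;

		/* "release" in offline context: this only drops the extra
		 * reference taken at transmit time; the skb stays in the
		 * pool (users goes 2 -> 1, never to 0) */
		kfree_skb(ring->skb[i]);

		ring->clean_idx++;
		ring->in_flight--;		/* plain decrement, single owner */
	}
}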
Also, as I recall, an IPI will not reach an offloaded processor;
offsched runs via NMI.
Also, I would like to apologize if any of this correspondence reads as
though I am trying to PR offsched. I am not.
> Inter-CPU communication has always been the bottleneck
> when it comes to SMP performance. Why does adding more
> inter-CPU communication make your system faster, instead
> of slower like one would expect?
>