Message-Id: <FDFFEFAB-A741-4232-821E-17BFAE5CAFAC@earthlink.net>
Date: Thu, 3 Jun 2010 01:16:54 -0700
From: Mitchell Erblich <erblichs@...thlink.net>
To: netdev@...r.kernel.org
Subject: Proposed linux kernel changes : scaling tcp/ip stack
To whom it may concern,
First, my assumption is to keep this discussion local to just a few tcp/ip
developers, to see if there is any consensus that the approach below is
logical. Please also pass this email along to the owner(s) of this stack,
if any exist, so they can identify whether a case exists for the possible
changes below. I am not currently on the linux kernel mailing list.
I have experience with modifications to the Linux tcp/ip stack; I merged
those changes into the company's local tree and left possible global
integration to others.
I have been approached by a number of companies about scaling the stack
under the assumption that a number of CPU cores are available. At present
I have some extra time on my hands and am considering looking into this
area on my own.
The first assumption is that, if extra cores are available, a single
received homogeneous flow with a large number of packets/segments per
second (pps) can be split into non-equal flows. This split can in effect
allow a larger recv'd pps rate at the same per-core load by splitting off
other workloads, such as xmit'ing pure ACKs.
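For illustration only, here is a minimal untested sketch of that ACK
split. The names (queue_pure_ack, ack_xmit_thread) are mine, not existing
kernel symbols, and it assumes the pure-ACK skb is fully built, headers
included, before it is queued:

#include <linux/skbuff.h>
#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/netdevice.h>

static struct sk_buff_head ack_xmit_queue; /* skb_queue_head_init() at init */
static DECLARE_WAIT_QUEUE_HEAD(ack_xmit_wait);

/* Called from the recv path instead of xmit'ing the ACK inline, so the
 * core doing recv processing never spends cycles on the output work. */
static void queue_pure_ack(struct sk_buff *ack_skb)
{
        skb_queue_tail(&ack_xmit_queue, ack_skb); /* internally locked */
        wake_up(&ack_xmit_wait);
}

/* New kernel thread body: drains the queue, possibly on another core. */
static int ack_xmit_thread(void *unused)
{
        struct sk_buff *skb;

        while (!kthread_should_stop()) {
                wait_event_interruptible(ack_xmit_wait,
                        !skb_queue_empty(&ack_xmit_queue) ||
                        kthread_should_stop());
                while ((skb = skb_dequeue(&ack_xmit_queue)) != NULL)
                        dev_queue_xmit(skb); /* hand off to the device layer */
        }
        return 0;
}

The thread would be started once with kthread_run() and could be pinned
to a chosen core with kthread_bind(); handing the skb straight to
dev_queue_xmit() is a simplification of the real output path.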
Put simply, again assuming Amdahl's law (and not looking to equalize the
load between cores), the idea is to create logical separations so that,
in a many-core system, different cores run new kernel threads that
operate in parallel within the tcp/ip stack. The initial separation
points would be at the ip/tcp layer boundary and wherever a recv'd
sk/pkt generates some form of output.
The ip and tcp layers would be split in the style of the vintage AT&T
STREAMS framework, so some form of queuing & scheduling between them
would be needed. In addition, queuing/scheduling of other kernel threads
would occur within ip & tcp to separate off the I/O.
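To make the STREAMS analogy concrete, a minimal untested sketch of such
a boundary queue follows. struct stream_queue, sq_put() and
sq_service_thread() are hypothetical names; the service hook would point
at something like tcp_v4_rcv():

#include <linux/skbuff.h>
#include <linux/kthread.h>
#include <linux/wait.h>

/* A STREAMS-style queue at the ip/tcp boundary: the lower layer's put
 * routine enqueues and returns, and a dedicated kernel thread plays
 * the role of the STREAMS service routine. */
struct stream_queue {
        struct sk_buff_head   q;
        wait_queue_head_t     wait;
        int                 (*service)(struct sk_buff *skb);
};

/* Put routine: called where ip would otherwise call into tcp directly. */
static void sq_put(struct stream_queue *sq, struct sk_buff *skb)
{
        skb_queue_tail(&sq->q, skb);
        wake_up(&sq->wait);
}

/* Service thread: runs on another core, analogous to a svc routine. */
static int sq_service_thread(void *data)
{
        struct stream_queue *sq = data;
        struct sk_buff *skb;

        while (!kthread_should_stop()) {
                wait_event_interruptible(sq->wait,
                        !skb_queue_empty(&sq->q) ||
                        kthread_should_stop());
                while ((skb = skb_dequeue(&sq->q)) != NULL)
                        sq->service(skb);
        }
        return 0;
}

The same structure could be instantiated at each separation point,
giving natural places to schedule work onto different cores.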
A possible validation test is to identify the max recv'd pps rate
through the tcp/ip modules for a normal flow in TCP established state,
with in-order, say, 64-byte non-fragmented segments, before and after
each incremental change; alternatively, the same rate achieved with
fewer core/cpu cycles.
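As a strawman for the measurement itself (user space, untested, and
assuming the 2.6-era /proc/net/snmp layout), the recv'd
segments-per-second rate can be sampled from the TCP InSegs counter
before and after each change:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return TCP InSegs from /proc/net/snmp, or -1 on failure.  The Tcp
 * stanza is two lines, a header naming the columns and a value line,
 * whose tokens align column-for-column. */
static long long read_insegs(void)
{
        char hdr[1024], val[1024], *h, *v, *hs, *vs;
        long long ret = -1;
        FILE *f = fopen("/proc/net/snmp", "r");

        if (!f)
                return -1;
        while (fgets(hdr, sizeof(hdr), f)) {
                if (strncmp(hdr, "Tcp:", 4))
                        continue;
                if (!fgets(val, sizeof(val), f))
                        break;
                h = strtok_r(hdr, " \n", &hs);
                v = strtok_r(val, " \n", &vs);
                while (h && v) {
                        if (strcmp(h, "InSegs") == 0) {
                                sscanf(v, "%lld", &ret);
                                break;
                        }
                        h = strtok_r(NULL, " \n", &hs);
                        v = strtok_r(NULL, " \n", &vs);
                }
                break;
        }
        fclose(f);
        return ret;
}

int main(void)
{
        for (;;) {
                long long before = read_insegs();
                sleep(1);
                long long after = read_insegs();
                if (before < 0 || after < 0)
                        return 1;
                printf("%lld recv'd segments/s\n", after - before);
        }
}

Driving this with a fixed sender workload of 64-byte segments gives the
before/after pps numbers; per-core cycle/time accounting from /proc/stat
would supply the other half of the comparison.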
I am willing to maintain a private git Linux.org tree that concentrates
the proposed changes, and, if there is willingness and a seen want/need,
to then identify how to implement the merge.
Mitchell Erblich
UNIX Kernel Engineer