Message-ID: <1275556440.2456.19.camel@edumazet-laptop>
Date: Thu, 03 Jun 2010 11:14:00 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Mitchell Erblich <erblichs@...thlink.net>
Cc: netdev@...r.kernel.org
Subject: Re: Proposed linux kernel changes : scaling tcp/ip stack
On Thursday, 03 June 2010 at 01:16 -0700, Mitchell Erblich wrote:
> To whom it may concern,
>
> First, my intention is to keep this discussion among just a few tcp/ip
> developers, to see whether there is any consensus that the approach below
> is logical. Please also forward this email to the "owner(s)" of this stack,
> if any, so they can judge whether a case exists for the possible changes below.
>
> I am not currently on the Linux kernel mailing list.
>
> I have experience with modifications of the Linux tcp/ip stack, and have
> merged the changes into the company's local tree and left the possible
> global integration to others.
>
> I have been approached by a number of companies about scaling the
> stack on the assumption that multiple CPU cores are available. At present,
> I have extra time on my hands and am considering looking into this area on my own.
>
> The first assumption is that, if extra cores are available, a single
> received homogeneous flow with a high rate of packets/segments per
> second (pps) can be split into unequal sub-flows. In effect, this split
> could allow a higher received pps rate at the same per-core load, by moving
> other work, such as transmitting pure ACKs, onto other cores.
>
> Simply put, again bearing Amdahl's law in mind (and not trying to equalize
> the load between cores), the idea is to create logical separations so that,
> in a many-core system, new kernel threads on different cores can operate in
> parallel within the tcp/ip stack. The initial separation points would be at
> the ip/tcp layer boundary and wherever a received sk/pkt would generate some
> form of output.
>
> The ip/tcp layer would be split in the manner of the vintage AT&T STREAMS
> framework, which would require some form of queuing and scheduling. In
> addition, queuing/scheduling of other kernel threads would occur within ip
> and tcp to separate the I/O.
>
> A possible validation test is to measure the maximum received pps rate
> through the tcp/ip modules for a normal, in-order flow in the TCP ESTABLISHED
> state, using, say, 64-byte non-fragmented segments, before and after each
> incremental change; alternatively, to show the same rate with fewer core/CPU cycles.
>
> I am willing to maintain a private git Linux.org tree that collects the
> proposed changes and, if there is willingness and a demonstrated want/need,
> then identify how to implement the merge.
Hi Mitchell,
We work every day to improve the network stack, and the standard Linux tree
is already pretty scalable; you don't need to set up a separate git tree for that.
Our beloved maintainer David S. Miller handles two trees, net-2.6 and
net-next-2.6, where we put all our changes.
http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git
I suggest you read the latest patches (say... about 10,000 of them) to
get an idea of what we have done over the last few years.
Keywords: RCU, multiqueue, RPS, per-cpu data, lockless algorithms, cache line
placement...
It's nice to see another man joining the team!
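To illustrate one of those keywords: RPS (Receive Packet Steering) hashes each received flow and uses the hash to pick a CPU from a configured set, so every packet of a given flow is processed on the same CPU while different flows spread across cores. The sketch below shows only that idea; the `struct flow_tuple` type, the `flow_hash`/`pick_cpu` helpers and the choice of FNV-1a as the hash are hypothetical illustrations, not the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-tuple identifying one TCP flow (illustration only). */
struct flow_tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* FNV-1a over the tuple fields, byte by byte. This is NOT the kernel's
 * hash; it only demonstrates "same flow -> same hash". */
static uint32_t flow_hash(const struct flow_tuple *t)
{
    uint32_t words[3] = { t->saddr, t->daddr,
                          ((uint32_t)t->sport << 16) | t->dport };
    uint32_t h = 2166136261u;               /* FNV-1a 32-bit offset basis */
    for (int i = 0; i < 3; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;                 /* FNV-1a 32-bit prime */
        }
    }
    return h;
}

/* Map a 32-bit hash onto one of n_cpus entries without a division:
 * (hash * n_cpus) >> 32 scales [0, 2^32) down to [0, n_cpus). */
static unsigned int pick_cpu(uint32_t hash, const unsigned int *cpus,
                             unsigned int n_cpus)
{
    assert(n_cpus > 0);
    return cpus[((uint64_t)hash * n_cpus) >> 32];
}
```

For example, with a hypothetical CPU set `{ 1, 2, 3 }`, `pick_cpu(flow_hash(&t), cpus, 3)` returns the same CPU for every packet carrying tuple `t`. Keeping a flow on one CPU preserves in-order processing and cache locality, which is the trade-off against splitting a single flow across cores as proposed above.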
Thanks