Message-ID: <4BAA69BF.3080600@nortel.com>
Date: Wed, 24 Mar 2010 13:36:31 -0600
From: "Chris Friesen" <cfriesen@...tel.com>
To: Brandon Black <blblack@...il.com>
CC: linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: behavior of recvmmsg() on blocking sockets
On 03/24/2010 12:28 PM, Brandon Black wrote:
> On Wed, Mar 24, 2010 at 12:41 PM, Chris Friesen <cfriesen@...tel.com> wrote:
>> On 03/24/2010 10:15 AM, Brandon Black wrote:
>>> It uses a thread-per-socket model
>>
>> This doesn't scale well to large numbers of sockets....you get a lot of
>> unnecessary context switching.
>
> It scales great actually, within my measurement error of linear in
> testing so far. These are UDP server sockets, and the traffic pattern
> is one request packet maps to one response packet, with no longer-term
> per-client state (this is a DNS server, to be specific). The "do some
> work" code doesn't have any inter-thread contention (no locks, no
> writes to the same memory, etc), so the "threads" here may as well be
> processes if that makes the discussion less confusing. I haven't yet
> found a model that scales as well for me.
Note that I said "large numbers of sockets", as in tens of thousands.
Beyond the context-switch overhead, that many threads can also cause
memory-consumption problems, since each thread needs its own stack.
> I'm also just not personally sure whether there are network
> interfaces/drivers out there that could queue packets to the kernel
> (to a single socket) faster than recvmsg() could dequeue them to
> userspace
A 10Gig NIC could do this easily depending on your CPU.
> I still think having a "block until at least one packet arrives" mode
> for recvmmsg() makes sense though.
Agreed, as long as developers are aware that it won't be the most
efficient mode of operation.
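For reference, a minimal sketch of what such a mode can look like from
userspace. It assumes the MSG_WAITFORONE flag that recvmmsg(2) documents as
available since Linux 2.6.34; the helper name recv_at_least_one() and the
BATCH and PKT_MAX sizes are hypothetical, chosen only for illustration.

```c
#define _GNU_SOURCE            /* needed for recvmmsg() and struct mmsghdr */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH   8              /* illustrative batch size (assumption) */
#define PKT_MAX 512            /* illustrative per-packet buffer size (assumption) */

/*
 * Hypothetical helper: block until at least one datagram is available,
 * then also pick up whatever else is already queued, up to BATCH.
 * Returns the number of datagrams received, or -1 with errno set.
 */
int recv_at_least_one(int fd, char bufs[BATCH][PKT_MAX],
                      unsigned int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = PKT_MAX;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_WAITFORONE: wait for the first datagram, do not wait for the rest. */
    int n = recvmmsg(fd, msgs, BATCH, MSG_WAITFORONE, NULL);

    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;   /* bytes received in datagram i */
    return n;
}
```

Under light load this degenerates to one packet per call, which is why it is
not the most efficient mode; under heavy load it returns a partial or full
batch without waiting.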
Consider the case where you want to do some other useful work in
addition to running your network server. Every CPU cycle spent on the
network server is taken away from the other work. In this scenario you want
to handle packets as efficiently as possible, so the timeout-based
behaviour is better since it is more likely to give you multiple packets
per syscall.
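The timeout-based mode described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: recv_batch(), BATCH
and PKT_MAX are hypothetical names and sizes, and the timeout value is
whatever latency the application is willing to trade for fuller batches.

```c
#define _GNU_SOURCE            /* needed for recvmmsg() and struct mmsghdr */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#define BATCH   4              /* illustrative batch size (assumption) */
#define PKT_MAX 512            /* illustrative per-packet buffer size (assumption) */

/*
 * Hypothetical helper: try to fill a whole batch of BATCH datagrams in a
 * single syscall, with timeout_ms as the upper bound passed to recvmmsg().
 * Returns the number of datagrams received, or -1 with errno set.
 */
int recv_batch(int fd, char bufs[BATCH][PKT_MAX], long timeout_ms)
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];
    struct timespec ts = {
        .tv_sec  = timeout_ms / 1000,
        .tv_nsec = (timeout_ms % 1000) * 1000000L,
    };

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = PKT_MAX;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* No flags: keep receiving until the batch is full or the timeout hits. */
    return recvmmsg(fd, msgs, BATCH, 0, &ts);
}
```

With eight packets already queued, two calls to recv_batch() drain them,
versus eight calls to recvmsg(). One caveat worth knowing: the recvmmsg(2)
man page notes that the timeout is only checked after each datagram is
received, so on a blocking socket the call can still wait indefinitely if
traffic stops partway through a batch.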
Chris