Message-Id: <955CD08B-690D-42CF-A04D-FA618BA95B5A@athenacr.com>
Date: Thu, 5 Feb 2009 08:46:46 -0500
From: Wesley Chow <wchow@...enacr.com>
To: netdev@...r.kernel.org
Cc: Eric Dumazet <dada1@...mosbay.com>,
Kenny Chang <kchang@...enacr.com>,
Neil Horman <nhorman@...driver.com>
Subject: Re: Multicast packet loss
>>
>>
>> Maybe it's time to change the user side, and not try to find an
>> appropriate kernel :)
>>
>> If you know you have to receive N frames per 20us interval, then it is
>> better to use non-blocking sockets and run a loop like this:
>>
>> {
>>     usleep(20); // or compensate if this thread is slowed too much by the code below
>>     for (i = 0; i < N; i++) {
>>         while (recvfrom(socket[i], ....) != -1)
>>             receive_frame(...);
>>     }
>> }
>>
>> That way, you are pretty sure the network softirq handler won't have to
>> spend time waking up one thread 400,000 times per second. All CPU cycles
>> can be spent in the NIC driver and the network stack.
>>
>> Your thread will make 50,000 calls to nanosleep() per second, which is
>> not really expensive, plus N recvfrom() calls per iteration. It should
>> work on all past, current and future kernels.
>>
> +1 to this idea. Since the last oprofile traces showed significant
> variance in the time spent in schedule(), it might be worthwhile to
> investigate the effects of the application's behavior on this. It might
> also be worth adding a systemtap probe to sys_recvmsg, to count how many
> times we receive frames on a working and a non-working system. If the
> app is behaving differently on different kernels, and that is affecting
> the number of times you go to get a frame out of the stack, that would
> affect your drop rates, and it would show up in sys_recvmsg.
>
I reworked our test program to spin on a non-blocking socket, and it does
indeed seem to fix the problem, at least for 2.6.28.1, which was a kernel
we had problems with. The number of context switches drops drastically --
from 200,000+ to fewer than 50! The core of the loop now looks roughly
like the sketch below.
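This is only a minimal sketch, not our actual test program: the group
address 239.0.0.1, port 12345, the buffer size, and handle_frame() are
placeholders, and all of the real per-frame work is elided.

/*
 * Busy-poll receive loop: drain a non-blocking multicast socket, then
 * sleep briefly, so the kernel never has to wake the thread per packet.
 */
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void handle_frame(const char *buf, ssize_t len)
{
    (void)buf;
    (void)len;          /* placeholder for the real per-frame work */
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(12345);               /* placeholder port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); return 1;
    }

    struct ip_mreq mreq;
    mreq.imr_multiaddr.s_addr = inet_addr("239.0.0.1"); /* placeholder group */
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0) {
        perror("IP_ADD_MEMBERSHIP"); return 1;
    }

    /* Put the socket in non-blocking mode so recvfrom() never sleeps. */
    if (fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK) < 0) {
        perror("fcntl"); return 1;
    }

    char buf[2048];
    for (;;) {
        /* Drain everything that is queued, then back off briefly. */
        for (;;) {
            ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
            if (n < 0) {
                if (errno == EAGAIN || errno == EWOULDBLOCK)
                    break;          /* queue empty, go back to sleep */
                perror("recvfrom");
                return 1;
            }
            handle_frame(buf, n);
        }
        usleep(20);                 /* ~50,000 wakeups per second */
    }
}

Since recvfrom() returns EAGAIN instead of blocking, the scheduler is taken
out of the receive path, which is presumably where the context switches went.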
I haven't done totally comprehensive tests yet, so I don't want to
officially state any results. I'm also out today, but Kenny may get a
chance to play with it. Spinning on the socket is looking like an
interesting solution, but we're a bit nervous about seeing our
processes constantly running at 100% CPU. Does C++ have a
MachineOnFire exception?
Wes