Message-Id: <955CD08B-690D-42CF-A04D-FA618BA95B5A@athenacr.com>
Date: Thu, 5 Feb 2009 08:46:46 -0500
From: Wesley Chow <wchow@...enacr.com>
To: netdev@...r.kernel.org
Cc: Eric Dumazet <dada1@...mosbay.com>,
Kenny Chang <kchang@...enacr.com>,
Neil Horman <nhorman@...driver.com>
Subject: Re: Multicast packet loss
>>
>>
>> Maybe it's time to change the user side, and not try to find an
>> appropriate kernel :)
>>
>> If you know you have to receive N frames per 20us interval, then it is
>> better to use non-blocking sockets and run a loop like this:
>>
>> {
>>     usleep(20); // or compensate if this thread is slowed too much by the code below
>>     for (i = 0; i < N; i++) {
>>         while (recvfrom(socket[i], ....) != -1)
>>             receive_frame(...);
>>     }
>> }
>>
>> That way, you are pretty sure the network softirq handler won't have to
>> spend time waking up one thread 400,000 times per second. All CPU cycles
>> can be spent in the NIC driver and the network stack.
>>
>> Your thread will make 50,000 calls to nanosleep() per second, which is
>> not really expensive, plus N recvfrom() calls per iteration. It should
>> work on all past, current and future kernels.
>>
> +1 to this idea. Since the last oprofile traces showed significant
> variance in the time spent in schedule(), it might be worthwhile to
> investigate the effects of the application's behavior on this. It might
> also be worth adding a systemtap probe to sys_recvmsg, to count how many
> times we receive frames on a working and a non-working system. If the
> app is behaving differently on different kernels, and that is affecting
> the number of times you go to get a frame out of the stack, that would
> affect your drop rates, and it would show up in sys_recvmsg.
>
I reworked our test program to spin on a non-blocking socket, and it does
indeed seem to fix the problem, at least for 2.6.28.1, which was a kernel
we had problems with. The number of context switches drops drastically --
from 200,000+ to fewer than 50! The core of the loop now looks roughly
like the sketch below.
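This is only a minimal sketch, not our actual test program: the group
address 239.0.0.1, port 12345, the buffer size, and handle_frame() are
placeholders, and all of the real per-frame work is elided.

/*
 * Busy-poll receive loop: drain a non-blocking multicast socket, then
 * sleep briefly, so the kernel never has to wake the thread per packet.
 */
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void handle_frame(const char *buf, ssize_t len)
{
    (void)buf;
    (void)len;          /* placeholder for the real per-frame work */
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(12345);               /* placeholder port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind"); return 1;
    }

    struct ip_mreq mreq;
    mreq.imr_multiaddr.s_addr = inet_addr("239.0.0.1"); /* placeholder group */
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq)) < 0) {
        perror("IP_ADD_MEMBERSHIP"); return 1;
    }

    /* Put the socket in non-blocking mode so recvfrom() never sleeps. */
    if (fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK) < 0) {
        perror("fcntl"); return 1;
    }

    char buf[2048];
    for (;;) {
        /* Drain everything that is queued, then back off briefly. */
        for (;;) {
            ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
            if (n < 0) {
                if (errno == EAGAIN || errno == EWOULDBLOCK)
                    break;          /* queue empty, go back to sleep */
                perror("recvfrom");
                return 1;
            }
            handle_frame(buf, n);
        }
        usleep(20);                 /* ~50,000 wakeups per second */
    }
}

Since recvfrom() returns EAGAIN instead of blocking, the scheduler is taken
out of the receive path, which is presumably where the context switches went.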
I haven't done totally comprehensive tests yet, so I don't want to
officially state any results. I'm also out today, but Kenny may get a
chance to play with it. Spinning on the socket is looking like an
interesting solution, but we're a bit nervous about seeing our
processes constantly running at 100% CPU. Does C++ have a
MachineOnFire exception?
Wes