Date:	Thu, 5 Feb 2009 08:33:41 -0500
From:	Neil Horman <nhorman@...driver.com>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	Wesley Chow <wchow@...enacr.com>, netdev@...r.kernel.org,
	Kenny Chang <kchang@...enacr.com>
Subject: Re: Multicast packet loss

On Wed, Feb 04, 2009 at 07:11:36PM +0100, Eric Dumazet wrote:
> Wesley Chow a écrit :
> >>>>>>
> >>>>>>
> >>>>> Are these quad core systems?  Or dual core w/ hyperthreading?  I ask
> >>>>> because in your working setup you have half the number of CPUs, and I
> >>>>> was not sure if you removed an entire package or just disabled
> >>>>> hyperthreading.
> >>>>>
> >>>>>
> >>>>> Neil
> >>>>>
> >>>>>
> >>>> Yeah, these are quad core systems.  The 8 cpu system is a
> >>>> dual-processor  quad-core.  The other is my desktop, single cpu quad
> >>>> core.
> >>>>
> >>>>
> > 
> > 
> > Just to be clear: on the 2 x quad core system, we can run with a 2.6.15
> > kernel and see no packet drops. In fact, we can run with 2.6.19, 2.6.20,
> > and 2.6.21 just fine. 2.6.22 is the first kernel that shows problems.
> > 
> > Kenny posted results from a working setup on a different machine.
> > 
> > What I would really like to know is if whatever changed between 2.6.21
> > and 2.6.22 that broke things is confined just to bnx2. To make this a
> > rigorous test, we would need to use the same machine with a different
> > nic, which we don't have quite yet. An Intel Pro 1000 ethernet card is
> > in the mail as I type this.
> > 
> > I also tried forward porting the bnx2 driver from 2.6.21 to 2.6.22
> > (unsuccessfully), and building the most recent driver from the Broadcom
> > site against Ubuntu Hardy's 2.6.24. The most recent driver with Hardy's
> > 2.6.24 showed similar packet-dropping problems. Hmm, perhaps I'll try to
> > build the most recent Broadcom driver against 2.6.21.
> > 
> 
> Try an oprofile session; you should see a scheduler effect (I don't want to call
> this a regression, no need for another flame war).
> 
> Also give us "vmstat 1" results (number of context switches per second).
> 
> On recent kernels, the scheduler might be faster than before: you get more wakeups per
> second and more work to do in the softirq handler (it makes more calls into the scheduler,
> thus fewer CPU cycles are available for draining the NIC RX queue in time).
> 
> opcontrol --vmlinux=/path/vmlinux --start
> <run benchmark>
> opreport -l /path/vmlinux | head -n 50
> 
> Recent schedulers tend to be optimized for lower latencies (and thus, at a
> high rate of wakeups, you get less bandwidth because softirq processing uses
> a whole CPU).
> 
> For example, if you have one thread receiving data on 4 or 8 sockets, you'll
> probably notice better throughput (because it will sleep less often).
> 
> Multicast receiving on N sockets, with one thread waiting on each socket,
> is basically a way to trigger a scheduler storm (N wakeups per packet).
> So it's more a benchmark that stresses the scheduler than one that stresses
> the network stack...
> 
> 
> Maybe it's time to change the user side, and not try to find an appropriate kernel :)
> 
> If you know you have to receive N frames per 20us interval, then it's better to
> use non-blocking sockets and do a loop like this:
> 
> {
> 	usleep(20); // or try to compensate if this thread is slowed too much by the code below
> 	for (i = 0; i < N; i++) {
> 		while (recvfrom(socket[i], ...) != -1)
> 			receive_frame(...);
> 	}
> }
> 
> That way, you can be pretty sure the network softirq handler won't have to spend time
> trying to wake up one thread 400,000 times per second. All CPU cycles can be spent in
> the NIC driver and the network stack.
> 
> Your thread will do 50,000 calls to nanosleep() per second, which is not really expensive,
> plus N recvfrom() calls per iteration. It should work on all past, current and future kernels.
> 
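
Spelled out as a standalone program, that polling receiver might look roughly like
the sketch below (untested; the multicast group, port numbers, socket count and the
20us sleep are placeholders, and error handling is omitted for brevity):

#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define NSOCK     4            /* number of sockets/groups -- placeholder */
#define BASE_PORT 12345        /* placeholder ports */
#define GROUP     "239.1.1.1"  /* placeholder multicast group */

/* Stub for whatever the application does with one frame. */
static void receive_frame(const char *frame, ssize_t len)
{
	(void)frame;
	(void)len;
}

/* One non-blocking UDP socket, bound and joined to the multicast group. */
static int mcast_socket(int port)
{
	struct sockaddr_in addr;
	struct ip_mreq mreq;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));

	mreq.imr_multiaddr.s_addr = inet_addr(GROUP);
	mreq.imr_interface.s_addr = htonl(INADDR_ANY);
	setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

	fcntl(fd, F_SETFL, O_NONBLOCK);		/* recvfrom() must never sleep */
	return fd;
}

int main(void)
{
	int sock[NSOCK];
	char buf[2048];
	int i;

	for (i = 0; i < NSOCK; i++)
		sock[i] = mcast_socket(BASE_PORT + i);

	for (;;) {
		usleep(20);	/* one wakeup per ~20us, then drain everything queued */
		for (i = 0; i < NSOCK; i++) {
			ssize_t len;

			while ((len = recvfrom(sock[i], buf, sizeof(buf), 0,
					       NULL, NULL)) > 0)
				receive_frame(buf, len);
			/* -1/EAGAIN here just means this socket's queue is empty */
		}
	}
	return 0;
}

The key property is that the single reader drains every socket on its own schedule,
so the softirq path never has to wake a sleeping thread for each frame.
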
+1 to this idea.  Since the last oprofile traces showed significant variance in
the time spent in schedule(), it might be worthwhile to investigate the effects
of the application's behavior on this.  It might also be worth adding a systemtap
probe to sys_recvmsg, to count how many times we receive frames on a working and
a non-working system.  If the app is behaving differently on different kernels,
and that is affecting the number of times you go to pull a frame out of the stack,
it would affect your drop rates, and it would show up in sys_recvmsg.
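
A rough SystemTap one-liner along these lines might do it (a sketch only: the probe
points are assumed from the standard syscall tapset, and recvfrom is included since
that is what a recvfrom()-based receive loop would actually hit):

  stap -e 'global n
    probe syscall.recvmsg, syscall.recvfrom { n++ }
    probe timer.s(1) { printf("receive syscalls/sec: %d\n", n); n = 0 }'

Comparing the per-second counts on the working and non-working kernels should show
whether the application is being serviced at a different rate.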

Neil

