Message-ID: <20130109115850.055b7a7e@vostro>
Date:	Wed, 9 Jan 2013 11:58:50 +0200
From:	Timo Teras <timo.teras@....fi>
To:	Francois Romieu <romieu@...zoreil.com>
Cc:	netdev@...r.kernel.org
Subject: Re: r8169 rx_missed increasing in bursts (regression)

On Tue, 8 Jan 2013 23:58:33 +0100 Francois Romieu
<romieu@...zoreil.com> wrote:

> Timo Teras <timo.teras@....fi> :
> [...]
> > My current hypothesis is that due to high softirq and recent(ish)
> > commit da78dbf "r8169: remove work from irq handler" moving more
> > work to softirq makes the receive path now suffer from latency from
> > getting irq to reading packets from the NIC on these boxes. And
> > that at times the rx fifo can get full causing a missed packet or
> > so.
> 
> This hypothesis won't explain the regression in 3.3.8 since 3.3.x does
> not include commit da78dbf.
> 
> Do you notice any netdev watchdog message in dmesg ?

On the production boxes, no.

In the lab environment where we tried to reproduce this, we got:
NOHZ: local_softirq_pending 08

This is likely a related but separate issue, and it is fixed by commit
da78dbf. So it seems that commit just got upgraded to a "regression fix".

> 'perf top' may exhibit something unusual too.

Will try this.

I did notice that the third field of /proc/net/softnet_stat, i.e.
softnet_data.time_squeeze, keeps incrementing whenever rx_missed
increases. Sometimes time_squeeze increments on its own, but rx_missed
never increases without time_squeeze also jumping significantly.
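
A minimal sketch for watching that counter, assuming the usual
/proc/net/softnet_stat layout of one row per CPU with hexadecimal
columns, the first three being processed, dropped and time_squeeze. The
function names are made up for illustration and are not part of any
existing tool.

```python
def parse_softnet_stat(text):
    """Return one dict per CPU row; every column is a hexadecimal counter."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append({
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": time_squeeze,  # 3rd field, the one discussed above
        })
    return rows


def squeeze_deltas(before, after):
    """Per-CPU increase in time_squeeze between two parsed snapshots."""
    return [b["time_squeeze"] - a["time_squeeze"] for a, b in zip(before, after)]


def read_softnet_stat(path="/proc/net/softnet_stat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_softnet_stat(f.read())
```

Calling read_softnet_stat() twice, a second or so apart, and passing
both results to squeeze_deltas() gives per-CPU increments that can be
lined up against the rx_missed counter sampled over the same interval.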

> > This might be further escalated by the bug fixed in commit 7dbb491
> > "r8169: avoid NAPI scheduling delay" (which is not present in
> > -stable trees).
> 
> Right, it would had been worth adding to -stable.
> 
> However it only 1) is a problem for 3.4.x (fixed in 3.5) and 2)
> triggers when returning from the slow work thread - which should not
> be used much.

Ok. I didn't realize 3.3.x did not include it. So something else is
broken too.

The slow work thread handles RxOverflow, and in the rx_missed case that
path is taken relatively often. Maybe add a printk there.

> [...]
> > So would it be sensible to do something like:
> > -#define NUM_RX_DESC    256     /* Number of Rx descriptor registers */
> > +#define NUM_RX_DESC    512     /* Number of Rx descriptor registers */
> 
> You can try it but it may actually increase the amount of heavy work
> done in softirq.

Ok. Will try this and some other things along with added debug logging.
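
For a rough sense of what the larger ring buys, here is a
back-of-the-envelope sketch of how long back-to-back traffic takes to
fill all Rx descriptors if softirq does not drain them. It assumes a
1 Gbit/s link, one frame per descriptor, and standard Ethernet wire
overhead; none of those figures come from the thread, and the function
name is hypothetical.

```python
# Per-frame bytes on the wire besides the payload (assumed standard Ethernet):
# preamble + SFD (8), MAC header (14), FCS (4), inter-frame gap (12).
WIRE_OVERHEAD_BYTES = 8 + 14 + 4 + 12


def ring_fill_time_us(num_desc, payload_bytes=1500, link_bps=1_000_000_000):
    """Microseconds of back-to-back frames needed to use up num_desc descriptors.

    Assumes each received frame consumes exactly one descriptor.
    """
    bits_per_frame = (payload_bytes + WIRE_OVERHEAD_BYTES) * 8
    return num_desc * bits_per_frame * 1_000_000 / link_bps


for n in (256, 512):
    print(n, "descriptors:",
          round(ring_fill_time_us(n)), "us at full-size frames,",
          round(ring_fill_time_us(n, payload_bytes=46)), "us at minimum-size frames")
```

Under these assumptions, doubling NUM_RX_DESC doubles the softirq
latency the ring can absorb before the NIC has to drop frames, but it
does nothing to reduce the latency itself.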