Message-ID: <20120801232953.3791.qmail@science.horizon.com>
Date:	1 Aug 2012 19:29:53 -0400
From:	"George Spelvin" <linux@...izon.com>
To:	romieu@...zoreil.com
Cc:	linux@...izon.com, netdev@...r.kernel.org
Subject: Re: v3.5: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

Thank you for the response!

> It's up to you but I suggest that you keep them until there is something
> better.

I was going to; I just wondered if they interfered with debugging or
something.

> As long as the device recovers, you may try and lower the watchdog timeout
> as well as increase the Tx ring size a bit (x2 or x4) to minimize the
> annoyances.

Out of curiosity, how does increasing the Tx ring size help?

But okay.  Just to make sure I'm doing it right (I'm pretty sure,
but scream if I'm making a mistake), I'm making the following edits to
drivers/net/ethernet/realtek/r8169.c

#define	NUM_TX_DESC	64	/* Number of Tx descriptor registers */

I'll double that to 128.

Now, since I am actually running at gigabit speed into a pretty capable
network that I don't expect to ever block me, I should be able to send
one 1500-byte frame in 12.3 microseconds (with all overhead, one 1500-byte
frame is 1538 bytes or 12304 bits), so 128 frames in 1.6 ms.

There is the issue of TSO, so one descriptor might send more than one
frame.  But I think it's likely to break at 4K pages, so the worst case is
128 * 4096 / 1500 = 350 frames in that Tx ring, which will take 4.3 ms.

Either way, I can drop the Tx timeout a *lot*.

#define	RTL8169_TX_TIMEOUT	(6*HZ)

I want to drop that to HZ/100 or less.  Since I'm currently running with
CONFIG_HZ_100, and I'm not sure about the rounding (do I gain or lose
one tick due to ambiguity?), I'll bump HZ to 300 and change that to HZ/100.
That should give me a minimum of 2 ticks = 6.666 ms, which is still more
than it should take to transmit a full Tx ring.

To make this short timeout actually work, I have to remove the "round
to nearest second" round_jiffies() calls in net/sched/sch_generic.c (there
are two that apply to dev->watchdog_timer), since I do want a sub-second
timeout granularity.