linux-kernel - Re: warn: Turn the netdev timeout WARN


Message-ID: <20080917032708.GA8431@havoc.gtf.org>
Date:	Tue, 16 Sep 2008 23:27:08 -0400
From:	Jeff Garzik <jeff@...zik.org>
To:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	torvalds@...ux-foundation.org, davem@...emloft.net,
	arjan@...ux.intel.com
Subject: Re: warn: Turn the netdev timeout WARN_ON() into a WARN()

On Wed, Sep 17, 2008 at 02:59:12AM +0000, Linux Kernel Mailing List wrote:
>     
>     this patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(),
>     so that the device and driver names are inside the warning message.
>     This helps automated tools like kerneloops.org to collect the data
>     and do statistics, as well as making it more likely that humans
>     cut-n-paste the important message as part of a bugreport.
>     
>     Signed-off-by: Arjan van de Ven <arjan@...ux.intel.com>
>     Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
>  
> +#define WARN_ONCE(condition, format...)	({			\
> +	static int __warned;					\
> +	int __ret_warn_once = !!(condition);			\
> +								\
> +	if (unlikely(__ret_warn_once))				\
> +		if (WARN(!__warned, format)) 			\
> +			__warned = 1;				\
> +	unlikely(__ret_warn_once);				\
> +})
> +
>  #define WARN_ON_RATELIMIT(condition, state)			\
>  		WARN_ON((condition) && __ratelimit(state))
>  
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 9634091..ec0a083 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -215,10 +215,9 @@ static void dev_watchdog(unsigned long arg)
>  			    time_after(jiffies, (dev->trans_start +
>  						 dev->watchdog_timeo))) {
>  				char drivername[64];
> -				printk(KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
> +				WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
>  				       dev->name, netdev_drivername(dev, drivername, 64));
>  				dev->tx_timeout(dev);
> -				WARN_ON_ONCE(1);


hrm, am I misunderstanding?

AFAICS, this change means the user is no longer notified [after
the first time] of a condition they really need to know about --
a hardware or driver bug.
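
To make the difference concrete, here is a rough userspace sketch (not
kernel code) of the once-only behaviour next to the old print-every-time
behaviour.  The function names, the counters, and the "eth0"/"e1000"
strings are all made up for illustration; only the message text comes
from the patch.

```c
/*
 * Userspace illustration only.  report_timeout_once(),
 * report_timeout_always(), simulate_timeouts() and both counters are
 * hypothetical names, not kernel APIs.
 */
#include <stdio.h>

int once_reports;	/* messages emitted by the once-only variant */
int always_reports;	/* messages emitted by the print-every-time variant */

/* Roughly what WARN_ONCE(1, ...) does: only the first call reports. */
void report_timeout_once(const char *dev, const char *drv)
{
	static int warned;

	if (!warned) {
		warned = 1;
		fprintf(stderr, "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
			dev, drv);
		once_reports++;
	}
}

/* Roughly what the old printk() did: every call reports. */
void report_timeout_always(const char *dev, const char *drv)
{
	fprintf(stderr, "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
		dev, drv);
	always_reports++;
}

/* Pretend the watchdog fired n times on the same (example) device. */
void simulate_timeouts(int n)
{
	int i;

	for (i = 0; i < n; i++) {
		report_timeout_once("eth0", "e1000");
		report_timeout_always("eth0", "e1000");
	}
}
```

After simulate_timeouts(3), the once-only variant has reported a single
timeout while the other has reported all three, which is the loss of
information I am worried about.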

These conditions can occur many hours or days apart, and the admin
needs to know EACH time it occurs, because it is a major networking
event, generally leading to a complete reset of the entire hardware.

And quite honestly, the backtrace is not useful (yes, even the one
that existed previously)...  THINK for a second.  The backtrace
is going to look exactly the same every time, since it is a
timer-triggered dev_watchdog() call.

NETDEV WATCHDOG timeouts are not easily fixable errors like lockdep
warnings, and the admin really does need to see each one.

Unless I am missing something, (1) this patch should be reverted,
and in addition, (2) I recommend removing the WARN_ON_ONCE()
because the backtrace is not helpful.

	Jeff


