Date: Fri, 10 Nov 2023 08:11:00 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com, 
	pabeni@...hat.com, syzbot+d55372214aff0faa1f1f@...kaller.appspotmail.com, 
	xiyou.wangcong@...il.com, jiri@...nulli.us
Subject: Re: [RFC net-next] net: don't dump stack on queue timeout

On Wed, Nov 8, 2023 at 7:09 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> The top syzbot report for networking (#14 for the entire kernel)
> is the queue timeout splat. We kept it around for a long time,
> because in real life it provides a pretty strong signal that
> something is wrong with the driver or the device.
>
> Removing it is also likely to break monitoring for those who
> track it as a kernel warning.
>
> Nevertheless, WARN()ings are best suited for catching kernel
> programming bugs. If a Tx queue gets starved due to a pause
> storm, priority configuration, or other weirdness - that's
> obviously a problem, but not a problem we can fix at
> the kernel level.
>
> Bite the bullet and convert the WARN() to a print.
>
> Before:
>
>   NETDEV WATCHDOG: eni1np1 (netdevsim): transmit queue 0 timed out 1975 ms
>   WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x39e/0x3b0
>   [... completely pointless stack trace of a timer follows ...]
>
> Now:
>
>   netdevsim netdevsim1 eni1np1: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 1769 ms
>
> Alternatively we could mark the drivers which syzbot has
> learned to abuse as "print-instead-of-WARN" selectively.
>
> Reported-by: syzbot+d55372214aff0faa1f1f@...kaller.appspotmail.com
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>

Reviewed-by: Jamal Hadi Salim <jhs@...atatu.com>

cheers,
jamal

> ---
> CC: jhs@...atatu.com
> CC: xiyou.wangcong@...il.com
> CC: jiri@...nulli.us
> ---
>  net/sched/sch_generic.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 4195a4bc26ca..8dd0e5925342 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -522,8 +522,9 @@ static void dev_watchdog(struct timer_list *t)
>
>                         if (unlikely(timedout_ms)) {
>                                 trace_net_dev_xmit_timeout(dev, i);
> -                               WARN_ONCE(1, "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out %u ms\n",
> -                                         dev->name, netdev_drivername(dev), i, timedout_ms);
> +                               netdev_crit(dev, "NETDEV WATCHDOG: CPU: %d: transmit queue %u timed out %u ms\n",
> +                                           raw_smp_processor_id(),
> +                                           i, timedout_ms);
>                                 netif_freeze_queues(dev);
>                                 dev->netdev_ops->ndo_tx_timeout(dev, i);
>                                 netif_unfreeze_queues(dev);
> --
> 2.41.0
>
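
For anyone whose monitoring keys on the old splat, here is a minimal userspace sketch of a log matcher that accepts both the "Before" and "Now" message formats quoted in the commit message. It is not part of the patch. The struct wd_event type and the parse_watchdog_line() helper are hypothetical names invented for this illustration, and the format strings are taken only from the two sample lines above, so real logs with extra prefixes may need adjusting.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record type for this sketch; not a kernel structure. */
struct wd_event {
	int cpu;			/* -1 when the line uses the old format */
	unsigned int queue;
	unsigned int timedout_ms;
};

/*
 * Return 1 and fill *ev if `line` contains a Tx watchdog timeout message
 * in either the old (WARN_ONCE) or the new (netdev_crit) format.
 * Return 0 for any other line.
 */
static int parse_watchdog_line(const char *line, struct wd_event *ev)
{
	static const char tag[] = "NETDEV WATCHDOG: ";
	const char *p = strstr(line, tag);
	const char *q;

	if (!p)
		return 0;
	p += sizeof(tag) - 1;

	/* New format: "CPU: <n>: transmit queue <q> timed out <t> ms" */
	if (sscanf(p, "CPU: %d: transmit queue %u timed out %u ms",
		   &ev->cpu, &ev->queue, &ev->timedout_ms) == 3)
		return 1;

	/* Old format: "<dev> (<driver>): transmit queue <q> timed out <t> ms" */
	q = strstr(p, "): transmit queue ");
	if (q && sscanf(q, "): transmit queue %u timed out %u ms",
			&ev->queue, &ev->timedout_ms) == 2) {
		ev->cpu = -1;
		return 1;
	}

	return 0;
}
```

In the new format the device and driver names come before the "NETDEV WATCHDOG:" tag (they are added by netdev_crit()), so this sketch does not try to extract them. It only recovers the queue index, the timeout, and, when present, the CPU number.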