Message-ID: <ZUzbfIDtiykahZ/t@nanopsycho>
Date: Thu, 9 Nov 2023 14:15:40 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, edumazet@...gle.com,
    pabeni@...hat.com,
    syzbot+d55372214aff0faa1f1f@...kaller.appspotmail.com,
    jhs@...atatu.com, xiyou.wangcong@...il.com
Subject: Re: [RFC net-next] net: don't dump stack on queue timeout

Thu, Nov 09, 2023 at 01:09:01AM CET, kuba@...nel.org wrote:
>The top syzbot report for networking (#14 for the entire kernel)
>is the queue timeout splat. We kept it around for a long time,
>because in real life it provides a pretty strong signal that
>something is wrong with the driver or the device.
>
>Removing it is also likely to break monitoring for those who
>track it as a kernel warning.
>
>Nevertheless, WARN()ings are best suited for catching kernel
>programming bugs. If a Tx queue gets starved due to a pause
>storm, priority configuration, or other weirdness - that's
>obviously a problem, but not a problem we can fix at
>the kernel level.
>
>Bite the bullet and convert the WARN() to a print.
>
>Before:
>
> NETDEV WATCHDOG: eni1np1 (netdevsim): transmit queue 0 timed out 1975 ms
> WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x39e/0x3b0
> [... completely pointless stack trace of a timer follows ...]
>
>Now:
>
> netdevsim netdevsim1 eni1np1: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 1769 ms
>
>Alternatively we could mark the drivers which syzbot has
>learned to abuse as "print-instead-of-WARN" selectively.
>
>Reported-by: syzbot+d55372214aff0faa1f1f@...kaller.appspotmail.com
>Signed-off-by: Jakub Kicinski <kuba@...nel.org>

Makes sense.

Reviewed-by: Jiri Pirko <jiri@...dia.com>