Message-ID: <CANn89iJLG6pdkoCWQM1fifkK+OASD5DcNMq+uSv4N9cncFan3A@mail.gmail.com>
Date: Thu, 9 Nov 2023 16:21:32 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: davem@...emloft.net, netdev@...r.kernel.org, pabeni@...hat.com,
syzbot+d55372214aff0faa1f1f@...kaller.appspotmail.com, jhs@...atatu.com,
xiyou.wangcong@...il.com, jiri@...nulli.us
Subject: Re: [RFC net-next] net: don't dump stack on queue timeout
On Thu, Nov 9, 2023 at 1:09 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> The top syzbot report for networking (#14 for the entire kernel)
> is the queue timeout splat. We kept it around for a long time,
> because in real life it provides pretty strong signal that
> something is wrong with the driver or the device.
>
> Removing it is also likely to break monitoring for those who
> track it as a kernel warning.
>
> Nevertheless, WARN()ings are best suited for catching kernel
> programming bugs. If a Tx queue gets starved due to a pause
> storm, priority configuration, or other weirdness - that's
> obviously a problem, but not a problem we can fix at
> the kernel level.
>
> Bite the bullet and convert the WARN() to a print.
>
> Before:
>
> NETDEV WATCHDOG: eni1np1 (netdevsim): transmit queue 0 timed out 1975 ms
> WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x39e/0x3b0
> [... completely pointless stack trace of a timer follows ...]
>
> Now:
>
> netdevsim netdevsim1 eni1np1: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 1769 ms
>
> Alternatively we could mark the drivers which syzbot has
> learned to abuse as "print-instead-of-WARN" selectively.
>
> Reported-by: syzbot+d55372214aff0faa1f1f@...kaller.appspotmail.com
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
SGTM!
Reviewed-by: Eric Dumazet <edumazet@...gle.com>