Message-ID: <acb95e0283e0720979f67f8321c3cdbe@linux.dev>
Date:   Thu, 20 Apr 2023 02:17:21 +0000
From:   "Yajun Deng" <yajun.deng@...ux.dev>
To:     "Jakub Kicinski" <kuba@...nel.org>
Cc:     jhs@...atatu.com, xiyou.wangcong@...il.com, jiri@...nulli.us,
        davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: sched: print jiffies when transmit queue time
 out

April 20, 2023 9:27 AM, "Jakub Kicinski" <kuba@...nel.org> wrote:

> On Wed, 19 Apr 2023 19:56:32 +0800 Yajun Deng wrote:
> 
>> Although there is watchdog_timeo to let users know when the transmit queue
>> begin stall, but dev_watchdog() is called with an interval. The jiffies
>> will always be greater than watchdog_timeo.
>> 
>> To let users know the exact time the stall started, print jiffies when
>> the transmit queue time out.
> 
> Please add an explanation of how this information is useful in practice.

We have seen cases where several warnings were logged close together, and we want to determine which stall started first.

First warning:
16:37:57 kernel: [ 7100.097547] ------------[ cut here ]------------
16:37:57 kernel: [ 7100.097550] NETDEV WATCHDOG: eno2 (i40e): transmit queue 8 timed out
16:37:57 kernel: [ 7100.097571] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x260/0x270
...

Second warning:
16:38:44 kernel: [ 7147.756952] rcu: INFO: rcu_preempt self-detected stall on CPU
16:38:44 kernel: [ 7147.756958] rcu:   24-....: (59999 ticks this GP) idle=546/1/0x4000000000000000 softirq=3673137/3673146 fqs=13844
16:38:44 kernel: [ 7147.756960]        (t=60001 jiffies g=4322709 q=133381)
16:38:44 kernel: [ 7147.756962] NMI backtrace for cpu 24
...

As we can see, the transmit queue must have started stalling before 16:37:52, while the RCU stall started at about 16:37:44.
The two times are close, so we want to confirm which stall happened first.
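
To make the arithmetic behind those two times explicit, here is a rough sketch. It assumes HZ=1000 (so 60001 jiffies is about 60 seconds) and a 5 second watchdog_timeo for this device. Both values are inferred from the times quoted above rather than read from the system, and the helper name is only illustrative.

```python
from datetime import datetime, timedelta

HZ = 1000              # assumption: jiffies per second on this kernel
WATCHDOG_TIMEO_S = 5   # assumption: watchdog_timeo of the device, in seconds


def subtract_seconds(log_time: str, seconds: int) -> str:
    """Return the wall-clock time `seconds` before a HH:MM:SS log time."""
    t = datetime.strptime(log_time, "%H:%M:%S")
    return (t - timedelta(seconds=seconds)).strftime("%H:%M:%S")


# The watchdog fires only after at least watchdog_timeo has elapsed, so this
# is the latest possible start of the transmit queue stall, not the exact one.
tx_latest_start = subtract_seconds("16:37:57", WATCHDOG_TIMEO_S)

# The RCU warning reports the elapsed jiffies (t=60001), so its start time
# can be computed directly.
rcu_start = subtract_seconds("16:38:44", 60001 // HZ)

print("tx queue stall started at or before:", tx_latest_start)
print("rcu stall started at about:", rcu_start)
```

The RCU message carries its own elapsed jiffies, so its start time is known. The watchdog message only gives an upper bound, because dev_watchdog() runs at an interval and the real stall may have begun earlier than 16:37:52.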
