Message-ID: <87mszq7hio.fsf@nvidia.com>
Date: Thu, 20 Jul 2023 16:51:40 +0200
From: Petr Machata <petrm@...dia.com>
To: Jiri Pirko <jiri@...nulli.us>
CC: Petr Machata <petrm@...dia.com>, <netdev@...r.kernel.org>,
<kuba@...nel.org>, <pabeni@...hat.com>, <davem@...emloft.net>,
<edumazet@...gle.com>, <moshe@...dia.com>, <saeedm@...dia.com>,
<idosch@...dia.com>
Subject: Re: [patch net-next v2 00/11] devlink: introduce dump selector attr and use it for per-instance dumps

Jiri Pirko <jiri@...nulli.us> writes:
> Thu, Jul 20, 2023 at 03:55:00PM CEST, petrm@...dia.com wrote:
>
>>I'll take this through our nightly and will report back tomorrow.
>
> Sure. I ran mlxsw regression with this already, no issues.

You started it on one machine and it went well for a while. But it's
getting a stream of these splats right now:

INFO - INFO - [ 4155.564670] rcu: INFO: rcu_preempt self-detected stall on CPU
INFO - INFO - [ 4155.571093] rcu: 7-....: (99998 ticks this GP) idle=ac7c/1/0x4000000000000000 softirq=86447/86447 fqs=25001
INFO - INFO - [ 4155.582077] rcu: (t=100015 jiffies g=289809 q=1459 ncpus=8)
INFO - INFO - [ 4155.588398] CPU: 7 PID: 38940 Comm: ip Not tainted 6.5.0-rc1jiri+ #1
INFO - INFO - [ 4155.595497] Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 01/06/2019
INFO - INFO - [ 4155.604915] RIP: 0010:__netlink_lookup+0xca/0x150
INFO - INFO - [ 4155.610171] Code: 00 00 48 89 c7 48 83 cf 01 48 8b 10 48 83 e2 fe 48 0f 44 d7 f6 c2 01 75 5a 0f b7 4b 16 44 8b 44 24 08 49 89 c9 49 f7 d9 eb 08 <48> 8b 12 f6 c2 01 75 41 4a 8d 34 0a 44 39 86 e8 02 00 00 75 eb 48
INFO - INFO - [ 4155.631156] RSP: 0018:ffffbea7ca41b760 EFLAGS: 00000213
INFO - INFO - [ 4155.636992] RAX: ffffa048c25120b0 RBX: ffffa048c01e4000 RCX: 0000000000000400
INFO - INFO - [ 4155.644964] RDX: ffffa048c6d4b400 RSI: ffffa048c6d4b000 RDI: ffffa048c25120b1
INFO - INFO - [ 4155.652935] RBP: ffffa048c2512000 R08: 00000000888fe595 R09: fffffffffffffc00
INFO - INFO - [ 4155.660906] R10: 00000000302e3030 R11: 0000006900030008 R12: ffffa048c9205900
INFO - INFO - [ 4155.668879] R13: 00000000888fe595 R14: 0000000000000001 R15: ffffa048c01e4000
INFO - INFO - [ 4155.676850] FS: 00007f2155bcf800(0000) GS:ffffa04c2fdc0000(0000) knlGS:0000000000000000
INFO - INFO - [ 4155.685890] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
INFO - INFO - [ 4155.692307] CR2: 00000000004e4140 CR3: 000000014c919005 CR4: 00000000003706e0
INFO - INFO - [ 4155.700279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
INFO - INFO - [ 4155.708249] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
INFO - INFO - [ 4155.716220] Call Trace:
INFO - INFO - [ 4155.718948] <IRQ>
INFO - INFO - [ 4155.721190] ? rcu_dump_cpu_stacks+0xea/0x170
INFO - INFO - [ 4155.726057] ? rcu_sched_clock_irq+0x53b/0x10b0
INFO - INFO - [ 4155.731116] ? update_load_avg+0x54/0x280
INFO - INFO - [ 4155.735593] ? notifier_call_chain+0x5a/0xc0
INFO - INFO - [ 4155.740361] ? timekeeping_update+0xaf/0x280
INFO - INFO - [ 4155.745130] ? timekeeping_advance+0x374/0x590
INFO - INFO - [ 4155.750093] ? update_process_times+0x74/0xb0
INFO - INFO - [ 4155.754957] ? tick_sched_handle+0x33/0x50
INFO - INFO - [ 4155.759529] ? tick_sched_timer+0x6b/0x80
INFO - INFO - [ 4155.763995] ? tick_sched_do_timer+0x80/0x80
INFO - INFO - [ 4155.768762] ? __hrtimer_run_queues+0x10f/0x2a0
INFO - INFO - [ 4155.773820] ? hrtimer_interrupt+0xf8/0x230
INFO - INFO - [ 4155.778492] ? __sysvec_apic_timer_interrupt+0x52/0x120
INFO - INFO - [ 4155.784327] ? sysvec_apic_timer_interrupt+0x6d/0x90
INFO - INFO - [ 4155.789874] </IRQ>
INFO - INFO - [ 4155.792211] <TASK>
INFO - INFO - [ 4155.794549] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
INFO - INFO - [ 4155.800485] ? __netlink_lookup+0xca/0x150
INFO - INFO - [ 4155.805059] netlink_unicast+0x132/0x390
INFO - INFO - [ 4155.809437] rtnl_getlink+0x36d/0x410
INFO - INFO - [ 4155.813532] rtnetlink_rcv_msg+0x14f/0x3b0
INFO - INFO - [ 4155.818106] ? __alloc_pages+0x17c/0x290
INFO - INFO - [ 4155.822485] ? rtnl_calcit.isra.0+0x140/0x140
INFO - INFO - [ 4155.827348] netlink_rcv_skb+0x58/0x100
INFO - INFO - [ 4155.831631] netlink_unicast+0x23c/0x390
INFO - INFO - [ 4155.836010] netlink_sendmsg+0x214/0x470
INFO - INFO - [ 4155.840390] ? netlink_unicast+0x390/0x390
INFO - INFO - [ 4155.844963] ____sys_sendmsg+0x16a/0x260
INFO - INFO - [ 4155.849345] ___sys_sendmsg+0x9a/0xe0
INFO - INFO - [ 4155.853437] __sys_sendmsg+0x7a/0xc0
INFO - INFO - [ 4155.857428] do_syscall_64+0x38/0x80
INFO - INFO - [ 4155.861419] entry_SYSCALL_64_after_hwframe+0x63/0xcd

BTW, while for core patches a pass on any one machine is usually a good
predictor of a full regression pass, that's not always the case. There's
a reason we run on about 15 machines plus simulation. Even if this had
"no issues", there would be value in getting a full regression run.

I'm pulling this from the nightly again.