Message-ID: <20230410235249.36uo76ivwisdx7xu@skbuf>
Date: Tue, 11 Apr 2023 02:52:49 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: netdev@...r.kernel.org
Cc: Jakub Kicinski <kuba@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
David Ahern <dsahern@...nel.org>,
Ido Schimmel <idosch@...dia.com>
Subject: Re: [PATCH v2 net] net: don't omit syncing RX filters to devices
that are down
On Mon, Apr 10, 2023 at 10:52:20PM +0300, Vladimir Oltean wrote:
> There are 2 possible ways to solve the issue.
>
> Alternatively, we could remove the check/optimization and thus make
> dev_mc_del() always propagate down to the ndo_set_rx_mode() of the
> device. This would implicitly solve the IGMP/IGMP6 code paths with DSA,
> as well as any other potential issues of this kind with address deletion
> not being synced prior to device removal.
Self NACK.
After a more careful inspection of dmesg, I now notice this lockdep
splat at probe time:
[ 7.710448] mscc_felix 0000:00:00.5 swp0 (uninitialized): PHY [0000:00:00.3:10] driver [Microsemi GE VSC8514 SyncE] (irq=POLL)
[ 7.735401]
[ 7.736921] ============================================
[ 7.742266] WARNING: possible recursive locking detected
[ 7.747610] 6.3.0-rc5-01277-g8ec1b4985857 #77 Not tainted
[ 7.753048] --------------------------------------------
[ 7.758391] kworker/u4:0/8 is trying to acquire lock:
[ 7.763477] ffff5c348439a280 (_xmit_ETHER){+...}-{3:3}, at: dev_mc_add+0x40/0xa0
[ 7.770991]
[ 7.770991] but task is already holding lock:
[ 7.776859] ffff5c34843e1280 (_xmit_ETHER){+...}-{3:3}, at: dev_mc_add+0x40/0xa0
[ 7.784358]
[ 7.784358] other info that might help us debug this:
[ 7.790924] Possible unsafe locking scenario:
[ 7.790924]
[ 7.796876] CPU0
[ 7.799340] ----
[ 7.801803] lock(_xmit_ETHER);
[ 7.805073] lock(_xmit_ETHER);
[ 7.808342]
[ 7.808342] *** DEADLOCK ***
[ 7.808342]
[ 7.814295] May be due to missing lock nesting notation
[ 7.814295]
[ 7.821119] 7 locks held by kworker/u4:0/8:
[ 7.825334] #0: ffff5c3480007948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1f8/0x568
[ 7.835549] #1: ffff80000856bd58 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x224/0x568
[ 7.844886] #2: ffff5c34826a71c0 (&dev->mutex){....}-{4:4}, at: __device_attach+0x48/0x1a0
[ 7.853358] #3: ffffb9e51aa8db80 (dsa2_mutex){+.+.}-{4:4}, at: dsa_register_switch+0x50/0x1188
[ 7.862182] #4: ffffb9e51aa70788 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x28/0x40
[ 7.869956] #5: ffff5c34843f95b0 (&idev->mc_lock){+.+.}-{4:4}, at: __ipv6_dev_mc_inc+0xa8/0x498
[ 7.878864] #6: ffff5c34843e1280 (_xmit_ETHER){+...}-{3:3}, at: dev_mc_add+0x40/0xa0
[ 7.886803]
[ 7.886803] stack backtrace:
[ 7.891188] CPU: 1 PID: 8 Comm: kworker/u4:0 Not tainted 6.3.0-rc5-01277-g8ec1b4985857 #77
[ 7.904249] Workqueue: events_unbound deferred_probe_work_func
[ 7.910146] Call trace:
[ 7.912611] dump_backtrace+0x108/0x130
[ 7.916486] show_stack+0x24/0x30
[ 7.919832] dump_stack_lvl+0x60/0x80
[ 7.923531] dump_stack+0x18/0x28
[ 7.926880] __lock_acquire+0x7e8/0x2fc8
[ 7.930850] lock_acquire+0x118/0x260
[ 7.934555] _raw_spin_lock_nested+0x68/0xb0
[ 7.938871] dev_mc_add+0x40/0xa0
[ 7.942218] dsa_slave_sync_mc+0x68/0x180
[ 7.946264] __hw_addr_sync_dev+0x138/0x158
[ 7.950483] dsa_slave_set_rx_mode+0x3c/0x70
[ 7.954796] __dev_set_rx_mode+0x80/0xa0
[ 7.958762] dev_mc_add+0x74/0xa0
[ 7.962109] igmp6_group_added+0x78/0x128
[ 7.966162] __ipv6_dev_mc_inc+0x278/0x498
[ 7.970299] ipv6_dev_mc_inc+0x20/0x38
[ 7.974087] ipv6_add_dev+0x3f0/0x4d0
[ 7.977791] addrconf_notify+0x1b0/0x4a8
[ 7.981757] raw_notifier_call_chain+0x50/0x88
[ 7.986254] call_netdevice_notifiers+0x74/0xd0
[ 7.990823] register_netdevice+0x4f0/0x600
[ 7.995054] dsa_slave_create+0x3f8/0x620
[ 7.999099] dsa_port_setup+0x10c/0x158
[ 8.002978] dsa_register_switch+0xe18/0x1188
[ 8.007378] felix_pci_probe+0x120/0x1f0
[ 8.011345] pci_device_probe+0x1b0/0x278
[ 8.015394] really_probe+0x13c/0x2f8
[ 8.019097] __driver_probe_device+0xc0/0xf8
[ 8.023409] driver_probe_device+0x48/0x218
[ 8.027634] __device_attach_driver+0x128/0x158
[ 8.032209] bus_for_each_drv+0x12c/0x160
[ 8.036257] __device_attach+0xcc/0x1a0
[ 8.040131] device_initial_probe+0x20/0x38
[ 8.044356] bus_probe_device+0xa0/0x118
[ 8.048316] deferred_probe_work_func+0x98/0xe0
[ 8.052890] process_one_work+0x290/0x568
[ 8.056935] worker_thread+0x238/0x4a8
[ 8.060716] kthread+0x108/0x130
[ 8.063983] ret_from_fork+0x10/0x20
This appears to be a false positive. The patch opens a time window in
which DSA user ports have their net_devices registered but are not yet
upper devices of the DSA master, so the user port and the master have
the same dev->nested_level, and lockdep sees the two _xmit_ETHER locks
as the same lock taken recursively.
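To illustrate why an equal dev->nested_level trips the checker, here is a
minimal userspace sketch. It is a toy model, not kernel code: the toy_*
names, the struct layout and the level values are all made up for
illustration. The only assumption carried over from the kernel is that the
lock is taken with the device's nested_level as the lockdep subclass, and
that two held locks with the same class and the same subclass look
recursive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a net_device; only the fields the check needs. */
struct toy_netdev {
	const char *name;
	int lock_class;   /* stands in for the _xmit_ETHER lock class */
	int nested_level; /* stands in for dev->nested_level (the subclass) */
};

#define TOY_MAX_HELD 8

/* Toy stand-in for the list of locks currently held by one task. */
struct toy_held_locks {
	const struct toy_netdev *held[TOY_MAX_HELD];
	size_t count;
};

/* True if a lock with the same class and subclass is already held. */
static bool toy_looks_recursive(const struct toy_held_locks *h,
				const struct toy_netdev *dev)
{
	for (size_t i = 0; i < h->count; i++) {
		if (h->held[i]->lock_class == dev->lock_class &&
		    h->held[i]->nested_level == dev->nested_level)
			return true;
	}
	return false;
}

/*
 * Record the acquisition and report whether the toy checker would have
 * warned about it. The lock is recorded either way, mirroring the fact
 * that a lockdep report does not stop the acquisition.
 */
static bool toy_acquire(struct toy_held_locks *h, const struct toy_netdev *dev)
{
	bool warn = toy_looks_recursive(h, dev);

	assert(h->count < TOY_MAX_HELD);
	h->held[h->count++] = dev;
	return warn;
}
```

With a user port and a master that both sit at level 0 (the window
described above), acquiring the user port's lock and then the master's
makes toy_acquire() report a warning on the second call. Once the user
port has a different level than the master, the same sequence passes
silently, even though the locking order is unchanged.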
I will most likely return to the targeted fix for v3, after
investigating this further tomorrow.