[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aK1amQLIsi0hRvg3@fedora>
Date: Tue, 26 Aug 2025 06:56:25 +0000
From: Hangbin Liu <liuhangbin@...il.com>
To: Kuniyuki Iwashima <kuniyu@...gle.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>,
MD Danish Anwar <danishanwar@...com>,
Alexander Lobakin <aleksander.lobakin@...el.com>,
Jaakko Karrenpalo <jkarrenpalo@...il.com>,
Fernando Fernandez Mancera <ffmancera@...eup.net>,
Murali Karicheri <m-karicheri2@...com>,
WingMan Kwok <w-kwok2@...com>, Stanislav Fomichev <sdf@...ichev.me>,
Xiao Liang <shaw.leon@...il.com>,
Johannes Berg <johannes.berg@...el.com>,
Yu Liao <liaoyu15@...wei.com>, Arvid Brodin <arvid.brodin@...en.se>
Subject: Re: [PATCH net] hsr: add rcu lock for all hsr_for_each_port caller
On Mon, Aug 25, 2025 at 10:01:05PM -0700, Kuniyuki Iwashima wrote:
> On Mon, Aug 25, 2025 at 9:12 PM Hangbin Liu <liuhangbin@...il.com> wrote:
> >
> > hsr_for_each_port is called in many places without holding the RCU read
> > lock, this may trigger warnings on debug kernels like:
> >
> > [ 40.457015] [ T201] WARNING: suspicious RCU usage
> > [ 40.457020] [ T201] 6.17.0-rc2-virtme #1 Not tainted
> > [ 40.457025] [ T201] -----------------------------
> > [ 40.457029] [ T201] net/hsr/hsr_main.c:137 RCU-list traversed in non-reader section!!
> > [ 40.457036] [ T201]
> > other info that might help us debug this:
> >
> > [ 40.457040] [ T201]
> > rcu_scheduler_active = 2, debug_locks = 1
> > [ 40.457045] [ T201] 2 locks held by ip/201:
> > [ 40.457050] [ T201] #0: ffffffff93040a40 (&ops->srcu){.+.+}-{0:0}, at: rtnl_link_ops_get+0xf2/0x280
> > [ 40.457080] [ T201] #1: ffffffff92e7f968 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x5e1/0xb20
> > [ 40.457102] [ T201]
> > stack backtrace:
> > [ 40.457108] [ T201] CPU: 2 UID: 0 PID: 201 Comm: ip Not tainted 6.17.0-rc2-virtme #1 PREEMPT(full)
> > [ 40.457114] [ T201] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [ 40.457117] [ T201] Call Trace:
> > [ 40.457120] [ T201] <TASK>
> > [ 40.457126] [ T201] dump_stack_lvl+0x6f/0xb0
> > [ 40.457136] [ T201] lockdep_rcu_suspicious.cold+0x4f/0xb1
> > [ 40.457148] [ T201] hsr_port_get_hsr+0xfe/0x140
> > [ 40.457158] [ T201] hsr_add_port+0x192/0x940
> > [ 40.457167] [ T201] ? __pfx_hsr_add_port+0x10/0x10
> > [ 40.457176] [ T201] ? lockdep_init_map_type+0x5c/0x270
> > [ 40.457189] [ T201] hsr_dev_finalize+0x4bc/0xbf0
> > [ 40.457204] [ T201] hsr_newlink+0x3c3/0x8f0
> > [ 40.457212] [ T201] ? __pfx_hsr_newlink+0x10/0x10
> > [ 40.457222] [ T201] ? rtnl_create_link+0x173/0xe40
> > [ 40.457233] [ T201] rtnl_newlink_create+0x2cf/0x750
> > [ 40.457243] [ T201] ? __pfx_rtnl_newlink_create+0x10/0x10
> > [ 40.457247] [ T201] ? __dev_get_by_name+0x12/0x50
> > [ 40.457252] [ T201] ? rtnl_dev_get+0xac/0x140
> > [ 40.457259] [ T201] ? __pfx_rtnl_dev_get+0x10/0x10
> > [ 40.457285] [ T201] __rtnl_newlink+0x22c/0xa50
> > [ 40.457305] [ T201] rtnl_newlink+0x637/0xb20
> >
> > Fix it by wrapping the call with rcu_read_lock()/rcu_read_unlock().
> >
> > Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
> > Signed-off-by: Hangbin Liu <liuhangbin@...il.com>
> > ---
> > net/hsr/hsr_device.c | 37 ++++++++++++++++++++++++++++++++-----
> > net/hsr/hsr_main.c | 12 ++++++++++--
> > net/hsr/hsr_netlink.c | 4 ----
> > 3 files changed, 42 insertions(+), 11 deletions(-)
> >
> > diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
> > index 88657255fec1..67955b21b4a4 100644
> > --- a/net/hsr/hsr_device.c
> > +++ b/net/hsr/hsr_device.c
> > @@ -49,12 +49,15 @@ static bool hsr_check_carrier(struct hsr_port *master)
> >
> > ASSERT_RTNL();
> >
> > + rcu_read_lock();
> > hsr_for_each_port(master->hsr, port) {
>
> Why not use the 4th arg of list_for_each_entry_rcu() ?
>
> Adding random rcu_read_lock() looks confusing.
Yes. Thanks for this notify. I didn't notice the 4th arg of
list_for_each_entry_rcu(). Do you have any suggestion which lock we should
check? rtnl_is_locked() seems can't cover all cases.
Or maybe add a hsr_for_each_port_rntl() for some of net_device_ops?
And others still using rcu read lock? I'm not very sure. Do you have any
suggestions?
...
> > @@ -205,10 +216,13 @@ static netdev_features_t hsr_features_recompute(struct hsr_priv *hsr,
> > * may become enabled.
> > */
> > features &= ~NETIF_F_ONE_FOR_ALL;
> > +
> > + rcu_read_lock();
> > hsr_for_each_port(hsr, port)
> > features = netdev_increment_features(features,
> > port->dev->features,
> > mask);
> > + rcu_read_unlock();
> >
> > return features;
> > }
> > @@ -410,14 +424,11 @@ static void hsr_announce(struct timer_list *t)
> >
> > hsr = timer_container_of(hsr, t, announce_timer);
> >
> > - rcu_read_lock();
> > master = hsr_port_get_hsr(hsr, HSR_PT_MASTER);
> > hsr->proto_ops->send_sv_frame(master, &interval, master->dev->dev_addr);
>
> hsr_announce() is a timer func, and what protects master after
> rcu_read_unlock() in hsr_port_get_hsr() ?
hsr_port_get_hsr() is called in more places thank hsr_for_each_port().
That's why I set the rcu read lock in side of of hsr_port_get_hsr().
How about using dev_hold() to protect master device?
Thanks
Hangbin
Powered by blists - more mailing lists