Message-ID: <20250205-flying-coucal-of-influence-0dcbc3@leitao>
Date: Wed, 5 Feb 2025 11:07:45 -0800
From: Breno Leitao <leitao@...ian.org>
To: Uday Shankar <ushankar@...estorage.com>
Cc: Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Srinivas Kandagatla <srinivas.kandagatla@...aro.org>,
Rafał Miłecki <rafal@...ecki.pl>,
Simon Horman <horms@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Johannes Berg <johannes@...solutions.net>,
Jonathan Corbet <corbet@....net>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-wireless@...r.kernel.org,
linux-doc@...r.kernel.org
Subject: Re: [PATCH v2 2/2] netconsole: allow selection of egress interface
via MAC address
On Tue, Feb 04, 2025 at 02:41:45PM -0700, Uday Shankar wrote:
> Currently, netconsole has two methods of configuration - module
> parameter and configfs. The former interface allows for netconsole
> activation earlier during boot (by specifying the module parameter on
> the kernel command line), so it is preferred for debugging issues which
> arise before userspace is up/the configfs interface can be used. The
> module parameter syntax requires specifying the egress interface name.
> This requirement makes it hard to use for a couple reasons:
> - The egress interface name can be hard or impossible to predict. For
> example, installing a new network card in a system can change the
> interface names assigned by the kernel.
> - When constructing the module parameter, one may have trouble
> determining the original (kernel-assigned) name of the interface
> (which is the name that should be given to netconsole) if some stable
> interface naming scheme is in effect. A human can usually look at
> kernel logs to determine the original name, but this is very painful
> if automation is constructing the parameter.
>
> For these reasons, allow selection of the egress interface via MAC
> address when configuring netconsole using the module parameter. Update
> the netconsole documentation with an example of the new syntax.
> Selection of egress interface by MAC address via configfs is far less
> interesting (since when this interface can be used, one should be able
> to easily convert between MAC address and interface name), so it is left
> unimplemented.
>
> Signed-off-by: Uday Shankar <ushankar@...estorage.com>
Reviewed-by: Breno Leitao <leitao@...ian.org>
Tested-by: Breno Leitao <leitao@...ian.org>
> int netpoll_setup(struct netpoll *np)
> {
> + struct net *net = current->nsproxy->net_ns;
> struct net_device *ndev = NULL;
> bool ip_overwritten = false;
> + char buf[MAC_ADDR_LEN + 1];
> struct in_device *in_dev;
> int err;
>
> rtnl_lock();
> - if (np->dev_name[0]) {
> - struct net *net = current->nsproxy->net_ns;
> + if (np->dev_name[0])
> ndev = __dev_get_by_name(net, np->dev_name);
> - }
> + else if (is_valid_ether_addr(np->dev_mac))
> + ndev = dev_getbyhwaddr_rcu(net, ARPHRD_ETHER, np->dev_mac);
You do not hold the RCU read lock here. You hold RTNL, which is
sufficient, but CONFIG_PROVE_RCU_LIST will show something like:
WARNING: suspicious RCU usage
6.13.0-09701-g6610c7be45bb-dirty #18 Not tainted
-----------------------------
net/core/dev.c:1143 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by swapper/0/1:
#0: ffffffff832795b8 (rtnl_mutex){+.+.}-{4:4}, at: netpoll_setup+0x48/0x540
stack backtrace:
CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-virtme-09701-g6610c7be45bb-dirty #18
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x9f/0xf0
lockdep_rcu_suspicious+0x11a/0x150
dev_getbyhwaddr_rcu+0xb6/0xc0
netpoll_setup+0x8a/0x540
? netpoll_parse_options+0x2bd/0x310
This is not a problem per se, since you hold RTNL. We probably need to
tell for_each_netdev_rcu() not to complain about "RCU-list traversed in
non-reader section" if RTNL is held. Not sure why we didn't hit this in
the test infrastructure, though:
https://patchwork.kernel.org/project/netdevbpf/patch/20250204-netconsole-v2-2-5ef5eb5f6056@purestorage.com/
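To make the RTNL point concrete, here is a rough userspace model of the
check being discussed. It is only a sketch and not kernel code: every
model_* name is hypothetical, the two flags merely stand in for "inside
an RCU read-side section" and "holding rtnl_mutex", and the counter
stands in for the lockdep report. It shows that a strict "reader section
only" check warns when only RTNL is held, while a check that also
accepts RTNL stays quiet. The lookup result is the same in both cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical userspace stand-ins; none of these are real kernel symbols. */
static bool model_rcu_reader;	/* true while "inside rcu_read_lock()" */
static bool model_rtnl_held;	/* true while "holding rtnl_mutex" */
static int model_splats;	/* number of "suspicious RCU usage" reports */

struct model_netdev {
	const char *name;
	unsigned char addr[ETH_ALEN];
};

static const struct model_netdev model_devs[] = {
	{ "eth0", { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 } },
	{ "eth1", { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 } },
};

/*
 * Walk the device table looking for a MAC match. The walk counts as legal
 * if a reader section is active or, when rtnl_ok is set, if RTNL is held.
 * Anything else is recorded as a splat, but the lookup still runs, since
 * a lockdep report only warns.
 */
static const struct model_netdev *
model_getbyhwaddr(const unsigned char *mac, bool rtnl_ok)
{
	size_t i;

	if (!(model_rcu_reader || (rtnl_ok && model_rtnl_held)))
		model_splats++;

	for (i = 0; i < sizeof(model_devs) / sizeof(model_devs[0]); i++)
		if (memcmp(model_devs[i].addr, mac, ETH_ALEN) == 0)
			return &model_devs[i];
	return NULL;
}

/* Usage: the netpoll_setup() situation, RTNL held and no RCU reader. */
static void model_demo(void)
{
	const unsigned char mac[ETH_ALEN] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 };

	model_rcu_reader = false;
	model_rtnl_held = true;
	model_splats = 0;

	/* Strict check: the device is found, but one splat is recorded. */
	assert(model_getbyhwaddr(mac, false) == &model_devs[1]);
	assert(model_splats == 1);

	/* RTNL-aware check: same result, no additional splat. */
	assert(model_getbyhwaddr(mac, true) == &model_devs[1]);
	assert(model_splats == 1);
}
```

The rtnl_ok flag only exists so both behaviours can be compared side by
side in one program.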
Anyway, no action item for you here. I am talking to Jakub about a way
to solve it, and I should send a fix soon.
Thanks for the patch,
--breno