netdev - Re: [PATCH net-next v2 8/8] netdev: depend on netdev->lock for qstats in ops locked drivers


Message-ID: <a2768226-854e-464d-8e76-240f7c76e987@intel.com>
Date: Wed, 9 Apr 2025 22:23:28 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jakub Kicinski <kuba@...nel.org>, <davem@...emloft.net>
CC: <netdev@...r.kernel.org>, <edumazet@...gle.com>, <pabeni@...hat.com>,
	<andrew+netdev@...n.ch>, <horms@...nel.org>, <sdf@...ichev.me>,
	<hramamurthy@...gle.com>, <kuniyu@...zon.com>, <jdamato@...tly.com>
Subject: Re: [PATCH net-next v2 8/8] netdev: depend on netdev->lock for qstats
 in ops locked drivers



On 4/8/2025 12:59 PM, Jakub Kicinski wrote:
> We mostly needed rtnl_lock in qstat to make sure the queue count
> is stable while we work. For "ops locked" drivers the instance
> lock protects the queue count, so we don't have to take rtnl_lock.
> 
> For currently ops-locked drivers: netdevsim and bnxt need
> the protection from netdev going down while we dump, which
> instance lock provides. gve doesn't care.
> 
> Reviewed-by: Joe Damato <jdamato@...tly.com>
> Acked-by: Stanislav Fomichev <sdf@...ichev.me>
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
>  Documentation/networking/netdevices.rst |  6 +++++
>  include/net/netdev_queues.h             |  4 +++-
>  net/core/netdev-genl.c                  | 29 +++++++++++++++----------
>  3 files changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
> index 7ae28c5fb835..0ccc7dcf4390 100644
> --- a/Documentation/networking/netdevices.rst
> +++ b/Documentation/networking/netdevices.rst
> @@ -356,6 +356,12 @@ Similarly to ``ndos`` the instance lock is only held for select drivers.
>  For "ops locked" drivers all ethtool ops without exceptions should
>  be called under the instance lock.
>  
> +struct netdev_stat_ops
> +----------------------
> +
> +"qstat" ops are invoked under the instance lock for "ops locked" drivers,
> +and under rtnl_lock for all other drivers.
> +
>  struct net_shaper_ops
>  ---------------------
>  

What determines whether a driver is "ops locked"? Is that defined above
this chunk in the doc? I see, it's when netdev_need_ops_lock() returns
true? OK. It sounds like it would be good to start migrating drivers
over to this locking paradigm over time.
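
The rule stated in the doc hunk above (instance lock for "ops locked"
drivers, rtnl_lock for everyone else) can be modeled in userspace roughly
as follows. This is only a sketch: every `toy_*` name is made up for
illustration, the locks are plain flags rather than real kernel locks, and
the `ops_locked` field merely stands in for whatever
netdev_need_ops_lock() checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a lock: a flag recording whether it is held. */
struct toy_lock { bool held; };

/* Models the single global rtnl_lock (hypothetical name). */
static struct toy_lock toy_rtnl;

struct toy_netdev {
	bool ops_locked;               /* stands in for netdev_need_ops_lock() */
	struct toy_lock instance_lock; /* stands in for netdev->lock */
};

static void toy_lock(struct toy_lock *l)
{
	assert(!l->held);	/* no recursive locking in this model */
	l->held = true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);	/* unlocking an unheld lock is a bug */
	l->held = false;
}

/* Pick the lock that protects qstat ops for this device. */
static struct toy_lock *toy_qstat_lock(struct toy_netdev *dev)
{
	return dev->ops_locked ? &dev->instance_lock : &toy_rtnl;
}

static void toy_lock_ops_compat(struct toy_netdev *dev)
{
	toy_lock(toy_qstat_lock(dev));
}

static void toy_unlock_ops_compat(struct toy_netdev *dev)
{
	toy_unlock(toy_qstat_lock(dev));
}
```

In this model a caller never needs to know which lock it got; it only has
to pair each toy_lock_ops_compat() with a toy_unlock_ops_compat() on the
same device.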

> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> index 825141d675e5..ea709b59d827 100644
> --- a/include/net/netdev_queues.h
> +++ b/include/net/netdev_queues.h
> @@ -85,9 +85,11 @@ struct netdev_queue_stats_tx {
>   * for some of the events is not maintained, and reliable "total" cannot
>   * be provided).
>   *
> + * Ops are called under the instance lock if netdev_need_ops_lock()
> + * returns true, otherwise under rtnl_lock.
>   * Device drivers can assume that when collecting total device stats,
>   * the @get_base_stats and subsequent per-queue calls are performed
> - * "atomically" (without releasing the rtnl_lock).
> + * "atomically" (without releasing the relevant lock).
>   *
>   * Device drivers are encouraged to reset the per-queue statistics when
>   * number of queues change. This is because the primary use case for
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index 8c58261de969..b64c614a00c4 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -795,26 +795,31 @@ int netdev_nl_qstats_get_dumpit(struct sk_buff *skb,
>  	if (info->attrs[NETDEV_A_QSTATS_IFINDEX])
>  		ifindex = nla_get_u32(info->attrs[NETDEV_A_QSTATS_IFINDEX]);
>  
> -	rtnl_lock();

We used to lock here...

>  	if (ifindex) {
> -		netdev = __dev_get_by_index(net, ifindex);
> -		if (netdev && netdev->stat_ops) {
> +		netdev = netdev_get_by_index_lock_ops_compat(net, ifindex);
> +		if (!netdev) {
> +			NL_SET_BAD_ATTR(info->extack,
> +					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
> +			return -ENODEV;
> +		}

I guess netdev_get_by_index_lock_ops_compat acquires the lock when it
returns success?

> +		if (netdev->stat_ops) {
>  			err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
>  							    info, ctx);
>  		} else {
>  			NL_SET_BAD_ATTR(info->extack,
>  					info->attrs[NETDEV_A_QSTATS_IFINDEX]);
> -			err = netdev ? -EOPNOTSUPP : -ENODEV;
> -		}
> -	} else {

But there's an else branch here, so now I'm confused about how this
locking works.

> -		for_each_netdev_dump(net, netdev, ctx->ifindex) {
> -			err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
> -							    info, ctx);
> -			if (err < 0)
> -				break;
> +			err = -EOPNOTSUPP;
>  		}
> +		netdev_unlock_ops_compat(netdev);

And we call netdev_unlock_ops_compat() here... but I don't see how this
branch acquired the lock?

> +		return err;
> +	}
> +
> +	for_each_netdev_lock_ops_compat_scoped(net, netdev, ctx->ifindex) {
> +		err = netdev_nl_qstats_get_dump_one(netdev, scope, skb,
> +						    info, ctx);
> +		if (err < 0)
> +			break;

This looks like it's scope-guarded, so it's fine.

>  	}
> -	rtnl_unlock();
>  

What am I missing?
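
For reference, here is a userspace sketch of the single-ifindex path under
one assumption that the quoted hunk does not confirm: that the lookup
helper returns the device with the relevant lock already held, and returns
NULL with nothing held. All `toy_*` names and the device table are
hypothetical. If that assumption holds, the early -ENODEV return needs no
unlock and the one unlock call before `return err` balances the lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_netdev {
	int ifindex;
	bool has_stat_ops;	/* stands in for netdev->stat_ops != NULL */
	int lock_depth;		/* 1 while "the relevant lock" is held */
};

/* Hypothetical device table for the model. */
static struct toy_netdev toy_devs[] = {
	{ .ifindex = 1, .has_stat_ops = true },
	{ .ifindex = 2, .has_stat_ops = false },
};
#define TOY_NDEVS (sizeof(toy_devs) / sizeof(toy_devs[0]))

/*
 * Assumed contract: on success the device is returned locked;
 * on failure NULL is returned and no lock is held.
 */
static struct toy_netdev *toy_get_by_index_lock(int ifindex)
{
	for (size_t i = 0; i < TOY_NDEVS; i++) {
		if (toy_devs[i].ifindex == ifindex) {
			toy_devs[i].lock_depth++;
			return &toy_devs[i];
		}
	}
	return NULL;
}

static void toy_unlock(struct toy_netdev *dev)
{
	assert(dev->lock_depth == 1);	/* must be held exactly once */
	dev->lock_depth--;
}

/* Mirrors the shape of the ifindex branch in the quoted hunk. */
static int toy_dump_one_ifindex(int ifindex)
{
	struct toy_netdev *dev = toy_get_by_index_lock(ifindex);
	int err;

	if (!dev)
		return -ENODEV;	/* nothing locked, nothing to unlock */

	err = dev->has_stat_ops ? 0 : -EOPNOTSUPP;
	toy_unlock(dev);	/* pairs with the lock taken by the lookup */
	return err;
}
```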