netdev - Re: [PATCH net-next 03/10] net/mlx5: hw counters: Replace IDR+lists with xarray


Message-ID: <20240815134425.GD632411@kernel.org>
Date: Thu, 15 Aug 2024 14:44:25 +0100
From: Simon Horman <horms@...nel.org>
To: Tariq Toukan <tariqt@...dia.com>
Cc: "David S. Miller" <davem@...emloft.net>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
	Saeed Mahameed <saeedm@...dia.com>, Gal Pressman <gal@...dia.com>,
	Leon Romanovsky <leonro@...dia.com>,
	Cosmin Ratiu <cratiu@...dia.com>
Subject: Re: [PATCH net-next 03/10] net/mlx5: hw counters: Replace IDR+lists
 with xarray

On Thu, Aug 15, 2024 at 08:46:49AM +0300, Tariq Toukan wrote:

...

> +/* Synchronization notes
> + *
> + * Access to counter array:
> + * - create - mlx5_fc_create() (user context)
> + *   - inserts the counter into the xarray.
> + *
> + * - destroy - mlx5_fc_destroy() (user context)
> + *   - erases the counter from the xarray and releases it.
> + *
> + * - query mlx5_fc_query(), mlx5_fc_query_cached{,_raw}() (user context)
> + *   - user should not access a counter after destroy.
> + *
> + * - bulk query (single thread workqueue context)
> + *   - create: query relies on 'lastuse' to avoid updating counters added
> + *             around the same time as the current bulk cmd.
> + *   - destroy: destroyed counters will not be accessed, even if they are
> + *              destroyed during a bulk query command.
> + */
> +static void mlx5_fc_stats_query_all_counters(struct mlx5_core_dev *dev)
>  {
>  	struct mlx5_fc_stats *fc_stats = dev->priv.fc_stats;
> -	bool query_more_counters = (first->id <= last_id);
> -	int cur_bulk_len = fc_stats->bulk_query_len;
> +	u32 bulk_len = fc_stats->bulk_query_len;
> +	XA_STATE(xas, &fc_stats->counters, 0);
>  	u32 *data = fc_stats->bulk_query_out;
> -	struct mlx5_fc *counter = first;
> +	struct mlx5_fc *counter;
> +	u32 last_bulk_id = 0;
> +	u64 bulk_query_time;
>  	u32 bulk_base_id;
> -	int bulk_len;
>  	int err;
>  
> -	while (query_more_counters) {
> -		/* first id must be aligned to 4 when using bulk query */
> -		bulk_base_id = counter->id & ~0x3;
> -
> -		/* number of counters to query inc. the last counter */
> -		bulk_len = min_t(int, cur_bulk_len,
> -				 ALIGN(last_id - bulk_base_id + 1, 4));
> -
> -		err = mlx5_cmd_fc_bulk_query(dev, bulk_base_id, bulk_len,
> -					     data);
> -		if (err) {
> -			mlx5_core_err(dev, "Error doing bulk query: %d\n", err);
> -			return;
> -		}
> -		query_more_counters = false;
> -
> -		list_for_each_entry_from(counter, &fc_stats->counters, list) {
> -			int counter_index = counter->id - bulk_base_id;
> -			struct mlx5_fc_cache *cache = &counter->cache;
> -
> -			if (counter->id >= bulk_base_id + bulk_len) {
> -				query_more_counters = true;
> -				break;
> +	xas_lock(&xas);
> +	xas_for_each(&xas, counter, U32_MAX) {
> +		if (xas_retry(&xas, counter))
> +			continue;
> +		if (unlikely(counter->id >= last_bulk_id)) {
> +			/* Start new bulk query. */
> +			/* First id must be aligned to 4 when using bulk query. */
> +			bulk_base_id = counter->id & ~0x3;
> +			last_bulk_id = bulk_base_id + bulk_len;
> +			/* The lock is released while querying the hw and reacquired after. */
> +			xas_unlock(&xas);
> +			/* The same id needs to be processed again in the next loop iteration. */
> +			xas_reset(&xas);
> +			bulk_query_time = jiffies;
> +			err = mlx5_cmd_fc_bulk_query(dev, bulk_base_id, bulk_len, data);
> +			if (err) {
> +				mlx5_core_err(dev, "Error doing bulk query: %d\n", err);
> +				return;
>  			}
> -
> -			update_counter_cache(counter_index, data, cache);
> +			xas_lock(&xas);
> +			continue;
>  		}
> +		/* Do not update counters added after bulk query was started. */

Hi Cosmin and Tariq,

It looks like bulk_query_time and bulk_base_id may be uninitialised or
stale - from a previous loop iteration - if the condition above is not met.

Flagged by Smatch.
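
For illustration only, here is a minimal user-space sketch of the loop shape
in question. It is not the driver code: fake_bulk_query() and sum_counters()
are hypothetical names, and the ids are assumed to be unsigned and sorted.
Under those assumptions the first iteration always takes the "start new bulk"
branch, because last_bulk_id starts at 0, so the base is assigned before it is
read. Giving the variable an initial value at declaration is one common way to
make that explicit to a static checker. Whether that is the right change for
the patch is for the authors to judge.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware bulk query: data[i] receives a
 * value derived from counter id (base + i). */
static void fake_bulk_query(uint32_t base, uint32_t len, uint32_t *data)
{
	for (uint32_t i = 0; i < len; i++)
		data[i] = (base + i) * 10;
}

/* Walks a sorted array of ids. bulk_base_id is assigned only when a new
 * bulk starts, which is the pattern the checker flagged. */
static uint64_t sum_counters(const uint32_t *ids, size_t n,
			     uint32_t bulk_len, uint32_t *data)
{
	uint32_t last_bulk_id = 0;
	uint32_t bulk_base_id = 0;	/* explicit initial value (assumption) */
	uint64_t sum = 0;
	size_t i = 0;

	while (i < n) {
		if (ids[i] >= last_bulk_id) {
			/* Start a new bulk; the base is aligned down to 4. */
			bulk_base_id = ids[i] & ~0x3u;
			last_bulk_id = bulk_base_id + bulk_len;
			fake_bulk_query(bulk_base_id, bulk_len, data);
			continue;	/* process the same id again */
		}
		/* Reached only after at least one bulk has been started. */
		sum += data[ids[i] - bulk_base_id];
		i++;
	}
	return sum;
}
```

Note that a value carried over from an earlier iteration is intentional in
this shape: ids that fall inside the current bulk reuse its base.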

> +		if (time_after64(bulk_query_time, counter->cache.lastuse))
> +			update_counter_cache(counter->id - bulk_base_id, data,
> +					     &counter->cache);
>  	}
> +	xas_unlock(&xas);
>  }

...