Message-ID: <31422cbc-7848-4bc6-9f9c-ef1d8400123c@kernel.org>
Date: Mon, 16 Oct 2023 12:13:35 +0300
From: Roger Quadros <rogerq@...nel.org>
To: Vladimir Oltean <vladimir.oltean@....com>,
 MD Danish Anwar <danishanwar@...com>
Cc: Andrew Lunn <andrew@...n.ch>, Vignesh Raghavendra <vigneshr@...com>,
 Richard Cochran <richardcochran@...il.com>, Paolo Abeni <pabeni@...hat.com>,
 Jakub Kicinski <kuba@...nel.org>, Eric Dumazet <edumazet@...gle.com>,
 "David S. Miller" <davem@...emloft.net>, Simon Horman <horms@...nel.org>,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org, srk@...com,
 r-gunasekaran@...com, linux-arm-kernel@...ts.infradead.org,
 Roger Quadros <rogerq@...com>,
 Vinicius Costa Gomes <vinicius.gomes@...el.com>
Subject: Re: [PATCH net-next v4] net: ti: icssg_prueth: add TAPRIO offload
 support



On 11/10/2023 13:25, Vladimir Oltean wrote:
> On Fri, Oct 06, 2023 at 03:50:28PM +0530, MD Danish Anwar wrote:
>> From: Roger Quadros <rogerq@...com>
>>
>> ICSSG dual-emac f/w supports Enhanced Scheduled Traffic (EST – defined
>> in P802.1Qbv/D2.2 that later got included in IEEE 802.1Q-2018)
>> configuration. EST allows express queue traffic to be scheduled
>> (placed) on the wire at specific repeatable time intervals. In
>> Linux kernel, EST configuration is done through tc command and
>> the taprio scheduler in the net core implements a software only
>> scheduler (SCH_TAPRIO). If the NIC is capable of EST configuration,
>> user indicate "flag 2" in the command which is then parsed by
>> taprio scheduler in net core and indicate that the command is to
>> be offloaded to h/w. taprio then offloads the command to the
>> driver by calling ndo_setup_tc() ndo ops. This patch implements
>> ndo_setup_tc() to offload EST configuration to ICSSG.
>>
>> Signed-off-by: Roger Quadros <rogerq@...com>
>> Signed-off-by: Vignesh Raghavendra <vigneshr@...com>
>> Signed-off-by: MD Danish Anwar <danishanwar@...com>
>> ---
>> Cc: Roger Quadros <rogerq@...com>
>> Cc: Andrew Lunn <andrew@...n.ch>
>> Cc: Vinicius Costa Gomes <vinicius.gomes@...el.com>
>> Cc: Jakub Kicinski <kuba@...nel.org>
>>
>> Changes from v3 to v4:
>> *) Rebased on the latest next-20231005 linux-next.
> 
> The tree that the patch will be applied to is
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git,
> not linux-next.
> 
>> *) Addressed Roger and Vinicius' comments and moved all the validations to
>>    emac_taprio_replace() API.
>> *) Modified emac_setup_taprio() API to use switch case based on taprio->cmd
>>    and added emac_taprio_destroy() and emac_taprio_replace() APIs.
>> *) Modified the documentation of structs / enums in icssg_qos.h by using
>>    the correct kdoc format.
>>
>> Changes from v2 to v3:
>> *) Rebased on the latest next-20230928 linux-next.
>> *) Retained original authorship of the patch.
>> *) Addressed Roger's comments and modified emac_setup_taprio() and
>>    emac_set_taprio() APIs accordingly.
>> *) Removed netif_running() check from emac_setup_taprio().
>> *) Addressed Vinicius' comments and added check for MIN and MAX cycle time.
>> *) Added check for allocation failure of est_new in emac_setup_taprio().
>>
>> Changes from v1 to v2:
>> *) Rebased on the latest next-20230921 linux-next.
>> *) Dropped the RFC tag as merge window is open now.
>> *) Splitted this patch from the switch mode series [v1].
>> *) Removed TODO comment as asked by Andrew and Roger.
>> *) Changed Copyright to 2023 as asked by Roger.
>>
>> v3: https://lore.kernel.org/all/20230928103000.186304-1-danishanwar@ti.com/
>> v2: https://lore.kernel.org/all/20230921070031.795788-1-danishanwar@ti.com/
>> v1: https://lore.kernel.org/all/20230830110847.1219515-1-danishanwar@ti.com/
>>
>>  drivers/net/ethernet/ti/Makefile             |   3 +-
>>  drivers/net/ethernet/ti/icssg/icssg_prueth.c |   5 +-
>>  drivers/net/ethernet/ti/icssg/icssg_prueth.h |   6 +
>>  drivers/net/ethernet/ti/icssg/icssg_qos.c    | 301 +++++++++++++++++++
>>  drivers/net/ethernet/ti/icssg/icssg_qos.h    | 113 +++++++
>>  5 files changed, 426 insertions(+), 2 deletions(-)
>>  create mode 100644 drivers/net/ethernet/ti/icssg/icssg_qos.c
>>  create mode 100644 drivers/net/ethernet/ti/icssg/icssg_qos.h
>>
>> diff --git a/drivers/net/ethernet/ti/Makefile b/drivers/net/ethernet/ti/Makefile
>> index 34fd7a716ba6..0df60ded1b2d 100644
>> --- a/drivers/net/ethernet/ti/Makefile
>> +++ b/drivers/net/ethernet/ti/Makefile
>> @@ -37,5 +37,6 @@ icssg-prueth-y := k3-cppi-desc-pool.o \
>>  		  icssg/icssg_config.o \
>>  		  icssg/icssg_mii_cfg.o \
>>  		  icssg/icssg_stats.o \
>> -		  icssg/icssg_ethtool.o
>> +		  icssg/icssg_ethtool.o \
>> +		  icssg/icssg_qos.o
>>  obj-$(CONFIG_TI_ICSS_IEP) += icssg/icss_iep.o
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
>> index 6635b28bc672..89c301716926 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
>> @@ -1166,7 +1166,7 @@ static int emac_phy_connect(struct prueth_emac *emac)
>>  	return 0;
>>  }
>>  
>> -static u64 prueth_iep_gettime(void *clockops_data, struct ptp_system_timestamp *sts)
>> +u64 prueth_iep_gettime(void *clockops_data, struct ptp_system_timestamp *sts)
>>  {
>>  	u32 hi_rollover_count, hi_rollover_count_r;
>>  	struct prueth_emac *emac = clockops_data;
>> @@ -1403,6 +1403,8 @@ static int emac_ndo_open(struct net_device *ndev)
>>  		napi_enable(&emac->tx_chns[i].napi_tx);
>>  	napi_enable(&emac->napi_rx);
>>  
>> +	icssg_qos_tas_init(ndev);
> 
> I believe the intention is for this code to be run before any taprio
> offload is added, correct? But it is possible for the user to add an
> offloaded Qdisc even while the netdev has not yet been brought up.
> Is that case handled correctly, or will it simply result in NULL pointer
> dereferences (tas->config_list)?
> 
>> +
>>  	/* start PHY */
>>  	phy_start(ndev->phydev);
>>  
>> @@ -1669,6 +1671,7 @@ static const struct net_device_ops emac_netdev_ops = {
>>  	.ndo_set_rx_mode = emac_ndo_set_rx_mode,
>>  	.ndo_eth_ioctl = emac_ndo_ioctl,
>>  	.ndo_get_stats64 = emac_ndo_get_stats64,
>> +	.ndo_setup_tc = icssg_qos_ndo_setup_tc,
>>  };
>>  
>>  /* get emac_port corresponding to eth_node name */
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.h b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> index 8b6d6b497010..7cbf0e561905 100644
>> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.h
>> @@ -37,6 +37,7 @@
>>  #include "icssg_config.h"
>>  #include "icss_iep.h"
>>  #include "icssg_switch_map.h"
>> +#include "icssg_qos.h"
>>  
>>  #define PRUETH_MAX_MTU          (2000 - ETH_HLEN - ETH_FCS_LEN)
>>  #define PRUETH_MIN_PKT_SIZE     (VLAN_ETH_ZLEN)
>> @@ -174,6 +175,8 @@ struct prueth_emac {
>>  
>>  	struct pruss_mem_region dram;
>>  
>> +	struct prueth_qos qos;
>> +
>>  	struct delayed_work stats_work;
>>  	u64 stats[ICSSG_NUM_STATS];
>>  };
>> @@ -285,4 +288,7 @@ u32 icssg_queue_level(struct prueth *prueth, int queue);
>>  void emac_stats_work_handler(struct work_struct *work);
>>  void emac_update_hardware_stats(struct prueth_emac *emac);
>>  int emac_get_stat_by_name(struct prueth_emac *emac, char *stat_name);
>> +
>> +u64 prueth_iep_gettime(void *clockops_data, struct ptp_system_timestamp *sts);
>> +
>>  #endif /* __NET_TI_ICSSG_PRUETH_H */
>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_qos.c b/drivers/net/ethernet/ti/icssg/icssg_qos.c
>> new file mode 100644
>> index 000000000000..c8c4450c41bb
>> --- /dev/null
>> +++ b/drivers/net/ethernet/ti/icssg/icssg_qos.c
>> @@ -0,0 +1,301 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Texas Instruments ICSSG PRUETH QoS submodule
>> + * Copyright (C) 2023 Texas Instruments Incorporated - http://www.ti.com/
>> + */
>> +
>> +#include <linux/printk.h>
>> +#include "icssg_prueth.h"
>> +#include "icssg_switch_map.h"
>> +
>> +static void tas_update_fw_list_pointers(struct prueth_emac *emac)
>> +{
>> +	struct tas_config *tas = &emac->qos.tas.config;
>> +
>> +	if ((readb(tas->active_list)) == TAS_LIST0) {
>> +		tas->fw_active_list = emac->dram.va + TAS_GATE_MASK_LIST0;
>> +		tas->fw_shadow_list = emac->dram.va + TAS_GATE_MASK_LIST1;
>> +	} else {
>> +		tas->fw_active_list = emac->dram.va + TAS_GATE_MASK_LIST1;
>> +		tas->fw_shadow_list = emac->dram.va + TAS_GATE_MASK_LIST0;
>> +	}
>> +}
>> +
>> +static void tas_update_maxsdu_table(struct prueth_emac *emac)
>> +{
>> +	struct tas_config *tas = &emac->qos.tas.config;
>> +	u16 __iomem *max_sdu_tbl_ptr;
>> +	u8 gate_idx;
>> +
>> +	/* update the maxsdu table */
>> +	max_sdu_tbl_ptr = emac->dram.va + TAS_QUEUE_MAX_SDU_LIST;
>> +
>> +	for (gate_idx = 0; gate_idx < TAS_MAX_NUM_QUEUES; gate_idx++)
>> +		writew(tas->max_sdu_table.max_sdu[gate_idx], &max_sdu_tbl_ptr[gate_idx]);
>> +}
>> +
>> +static void tas_reset(struct prueth_emac *emac)
>> +{
>> +	struct tas_config *tas = &emac->qos.tas.config;
>> +	int i;
>> +
>> +	for (i = 0; i < TAS_MAX_NUM_QUEUES; i++)
>> +		tas->max_sdu_table.max_sdu[i] = 2048;
>> +
>> +	tas_update_maxsdu_table(emac);
>> +
>> +	writeb(TAS_LIST0, tas->active_list);
>> +
>> +	memset_io(tas->fw_active_list, 0, sizeof(*tas->fw_active_list));
>> +	memset_io(tas->fw_shadow_list, 0, sizeof(*tas->fw_shadow_list));
>> +}
>> +
>> +static int tas_set_state(struct prueth_emac *emac, enum tas_state state)
>> +{
>> +	struct tas_config *tas = &emac->qos.tas.config;
>> +	int ret;
>> +
>> +	if (tas->state == state)
>> +		return 0;
>> +
>> +	switch (state) {
>> +	case TAS_STATE_RESET:
>> +		tas_reset(emac);
>> +		ret = emac_set_port_state(emac, ICSSG_EMAC_PORT_TAS_RESET);
>> +		tas->state = TAS_STATE_RESET;
>> +		break;
>> +	case TAS_STATE_ENABLE:
>> +		ret = emac_set_port_state(emac, ICSSG_EMAC_PORT_TAS_ENABLE);
>> +		tas->state = TAS_STATE_ENABLE;
>> +		break;
>> +	case TAS_STATE_DISABLE:
>> +		ret = emac_set_port_state(emac, ICSSG_EMAC_PORT_TAS_DISABLE);
>> +		tas->state = TAS_STATE_DISABLE;
>> +		break;
>> +	default:
>> +		netdev_err(emac->ndev, "%s: unsupported state\n", __func__);
>> +		ret = -EINVAL;
>> +		break;
>> +	}
>> +
>> +	if (ret)
>> +		netdev_err(emac->ndev, "TAS set state failed %d\n", ret);
>> +	return ret;
>> +}
>> +
>> +static int tas_set_trigger_list_change(struct prueth_emac *emac)
>> +{
>> +	struct tc_taprio_qopt_offload *admin_list = emac->qos.tas.taprio_admin;
>> +	struct tas_config *tas = &emac->qos.tas.config;
>> +	struct ptp_system_timestamp sts;
>> +	u32 change_cycle_count;
>> +	u32 cycle_time;
>> +	u64 base_time;
>> +	u64 cur_time;
>> +
>> +	if (admin_list->cycle_time < TAS_MIN_CYCLE_TIME)
>> +		return -EINVAL;
> 
> Shouldn't this verification have been part of emac_taprio_replace(), and
> have a proper extack?
> 
>> +
>> +	cycle_time = admin_list->cycle_time - 4; /* -4ns to compensate for IEP wraparound time */
> 
> Details? Doesn't this make the phase alignment of the schedule diverge
> from what the user expects?
> 
>> +	base_time = admin_list->base_time;
>> +	cur_time = prueth_iep_gettime(emac, &sts);
>> +
>> +	if (base_time > cur_time)
>> +		change_cycle_count = DIV_ROUND_UP_ULL(base_time - cur_time, cycle_time);
>> +	else
>> +		change_cycle_count = 1;
> 
> I see that the base_time is only used to calculate the number of cycles
> relative to cur_time. Taprio users want to specify a basetime value
> which indicates the phase alignment of the schedule. This is important
> when the device is synchronized over PTP with other switches in the
> network. Can you explain how is the basetime taken into consideration in
> your implementation?
> 
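
To make the base-time question concrete, here is a small userspace sketch (not driver code). It compares the cycle count the patch computes with the start time a phase-aligned schedule would use, i.e. the first instant at or after the current time that equals base_time plus a whole number of cycles. Both helper names are hypothetical, and how the firmware anchors its cycle counter is not visible in this patch, so this only illustrates the arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the DIV_ROUND_UP_ULL() arithmetic in
 * tas_set_trigger_list_change(): whole cycles between cur_time and a
 * future base_time, or 1 if base_time has already passed.
 */
static uint32_t cycles_until_change(uint64_t base_time, uint64_t cur_time,
				    uint32_t cycle_time)
{
	if (base_time > cur_time)
		return (uint32_t)((base_time - cur_time + cycle_time - 1) /
				  cycle_time);
	return 1;
}

/* Hypothetical helper: earliest start that is >= cur_time and still
 * equal to base_time + N * cycle_time for an integer N >= 0, which is
 * what "phase aligned to base_time" means here.
 */
static uint64_t phase_aligned_start(uint64_t base_time, uint64_t cur_time,
				    uint64_t cycle_time)
{
	uint64_t n;

	if (base_time >= cur_time)
		return base_time;

	n = (cur_time - base_time + cycle_time - 1) / cycle_time;
	return base_time + n * cycle_time;
}
```

With base_time = 1000, cur_time = 250 and cycle_time = 300 (arbitrary units), the first helper yields 3 cycles. Counting those 3 cycles from cur_time would land at 1150, whereas the phase-aligned start is 1000. Whether that gap exists in practice depends on what the firmware counts from.
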
>> +
>> +	writel(cycle_time, emac->dram.va + TAS_ADMIN_CYCLE_TIME);
>> +	writel(change_cycle_count, emac->dram.va + TAS_CONFIG_CHANGE_CYCLE_COUNT);
>> +	writeb(admin_list->num_entries, emac->dram.va + TAS_ADMIN_LIST_LENGTH);
>> +
>> +	/* config_change cleared by f/w to ack reception of new shadow list */
>> +	writeb(1, &tas->config_list->config_change);
>> +	/* config_pending cleared by f/w when new shadow list is copied to active list */
>> +	writeb(1, &tas->config_list->config_pending);
>> +
>> +	return emac_set_port_state(emac, ICSSG_EMAC_PORT_TAS_TRIGGER);
>> +}
>> +
>> +static int tas_update_oper_list(struct prueth_emac *emac)
>> +{
>> +	struct tc_taprio_qopt_offload *admin_list = emac->qos.tas.taprio_admin;
>> +	struct tas_config *tas = &emac->qos.tas.config;
>> +	u32 tas_acc_gate_close_time = 0;
>> +	u8 idx, gate_idx, val;
>> +	int ret;
>> +
>> +	if (admin_list->cycle_time > TAS_MAX_CYCLE_TIME)
>> +		return -EINVAL;
>> +
>> +	tas_update_fw_list_pointers(emac);
>> +
>> +	for (idx = 0; idx < admin_list->num_entries; idx++) {
>> +		writeb(admin_list->entries[idx].gate_mask,
>> +		       &tas->fw_shadow_list->gate_mask_list[idx]);
>> +		tas_acc_gate_close_time += admin_list->entries[idx].interval;
>> +
>> +		/* extend last entry till end of cycle time */
>> +		if (idx == admin_list->num_entries - 1)
>> +			writel(admin_list->cycle_time,
>> +			       &tas->fw_shadow_list->win_end_time_list[idx]);
>> +		else
>> +			writel(tas_acc_gate_close_time,
>> +			       &tas->fw_shadow_list->win_end_time_list[idx]);
>> +	}
>> +
>> +	/* clear remaining entries */
>> +	for (idx = admin_list->num_entries; idx < TAS_MAX_CMD_LISTS; idx++) {
>> +		writeb(0, &tas->fw_shadow_list->gate_mask_list[idx]);
>> +		writel(0, &tas->fw_shadow_list->win_end_time_list[idx]);
>> +	}
>> +
>> +	/* update the Array of gate close time for each queue in each window */
>> +	for (idx = 0 ; idx < admin_list->num_entries; idx++) {
>> +		/* On Linux, only PRUETH_MAX_TX_QUEUES are supported per port */
>> +		for (gate_idx = 0; gate_idx < PRUETH_MAX_TX_QUEUES; gate_idx++) {
>> +			u8 gate_mask_list_idx = readb(&tas->fw_shadow_list->gate_mask_list[idx]);
>> +			u32 gate_close_time = 0;
>> +
>> +			if (gate_mask_list_idx & BIT(gate_idx))
>> +				gate_close_time = readl(&tas->fw_shadow_list->win_end_time_list[idx]);
>> +
>> +			writel(gate_close_time,
>> +			       &tas->fw_shadow_list->gate_close_time_list[idx][gate_idx]);
>> +		}
>> +	}
>> +
>> +	/* tell f/w to swap active & shadow list */
>> +	ret = tas_set_trigger_list_change(emac);
>> +	if (ret) {
>> +		netdev_err(emac->ndev, "failed to swap f/w config list: %d\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	/* Wait for completion */
>> +	ret = readb_poll_timeout(&tas->config_list->config_change, val, !val,
>> +				 USEC_PER_MSEC, 10 * USEC_PER_MSEC);
>> +	if (ret) {
>> +		netdev_err(emac->ndev, "TAS list change completion time out\n");
>> +		return ret;
>> +	}
>> +
>> +	tas_update_fw_list_pointers(emac);
>> +
>> +	return 0;
>> +}
>> +
>> +static void emac_cp_taprio(struct tc_taprio_qopt_offload *from,
>> +			   struct tc_taprio_qopt_offload *to)
>> +{
>> +	int i;
>> +
>> +	*to = *from;
>> +	for (i = 0; i < from->num_entries; i++)
>> +		to->entries[i] = from->entries[i];
>> +}
>> +
>> +static int emac_taprio_replace(struct net_device *ndev,
>> +			       struct tc_taprio_qopt_offload *taprio)
>> +{
>> +	struct prueth_emac *emac = netdev_priv(ndev);
>> +	struct tc_taprio_qopt_offload *est_new;
>> +	int ret, idx;
>> +
>> +	if (taprio->cycle_time_extension) {
>> +		netdev_err(ndev, "Failed to set cycle time extension");
> 
> "failed" implies that the driver tried to do something, but it didn't work.
> Maybe "Cycle time extension not supported"? Also, NL_SET_ERR_MSG_MOD().
> 
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	if (taprio->num_entries == 0 ||
> 
> Cannot happen. See sch_taprio.c:
> 
> 	if (new_admin->num_entries == 0) {
> 		NL_SET_ERR_MSG(extack, "There should be at least one entry in the schedule");
> 		err = -EINVAL;
> 		goto free_sched;
> 	}
> 
>> +	    taprio->num_entries > TAS_MAX_CMD_LISTS) {
>> +		NL_SET_ERR_MSG_FMT_MOD(taprio->extack, "unsupported num_entries %ld in taprio config\n",
>> +				       taprio->num_entries);
> 
> Better to say what's the hardware maximum, than to report back num_entries
> as being not supported.
> 
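
A rough userspace illustration of the kind of message suggested here, with the limit stated rather than the rejected value alone. The EX_* limits and ex_validate() are placeholders made up for this sketch; the real TAS_MAX_CMD_LISTS and cycle-time bounds live in icssg_qos.h and are not reproduced. In the driver the text would go through the extack macros rather than snprintf().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder limits for illustration only, not the hardware values. */
#define EX_MAX_ENTRIES		16u
#define EX_MIN_CYCLE_NS		1000000ull
#define EX_MAX_CYCLE_NS		4000000000ull

/* Hypothetical validator: does every check up front and reports the
 * supported limit in the message. Returns 0 or a negative errno.
 */
static int ex_validate(size_t num_entries, uint64_t cycle_time_ns,
		       char *msg, size_t len)
{
	if (num_entries > EX_MAX_ENTRIES) {
		snprintf(msg, len, "num_entries %zu exceeds maximum of %u",
			 num_entries, EX_MAX_ENTRIES);
		return -EINVAL;
	}

	if (cycle_time_ns < EX_MIN_CYCLE_NS ||
	    cycle_time_ns > EX_MAX_CYCLE_NS) {
		snprintf(msg, len,
			 "cycle_time %llu ns outside supported range [%llu, %llu] ns",
			 (unsigned long long)cycle_time_ns,
			 (unsigned long long)EX_MIN_CYCLE_NS,
			 (unsigned long long)EX_MAX_CYCLE_NS);
		return -EINVAL;
	}

	if (len)
		msg[0] = '\0';
	return 0;
}
```

Doing both range checks in one place also covers the earlier remark about the minimum cycle time check sitting in tas_set_trigger_list_change().
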
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* If any time_interval is 0 in between the list, then exit */
>> +	for (idx = 0; idx < taprio->num_entries; idx++) {
>> +		if (taprio->entries[idx].interval == 0) {
>> +			NL_SET_ERR_MSG_MOD(taprio->extack, "0 interval in taprio config not supported\n");
>> +			return -EINVAL;
>> +		}
>> +	}
> 
> Also not possible, see fill_sched_entry()
> 
> 	/* The interval should allow at least the minimum ethernet
> 	 * frame to go out.
> 	 */
> 	if (interval < min_duration) {
> 		NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
> 		return -EINVAL;
> 	}
> 
>> +
>> +	if (emac->qos.tas.taprio_admin)
>> +		devm_kfree(&ndev->dev, emac->qos.tas.taprio_admin);
>> +
>> +	est_new = devm_kzalloc(&ndev->dev,
>> +			       struct_size(est_new, entries, taprio->num_entries),
>> +			       GFP_KERNEL);
>> +	if (!est_new)
>> +		return -ENOMEM;
> 
> You don't need to allocate est_new, you can keep a reference on taprio
> using taprio_offload_get(), and call taprio_offload_put() to release the
> reference when you're done with it.

It's taprio_offload_free(), actually.

Also, the Kconfig dependency needs fixing so that this driver depends on NET_SCH_TAPRIO.
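
For the reference-keeping idea, the pattern is roughly the following, modelled in plain userspace C so that it compiles standalone. All names here (struct sched, sched_get(), sched_free(), struct port and so on) are stand-ins invented for the sketch. They only mimic the get/free pairing and are not the kernel API or its implementation.

```c
#include <stdlib.h>

/* Userspace stand-in for a refcounted schedule object. */
struct sched {
	int refcnt;
	size_t num_entries;
};

/* Counts objects not yet freed, to make leaks visible in a test. */
static int live_objects;

static struct sched *sched_alloc(size_t num_entries)
{
	struct sched *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->refcnt = 1;		/* reference owned by the caller */
	s->num_entries = num_entries;
	live_objects++;
	return s;
}

/* Take an extra reference instead of copying the schedule. */
static struct sched *sched_get(struct sched *s)
{
	s->refcnt++;
	return s;
}

/* Drop one reference; the object goes away with the last one. */
static void sched_free(struct sched *s)
{
	if (!s || --s->refcnt)
		return;
	live_objects--;
	free(s);
}

/* Stand-in for the per-port state holding the admin schedule. */
struct port {
	struct sched *admin;
};

static void port_replace(struct port *p, struct sched *new_sched)
{
	struct sched *old = p->admin;

	p->admin = sched_get(new_sched);
	sched_free(old);	/* release the previous schedule, if any */
}

static void port_destroy(struct port *p)
{
	sched_free(p->admin);
	p->admin = NULL;
}
```

The point is that replace takes a reference on the schedule it was handed and drops the old one, so there is no private copy to allocate, copy into, or free on the error paths.
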

> 
>> +
>> +	emac_cp_taprio(taprio, est_new);
>> +	emac->qos.tas.taprio_admin = est_new;
>> +	ret = tas_update_oper_list(emac);
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret =  tas_set_state(emac, TAS_STATE_ENABLE);
>> +	if (ret)
>> +		devm_kfree(&ndev->dev, est_new);
>> +
>> +	return ret;
>> +}

-- 
cheers,
-roger
