linux-kernel - Re: [PATCH V2 net-next] net: fec: add CBS offload support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fb2a599f-4a7e-1755-fbcd-56e57abe80be@intel.com>
Date:   Tue, 14 Feb 2023 17:49:13 +0100
From:   Alexander Lobakin <alexandr.lobakin@...el.com>
To:     Wei Fang <wei.fang@....com>
CC:     Shenwei Wang <shenwei.wang@....com>,
        Clark Wang <xiaoning.wang@....com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "pabeni@...hat.com" <pabeni@...hat.com>,
        "simon.horman@...igine.com" <simon.horman@...igine.com>,
        "andrew@...n.ch" <andrew@...n.ch>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        dl-linux-imx <linux-imx@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH V2 net-next] net: fec: add CBS offload support

From: Wei Fang <wei.fang@....com>
Date: Tue, 14 Feb 2023 09:34:09 +0000

>> -----Original Message-----
>> From: Alexander Lobakin <alexandr.lobakin@...el.com>
>> Sent: 2023年2月14日 0:05
>> To: Wei Fang <wei.fang@....com>
>> Cc: Shenwei Wang <shenwei.wang@....com>; Clark Wang
>> <xiaoning.wang@....com>; davem@...emloft.net; edumazet@...gle.com;
>> kuba@...nel.org; pabeni@...hat.com; simon.horman@...igine.com;
>> andrew@...n.ch; netdev@...r.kernel.org; dl-linux-imx <linux-imx@....com>;
>> linux-kernel@...r.kernel.org
>> Subject: Re: [PATCH V2 net-next] net: fec: add CBS offload support
>>
>> From: Wei Fang <wei.fang@....com>
>> Date: Mon, 13 Feb 2023 17:29:12 +0800
>>
>>> From: Wei Fang <wei.fang@....com>
>>>
>>> The FEC hardware supports the Credit-based shaper (CBS) which control
>>> the bandwidth distribution between normal traffic and time-sensitive
>>> traffic with respect to the total link bandwidth available.
>>> But notice that the bandwidth allocation of hardware is restricted to
>>> certain values. Below is the equation which is used to calculate the
>>> BW (bandwidth) fraction for per class:
>>> 	BW fraction = 1 / (1 + 512 / idle_slope)
>>
>> [...]
>>
>>> @@ -571,6 +575,12 @@ struct fec_stop_mode_gpr {
>>>  	u8 bit;
>>>  };
>>>
>>> +struct fec_cbs_params {
>>> +	bool enable[FEC_ENET_MAX_TX_QS];
>>
>> Maybe smth like
>>
>> 	DECLARE_BITMAP(enabled, FEC_ENET_MAX_TX_QS);
>>
>> ?
> I think it's a matter of personal habit.

It's a matter that you can fit 32 bits if you declare it as u32 or 64
bits if as a bitmap vs you waste 1 byte per 1 true/false flag as you do
in this version :D

> 
>>
>>> +	int idleslope[FEC_ENET_MAX_TX_QS];
>>> +	int sendslope[FEC_ENET_MAX_TX_QS];
>>
>> Can they actually be negative? (probably I'll see it below)
>>
> idleslope and sendslope are used to save the parameters passed in from user space.
> And the sendslope should always be negative.

Parameters coming from userspace must be validated before saving them
anywhere.
Also, if sendslope is always negative as you say, then just negate it
when you read it from userspace and store as u32.

> 
>>> +};
>>> +
>>>  /* The FEC buffer descriptors track the ring buffers.  The rx_bd_base and
>>>   * tx_bd_base always point to the base of the buffer descriptors.  The
>>>   * cur_rx and cur_tx point to the currently available buffer.
>>> @@ -679,6 +689,9 @@ struct fec_enet_private {
>>>  	/* XDP BPF Program */
>>>  	struct bpf_prog *xdp_prog;
>>>
>>> +	/* CBS parameters */
>>> +	struct fec_cbs_params cbs;
>>> +
>>>  	u64 ethtool_stats[];
>>>  };
>>>
>>> diff --git a/drivers/net/ethernet/freescale/fec_main.c
>>> b/drivers/net/ethernet/freescale/fec_main.c
>>> index c73e25f8995e..91394ad05121 100644
>>> --- a/drivers/net/ethernet/freescale/fec_main.c
>>> +++ b/drivers/net/ethernet/freescale/fec_main.c
>>> @@ -66,6 +66,7 @@
>>>  #include <linux/mfd/syscon.h>
>>>  #include <linux/regmap.h>
>>>  #include <soc/imx/cpuidle.h>
>>> +#include <net/pkt_sched.h>
>>
>> Some alphabetic order? At least for new files :D
>>
> I just want to keep the reverse Christmas tree style, although the whole #include
> order is already out of the style.

RCT is applied to variable declaration inside functions, not to header
include block :D

> 
>>>  #include <linux/filter.h>
>>>  #include <linux/bpf.h>
>>>
>>> @@ -1023,6 +1024,174 @@ static void fec_enet_reset_skb(struct
>> net_device *ndev)
>>>  	}
>>>  }
>>>
>>> +static u32 fec_enet_get_idle_slope(u8 bw)
>>
>> Just use `u32`, usually there's no reason to use types shorter than integer on
>> the stack.
>>
>>> +{
>>> +	int msb, power;
>>> +	u32 idle_slope;
>>> +
>>> +	if (bw >= 100)
>>> +		return 0;
>>> +
>>> +	/* Convert bw to hardware idle slope */
>>> +	idle_slope = (512 * bw) / (100 - bw);
>>> +
>>
>> Redundant newline. Also first pair of braces is optional and up to you.
>>
> I will fix this nit, thanks!
> 
>>> +	if (idle_slope >= 128) {
>>> +		/* For values greater than or equal to 128, idle_slope
>>> +		 * rounded to the nearest multiple of 128.
>>> +		 */
>>
>> But you can just do
>>
>> 	idle_slope = min(idle_slope, 128U);
>>
>> and still use the "standard" path below, without the conditional return?
>> Or even combine it with the code above:
>>
>> 	idle_slope = min((512 * bw) / (100 - bw), 128U);
>>
> If idles_slope is greater than or equal to 128, idle_slope should be rounded to the nearest
> multiple of 128. For example, if idle_slope = 255, it should be set to 256. However 
> min(idle_slope, 128U) is not as expected.

So you say that for for >= 128 it must be a multiple of 128 and for <
128 it must be pow-2? Then I did misread your code a bit, sorry.
But then my propo regarding adding round_closest() applies here as well.

> 
>>> +		idle_slope = DIV_ROUND_CLOSEST(idle_slope, 128U) * 128U;
>>> +
>>> +		return idle_slope;
>>> +	}
>>> +
>>> +	/* For values less than 128, idle_slope is rounded to
>>> +	 * nearst power of 2.
>>
>> Typo, 'nearest'.
>>
> Yes, I'll fix it.
> 
>>> +	 */
>>> +	if (idle_slope <= 1)
>>> +		return 1;
>>> +
>>> +	msb = __fls(idle_slope);
>>
>> I think `fls() - 1` is preferred over `__fls()` since it may go UB. And some checks
>> wouldn't hurt.
>>
> I used fls() at first, but later changed to __fls. Now that you pointed out its possible
> risks, I'll change it back. Thanks.
> 
>>> +	power = BIT(msb);
>>
>> Oh okay. Then rounddown_pow_of_two() is what you're looking for.
>>
>> 	power = rounddown_pow_of_two(idle_slope);
>>
>> Or even just use one variable, @idle_slope.
>>
> Thanks for the reminder, I think I should use roundup_pow_of_two().

But your code does what rounddown_pow_of_two() does, not roundup.
Imagine that you have 0b1111, then your code will turn it into
0b1000, not 0b10000. Or am I missing something?

> 
>>> +	idle_slope = DIV_ROUND_CLOSEST(idle_slope, power) * power;
>>> +
>>> +	return idle_slope;
>>
>> You can return DIV_ROUND_ ... right away, without assignning first.
>> Also, I'm thinking of that this might be a generic helper. We have
>> roundup() and rounddown(), this could be something like "round_closest()"?
>>
>>> +}
>>> +
>>> +static void fec_enet_set_cbs_idle_slope(struct fec_enet_private *fep)
>>> +{
>>> +	u32 bw, val, idle_slope;
>>> +	int speed = fep->speed;
>>> +	int idle_slope_sum = 0;
>>> +	int i;
>>
>> Can any of them be negative?
>>
> No.

So u32 for them.

> 
>>> +
>>> +	if (!speed)
>>> +		return;
>>> +
>>> +	for (i = 1; i < FEC_ENET_MAX_TX_QS; i++) {
>>
>> So you don't use the zeroth array elements at all? Why having them then?
>>
> Yes, the zeroth indicates queue 0, and queue 0 only support Non-AVB traffic.
> Why having them then?
> Firstly I think it more clear that the i indicates the index of queue. Secondly,
> it is for more convenience. If you think it is inappropriate, I will amend it
> later.

Well, it's not recommended to allocate some space you will never use.

> 
>>> +		int port_tx_rate;
>>
>> (same for type)
>>
>>> +
>>> +		/* As defined in IEEE 802.1Q-2014 Section 8.6.8.2 item:
>>> +		 *       sendslope = idleslope -  port_tx_rate
>>> +		 * So we need to check whether port_tx_rate is equal to
>>> +		 * the current link rate.
>>
>> [...]
>>
>>> +	for (i = 1; i < FEC_ENET_MAX_TX_QS; i++) {
>>> +		bw = fep->cbs.idleslope[i] / (speed * 10);
>>> +		idle_slope = fec_enet_get_idle_slope(bw);
>>> +
>>> +		val = readl(fep->hwp + FEC_DMA_CFG(i));
>>> +		val &= ~IDLE_SLOPE_MASK;
>>> +		val |= idle_slope & IDLE_SLOPE_MASK;
>>
>> u32_replace_bits() will do it for you.
>>
> Sorry, I can't find the definition of u32_replace_bits().

Because Elixir doesn't index functions generated via macros. See the end
of <linux/bitfield.h>, this is where the whole family gets defined.

> 
>>> +		writel(val, fep->hwp + FEC_DMA_CFG(i));
>>> +	}
>>> +
>>> +	/* Enable Credit-based shaper. */
>>> +	val = readl(fep->hwp + FEC_QOS_SCHEME);
>>> +	val &= ~FEC_QOS_TX_SHEME_MASK;
>>> +	val |= CREDIT_BASED_SCHEME;
>>
>> (same)
>>
>>> +	writel(val, fep->hwp + FEC_QOS_SCHEME); }
>>> +
>>> +static int fec_enet_setup_tc_cbs(struct net_device *ndev, void
>>> +*type_data) {
>>> +	struct fec_enet_private *fep = netdev_priv(ndev);
>>> +	struct tc_cbs_qopt_offload *cbs = type_data;
>>> +	int queue = cbs->queue;
>>> +	int speed = fep->speed;
>>> +	int queue2;
>>
>> (types)
>>
>>> +
>>> +	if (!(fep->quirks & FEC_QUIRK_HAS_AVB))
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	/* Queue 1 for Class A, Queue 2 for Class B, so the ENET must
>>> +	 * have three queues.
>>> +	 */
>>> +	if (fep->num_tx_queues != FEC_ENET_MAX_TX_QS)
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	if (!speed) {
>>> +		netdev_err(ndev, "Link speed is 0!\n");
>>
>> ??? Is this possible? If so, why is it checked only here and why can it be
>> possible?
>>
> It's possible if the board bootup without cable.

Then it shouldn't be an error -- no link partner is a regular flow, not
something horrendous.

> 
>>> +		return -ECANCELED;
>>
>> (already mentioned in other review)
>>
>>> +	}
>>> +
>>> +	/* Queue 0 is not AVB capable */
>>> +	if (queue <= 0 || queue >= fep->num_tx_queues) {
>>
>> Is `< 0` possible? I realize it's `s32`, just curious.
>>
> If the wrong parameter is entered and the app does not check the value,
> It might be 0. I'm not sure, just in case.

Ah okay, so it's a check for userspace sanity. Then it should stay.

> 
>>> +		netdev_err(ndev, "The queue: %d is invalid!\n", queue);
>>
>> Maybe less emotions? There's no point in having `!` at the end of every error.
>>
> OK, it fine to remove '!'.
> 
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +	if (!cbs->enable) {
>>> +		u32 val;
>>> +
>>> +		val = readl(fep->hwp + FEC_QOS_SCHEME);
>>> +		val &= ~FEC_QOS_TX_SHEME_MASK;
>>> +		val |= ROUND_ROBIN_SCHEME;
>>
>> (u32_replace_bits())
>>
>>> +		writel(val, fep->hwp + FEC_QOS_SCHEME);
>>> +
>>> +		memset(&fep->cbs, 0, sizeof(fep->cbs));
>>> +
>>> +		return 0;
>>> +	}
>>> +
>>> +	if (cbs->idleslope - cbs->sendslope != speed * 1000 ||
>>> +	    cbs->idleslope <= 0 || cbs->sendslope >= 0)
>>
>> So you check slopes here, why check them above then?
>>
> Because this function will be invoked due to some events in
> fec_restart too, so I added these checks here. If you think it
> is redundant, I will move these checks to fec_restart. Thanks.

Maybe you could make a small static inline and use it where appropriate
instead of open-coding?

> 
> 
>>> +		return -EINVAL;
>>> +
>>> +	/* Another AVB queue */
>>> +	queue2 = (queue == 1) ? 2 : 1;
>>
>> Braces are unneeded.
>>
>>> +	if (cbs->idleslope + fep->cbs.idleslope[queue2] > speed * 1000) {
>>> +		netdev_err(ndev,
>>> +			   "The sum of all idle slope can't exceed link speed!\n");
>>> +		return -EINVAL;
>>> +	}
>> [...]
>>
>> Thanks,
>> Olek
Thanks,
Olek