netdev - RE: [PATCH V2 net-next] net: fec: add CBS offload support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DB9PR04MB8106414C19433AB6B13369BF88A09@DB9PR04MB8106.eurprd04.prod.outlook.com>
Date:   Thu, 16 Feb 2023 13:03:37 +0000
From:   Wei Fang <wei.fang@....com>
To:     Alexander Lobakin <alexandr.lobakin@...el.com>
CC:     Shenwei Wang <shenwei.wang@....com>,
        Clark Wang <xiaoning.wang@....com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "pabeni@...hat.com" <pabeni@...hat.com>,
        "simon.horman@...igine.com" <simon.horman@...igine.com>,
        "andrew@...n.ch" <andrew@...n.ch>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        dl-linux-imx <linux-imx@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH V2 net-next] net: fec: add CBS offload support


> -----Original Message-----
> From: Alexander Lobakin <alexandr.lobakin@...el.com>
> Sent: 2023年2月15日 0:49
> To: Wei Fang <wei.fang@....com>
> Cc: Shenwei Wang <shenwei.wang@....com>; Clark Wang
> <xiaoning.wang@....com>; davem@...emloft.net; edumazet@...gle.com;
> kuba@...nel.org; pabeni@...hat.com; simon.horman@...igine.com;
> andrew@...n.ch; netdev@...r.kernel.org; dl-linux-imx <linux-imx@....com>;
> linux-kernel@...r.kernel.org
> Subject: Re: [PATCH V2 net-next] net: fec: add CBS offload support
> 
> From: Wei Fang <wei.fang@....com>
> Date: Tue, 14 Feb 2023 09:34:09 +0000
> 
> >> -----Original Message-----
> >> From: Alexander Lobakin <alexandr.lobakin@...el.com>
> >> Sent: 2023年2月14日 0:05
> >> To: Wei Fang <wei.fang@....com>
> >> Cc: Shenwei Wang <shenwei.wang@....com>; Clark Wang
> >> <xiaoning.wang@....com>; davem@...emloft.net;
> edumazet@...gle.com;
> >> kuba@...nel.org; pabeni@...hat.com; simon.horman@...igine.com;
> >> andrew@...n.ch; netdev@...r.kernel.org; dl-linux-imx
> >> <linux-imx@....com>; linux-kernel@...r.kernel.org
> >> Subject: Re: [PATCH V2 net-next] net: fec: add CBS offload support
> >>
> >> From: Wei Fang <wei.fang@....com>
> >> Date: Mon, 13 Feb 2023 17:29:12 +0800
> >>
> >>> From: Wei Fang <wei.fang@....com>
> >>>
> >>
> >>> @@ -571,6 +575,12 @@ struct fec_stop_mode_gpr {
> >>>  	u8 bit;
> >>>  };
> >>>
> >>> +struct fec_cbs_params {
> >>> +	bool enable[FEC_ENET_MAX_TX_QS];
> >>
> >> Maybe smth like
> >>
> >> 	DECLARE_BITMAP(enabled, FEC_ENET_MAX_TX_QS);
> >>
> >> ?
> > I think it's a matter of personal habit.
> 
> It's a matter that you can fit 32 bits if you declare it as u32 or 64 bits if as a
> bitmap vs you waste 1 byte per 1 true/false flag as you do in this version :D
> 
Okay, I'll optimize this part.

> >
> >>
> >>> +	int idleslope[FEC_ENET_MAX_TX_QS];
> >>> +	int sendslope[FEC_ENET_MAX_TX_QS];
> >>
> >> Can they actually be negative? (probably I'll see it below)
> >>
> > idleslope and sendslope are used to save the parameters passed in from user
> space.
> > And the sendslope should always be negative.
> 
> Parameters coming from userspace must be validated before saving them
> anywhere.
> Also, if sendslope is always negative as you say, then just negate it when you
> read it from userspace and store as u32.
Sorry, I still don't understand why u32 is necessary to store parameters from user
space? In addition, people who understand 802.1Qav may be confused when they
see that sendslope is a u32 type.

> 
> >
> >>> +};
> >>> +
> >>>  /* The FEC buffer descriptors track the ring buffers.  The rx_bd_base and
> >>>   * tx_bd_base always point to the base of the buffer descriptors.  The
> >>>   * cur_rx and cur_tx point to the currently available buffer.
> >>> @@ -679,6 +689,9 @@ struct fec_enet_private {
> >>>  	/* XDP BPF Program */
> >>>  	struct bpf_prog *xdp_prog;
> >>>
> >>> +	/* CBS parameters */
> >>> +	struct fec_cbs_params cbs;
> >>> +
> >>>  	u64 ethtool_stats[];
> >>>  };
> >>>
> >>> @@ -66,6 +66,7 @@
> >>>  #include <linux/mfd/syscon.h>
> >>>  #include <linux/regmap.h>
> >>>  #include <soc/imx/cpuidle.h>
> >>> +#include <net/pkt_sched.h>
> >>
> >> Some alphabetic order? At least for new files :D
> >>
> > I just want to keep the reverse Christmas tree style, although the
> > whole #include order is already out of the style.
> 
> RCT is applied to variable declaration inside functions, not to header include
> block :D
It seems that I misunderstood before, I'll fix it.

> 
> >
> >>>  #include <linux/filter.h>
> >>>  #include <linux/bpf.h>
> >>>
> >>> @@ -1023,6 +1024,174 @@ static void fec_enet_reset_skb(struct
> >> net_device *ndev)
> >>>  	}
> >>>  }
> >>>
> >>> +static u32 fec_enet_get_idle_slope(u8 bw)
> >>
> >> Just use `u32`, usually there's no reason to use types shorter than
> >> integer on the stack.
> >>
> >>> +{
> >>> +	int msb, power;
> >>> +	u32 idle_slope;
> >>> +
> >>> +	if (bw >= 100)
> >>> +		return 0;
> >>> +
> >>> +	/* Convert bw to hardware idle slope */
> >>> +	idle_slope = (512 * bw) / (100 - bw);
> >>> +
> >>
> >> Redundant newline. Also first pair of braces is optional and up to you.
> >>
> > I will fix this nit, thanks!
> >
> >>> +	if (idle_slope >= 128) {
> >>> +		/* For values greater than or equal to 128, idle_slope
> >>> +		 * rounded to the nearest multiple of 128.
> >>> +		 */
> >>
> >> But you can just do
> >>
> >> 	idle_slope = min(idle_slope, 128U);
> >>
> >> and still use the "standard" path below, without the conditional return?
> >> Or even combine it with the code above:
> >>
> >> 	idle_slope = min((512 * bw) / (100 - bw), 128U);
> >>
> > If idles_slope is greater than or equal to 128, idle_slope should be
> > rounded to the nearest multiple of 128. For example, if idle_slope =
> > 255, it should be set to 256. However min(idle_slope, 128U) is not as
> expected.
> 
> So you say that for for >= 128 it must be a multiple of 128 and for <
> 128 it must be pow-2? Then I did misread your code a bit, sorry.
> But then my propo regarding adding round_closest() applies here as well.
> 
yes.

> >> I think `fls() - 1` is preferred over `__fls()` since it may go UB.
> >> And some checks wouldn't hurt.
> >>
> > I used fls() at first, but later changed to __fls. Now that you
> > pointed out its possible risks, I'll change it back. Thanks.
> >
> >>> +	power = BIT(msb);
> >>
> >> Oh okay. Then rounddown_pow_of_two() is what you're looking for.
> >>
> >> 	power = rounddown_pow_of_two(idle_slope);
> >>
> >> Or even just use one variable, @idle_slope.
> >>
> > Thanks for the reminder, I think I should use roundup_pow_of_two().
> 
> But your code does what rounddown_pow_of_two() does, not roundup.
> Imagine that you have 0b1111, then your code will turn it into 0b1000, not
> 0b10000. Or am I missing something?
> 
0b1111 is nearest to 0b10000, so it should be turned into 0x10000.

> >
> >>> +	idle_slope = DIV_ROUND_CLOSEST(idle_slope, power) * power;
> >>> +
> >>> +	return idle_slope;
> >>
> >> You can return DIV_ROUND_ ... right away, without assignning first.
> >> Also, I'm thinking of that this might be a generic helper. We have
> >> roundup() and rounddown(), this could be something like "round_closest()"?
> >>
> >>> +}
> >>> +
> >>> +static void fec_enet_set_cbs_idle_slope(struct fec_enet_private
> >>> +*fep) {
> >>> +	u32 bw, val, idle_slope;
> >>> +	int speed = fep->speed;
> >>> +	int idle_slope_sum = 0;
> >>> +	int i;
> >>
> >> Can any of them be negative?
> >>
> > No.
> 
> So u32 for them.
> 
> >
> >>> +
> >>> +	if (!speed)
> >>> +		return;
> >>> +
> >>> +	for (i = 1; i < FEC_ENET_MAX_TX_QS; i++) {
> >>
> >> So you don't use the zeroth array elements at all? Why having them then?
> >>
> > Yes, the zeroth indicates queue 0, and queue 0 only support Non-AVB traffic.
> > Why having them then?
> > Firstly I think it more clear that the i indicates the index of queue.
> > Secondly, it is for more convenience. If you think it is
> > inappropriate, I will amend it later.
> 
> Well, it's not recommended to allocate some space you will never use.
> 
Okay, I'll fix it, thanks.
> >
> >>> +		int port_tx_rate;
> >>
> >> (same for type)
> >>
> >>> +
> >>> +		/* As defined in IEEE 802.1Q-2014 Section 8.6.8.2 item:
> >>> +		 *       sendslope = idleslope -  port_tx_rate
> >>> +		 * So we need to check whether port_tx_rate is equal to
> >>> +		 * the current link rate.
> >>
> >> [...]
> >>
> >>> +	for (i = 1; i < FEC_ENET_MAX_TX_QS; i++) {
> >>> +		bw = fep->cbs.idleslope[i] / (speed * 10);
> >>> +		idle_slope = fec_enet_get_idle_slope(bw);
> >>> +
> >>> +		val = readl(fep->hwp + FEC_DMA_CFG(i));
> >>> +		val &= ~IDLE_SLOPE_MASK;
> >>> +		val |= idle_slope & IDLE_SLOPE_MASK;
> >>
> >> u32_replace_bits() will do it for you.
> >>
> > Sorry, I can't find the definition of u32_replace_bits().
> 
> Because Elixir doesn't index functions generated via macros. See the end of
> <linux/bitfield.h>, this is where the whole family gets defined.
> 
Thanks, I got it.

> >
> >>> +		writel(val, fep->hwp + FEC_DMA_CFG(i));
> >>> +	}
> >>> +
> >>> +	/* Enable Credit-based shaper. */
> >>> +	val = readl(fep->hwp + FEC_QOS_SCHEME);
> >>> +	val &= ~FEC_QOS_TX_SHEME_MASK;
> >>> +	val |= CREDIT_BASED_SCHEME;
> >>
> >> (same)
> >>
> >>> +	writel(val, fep->hwp + FEC_QOS_SCHEME); }
> >>> +
> >>> +static int fec_enet_setup_tc_cbs(struct net_device *ndev, void
> >>> +*type_data) {
> >>> +	struct fec_enet_private *fep = netdev_priv(ndev);
> >>> +	struct tc_cbs_qopt_offload *cbs = type_data;
> >>> +	int queue = cbs->queue;
> >>> +	int speed = fep->speed;
> >>> +	int queue2;
> >>
> >> (types)
> >>
> >>> +
> >>> +	if (!(fep->quirks & FEC_QUIRK_HAS_AVB))
> >>> +		return -EOPNOTSUPP;
> >>> +
> >>> +	/* Queue 1 for Class A, Queue 2 for Class B, so the ENET must
> >>> +	 * have three queues.
> >>> +	 */
> >>> +	if (fep->num_tx_queues != FEC_ENET_MAX_TX_QS)
> >>> +		return -EOPNOTSUPP;
> >>> +
> >>> +	if (!speed) {
> >>> +		netdev_err(ndev, "Link speed is 0!\n");
> >>
> >> ??? Is this possible? If so, why is it checked only here and why can
> >> it be possible?
> >>
> > It's possible if the board bootup without cable.
> 
> Then it shouldn't be an error -- no link partner is a regular flow, not something
> horrendous.
yes, I'll fix it.
> 
> >
> >>> +		return -ECANCELED;
> >>
> >> (already mentioned in other review)
> >>
> >>> +	}
> >>> +
> >>> +	/* Queue 0 is not AVB capable */
> >>> +	if (queue <= 0 || queue >= fep->num_tx_queues) {
> >>
> >> Is `< 0` possible? I realize it's `s32`, just curious.
> >>
> > If the wrong parameter is entered and the app does not check the
> > value, It might be 0. I'm not sure, just in case.
> 
> Ah okay, so it's a check for userspace sanity. Then it should stay.
> 
> >
> >>> +		netdev_err(ndev, "The queue: %d is invalid!\n", queue);
> >>
> >> Maybe less emotions? There's no point in having `!` at the end of every
> error.
> >>
> > OK, it fine to remove '!'.
> >
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +	if (!cbs->enable) {
> >>> +		u32 val;
> >>> +
> >>> +		val = readl(fep->hwp + FEC_QOS_SCHEME);
> >>> +		val &= ~FEC_QOS_TX_SHEME_MASK;
> >>> +		val |= ROUND_ROBIN_SCHEME;
> >>
> >> (u32_replace_bits())
> >>
> >>> +		writel(val, fep->hwp + FEC_QOS_SCHEME);
> >>> +
> >>> +		memset(&fep->cbs, 0, sizeof(fep->cbs));
> >>> +
> >>> +		return 0;
> >>> +	}
> >>> +
> >>> +	if (cbs->idleslope - cbs->sendslope != speed * 1000 ||
> >>> +	    cbs->idleslope <= 0 || cbs->sendslope >= 0)
> >>
> >> So you check slopes here, why check them above then?
> >>
> > Because this function will be invoked due to some events in
> > fec_restart too, so I added these checks here. If you think it is
> > redundant, I will move these checks to fec_restart. Thanks.
> 
> Maybe you could make a small static inline and use it where appropriate
> instead of open-coding?
> 
That is a good suggestion, thanks.
> >
> >
> >>> +		return -EINVAL;
> >>> +
> >>> +	/* Another AVB queue */
> >>> +	queue2 = (queue == 1) ? 2 : 1;
> >>
> >> Braces are unneeded.
> >>
> >>> +	if (cbs->idleslope + fep->cbs.idleslope[queue2] > speed * 1000) {
> >>> +		netdev_err(ndev,
> >>> +			   "The sum of all idle slope can't exceed link speed!\n");
> >>> +		return -EINVAL;
> >>> +	}
> >> [...]
> >>
> >> Thanks,
> >> Olek
> Thanks,
> Olek