netdev - Re: [PATCH net-next v2 2/3] dsa: mv88e6xxx: Add support for RMU in select switches

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 31 Aug 2022 08:12:09 +0200
From:   Mattias Forsblad <mattias.forsblad@...il.com>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>,
        Florian Fainelli <f.fainelli@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net-next v2 2/3] dsa: mv88e6xxx: Add support for RMU in
 select switches

On 2022-08-30 18:35, Vladimir Oltean wrote:
> On Fri, Aug 26, 2022 at 08:38:15AM +0200, Mattias Forsblad wrote:
>>  static int mv88e6xxx_rmu_setup(struct mv88e6xxx_chip *chip)
>>  {
>> +	if (chip->info->ops->rmu_enable)
>> +		if (!chip->info->ops->rmu_enable(chip))
>> +			return mv88e6xxx_rmu_init(chip);
>> +
>>  	if (chip->info->ops->rmu_disable)
>>  		return chip->info->ops->rmu_disable(chip);
> 
> I think it's very important for the RMU to still start as disabled.
> You enable it dynamically when the master goes up.
> 

Please elaborate why this may pose a problem, I might have missed
some information.

>> diff --git a/drivers/net/dsa/mv88e6xxx/rmu.c b/drivers/net/dsa/mv88e6xxx/rmu.c
>> new file mode 100644
>> index 000000000000..b7d850c099c5
>> --- /dev/null
>> +++ b/drivers/net/dsa/mv88e6xxx/rmu.c
>> @@ -0,0 +1,273 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * Marvell 88E6xxx Switch Remote Management Unit Support
>> + *
>> + * Copyright (c) 2022 Mattias Forsblad <mattias.forsblad@...il.com>
>> + *
>> + */
>> +
>> +#include <asm/unaligned.h>
>> +#include "rmu.h"
>> +#include "global1.h"
>> +
>> +static int mv88e6xxx_validate_port_mac(struct dsa_switch *ds, struct sk_buff *skb)
>> +{
>> +	unsigned char *ethhdr;
>> +	struct dsa_port *dp;
>> +	u8 pkt_port;
>> +
>> +	pkt_port = (skb->data[7] >> 3) & 0xf;
>> +	dp = dsa_to_port(ds, pkt_port);
>> +	if (!dp) {
>> +		dev_dbg_ratelimited(ds->dev, "RMU: couldn't find port for %d\n", pkt_port);
>> +		return -ENXIO;
>> +	}
>> +
>> +	/* Check matching MAC */
>> +	ethhdr = skb_mac_header(skb);
>> +	if (memcmp(dp->slave->dev_addr, ethhdr, ETH_ALEN)) {
> 
> ether_addr_equal()
> 

Ofc.

> Also, what happens if you don't validate the MAC DA of the response, and
> in general, if you just put your MAC SA as the MAC address of the
> operationally active RMU DSA master? I guess the whole idea is to
> provide a MAC address which the DSA master won't drop with its RX
> filter, and its own MAC address is just fine for that.
>>> +		dev_dbg_ratelimited(ds->dev, "RMU: mismatching MAC address for request. Rx %pM expecting %pM\n",
>> +				    ethhdr, dp->slave->dev_addr);
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +int mv88e6xxx_inband_rcv(struct dsa_switch *ds, struct sk_buff *skb, int seq_no)
>> +{
>> +	struct mv88e6xxx_chip *chip = ds->priv;
>> +	struct mv88e6xxx_port *port;
>> +	u16 prodnum;
>> +	u16 format;
>> +	u8 pkt_dev;
>> +	u8 pkt_prt;
>> +	u16 code;
>> +	int i;
>> +
>> +	/* Extract response data */
>> +	format = get_unaligned_be16(&skb->data[0]);
>> +	if (format != htons(MV88E6XXX_RMU_RESP_FORMAT_1) &&
>> +	    format != htons(MV88E6XXX_RMU_RESP_FORMAT_2)) {
>> +		dev_err_ratelimited(chip->dev, "RMU: received unknown format 0x%04x", format);
>> +		goto out;
>> +	}
>> +
>> +	code = get_unaligned_be16(&skb->data[4]);
>> +	if (code == ntohs(MV88E6XXX_RMU_RESP_ERROR)) {
> 
> Please build with sparse, "make W=1 C=1". These are all wrong. The
> variables retrieved from packet headers should be __be16, and "htohs"
> (network to host) should be "htons" (host to network).
> 

Will check.

>> +		dev_err_ratelimited(chip->dev, "RMU: error response code 0x%04x", code);
>> +		goto out;
>> +	}
>> +
>> +	pkt_dev = skb->data[6] & 0x1f;
>> +	if (!dsa_switch_find(ds->dst->index, pkt_dev)) {
> 
> What is the relation between the pkt_dev from here, the RMU header, and
> the source_device from dsa_inband_rcv_ll()? Are they always the same?
> Can they be different? You throw away the result from dsa_switch_find()
> in any case.
> 

Will check.

>> +		dev_err_ratelimited(chip->dev, "RMU: response from unknown chip with index %d\n",
>> +				    pkt_dev);
>> +		goto out;
>> +	}
>> +
>> +	/* Check sequence number */
>> +	if (seq_no != chip->rmu.seq_no) {
>> +		dev_err_ratelimited(chip->dev, "RMU: wrong seqno received %d, expected %d",
>> +				    seq_no, chip->rmu.seq_no);
>> +		goto out;
>> +	}
>> +
>> +	/* Check response code */
>> +	switch (chip->rmu.request_cmd) {
>> +	case MV88E6XXX_RMU_REQ_GET_ID: {
>> +		if (code == MV88E6XXX_RMU_RESP_CODE_GOT_ID) {
> 
> I'd expect htons to be used even here, with 0, for type consistency.
> The compiler should figure things out and not add extra code.
> 

Thanks.

>> +			prodnum = get_unaligned_be16(&skb->data[2]);
>> +			chip->rmu.got_id = prodnum;
>> +			dev_info_ratelimited(chip->dev, "RMU: received id OK with product number: 0x%04x\n",
>> +					     chip->rmu.got_id);
>> +		} else {
>> +			dev_dbg_ratelimited(chip->dev,
>> +					    "RMU: unknown response for GET_ID format 0x%04x code 0x%04x",
>> +					    format, code);
>> +		}
>> +		break;
>> +	}
>> +	case MV88E6XXX_RMU_REQ_DUMP_MIB:
>> +		if (code == MV88E6XXX_RMU_RESP_CODE_DUMP_MIB &&
>> +		    !mv88e6xxx_validate_port_mac(ds, skb)) {
>> +			pkt_prt = (skb->data[7] & 0x78) >> 3;
>> +			port = &chip->ports[pkt_prt];
> 
> It would be good if you could structure the code in such a way that you
> don't parse stuff from the packet twice, once in mv88e6xxx_validate_port_mac()
> and once here.
> 

Agreed.

>> +			if (!port) {
>> +				dev_err_ratelimited(chip->dev, "RMU: illegal port number in response: %d\n",
>> +						    pkt_prt);
>> +				goto out;
>> +			}
>> +
>> +			/* Copy whole array for further
>> +			 * processing according to chip type
>> +			 */
>> +			for (i = 0; i < MV88E6XXX_RMU_MAX_RMON; i++)
>> +				port->rmu_raw_stats[i] = get_unaligned_be32(&skb->data[12 + i * 4]);
>> +		}
>> +		break;
>> +	default:
>> +		dev_err_ratelimited(chip->dev,
>> +				    "RMU: unknown response format 0x%04x and code 0x%04x from chip %d\n",
>> +				    format, code, chip->ds->index);
>> +		break;
>> +	}
>> +
>> +out:
>> +	complete(&chip->rmu.completion);
>> +
>> +	return 0;
>> +}
>> +
>> +static int mv88e6xxx_rmu_tx(struct mv88e6xxx_chip *chip, int port,
>> +			    const char *msg, int len)
>> +{
>> +	const struct dsa_device_ops *tag_ops;
>> +	const struct dsa_port *dp;
>> +	unsigned char *data;
>> +	struct sk_buff *skb;
>> +
>> +	dp = dsa_to_port(chip->ds, port);
>> +	if (!dp || !dp->cpu_dp)
>> +		return 0;
>> +
>> +	tag_ops = dp->cpu_dp->tag_ops;
>> +	if (!tag_ops)
>> +		return -ENODEV;
>> +
>> +	skb = netdev_alloc_skb(chip->rmu.netdev, 64);
>> +	if (!skb)
>> +		return -ENOMEM;
>> +
>> +	skb_reserve(skb, 2 * ETH_HLEN + tag_ops->needed_headroom);
> 
> If you decide to rework this using the master netdev, you can use
> dsa_tag_protocol_overhead(master->dsa_ptr->tag_ops). Or even reserve
> enough headroom for the larger header (EDSA) and be done with it.
> But then you need to construct a different header depending on whether
> DSA or EDSA is used.
> 

So in the new version a la qca8k we need the 'extra' parameter to
see if we need space for EDSA header, thus we need run through the tagger.
We can discuss that in the next version.

>> +	skb_reset_network_header(skb);
>> +	skb->pkt_type = PACKET_OUTGOING;
> 
> Could you please explain for me what will setting skb->pkt_type to
> PACKET_OUTGOING achieve?
>

I though it was prudent, will remove if it's not needed.
 
>> +	skb->dev = chip->rmu.netdev;
>> +
>> +	/* Create RMU L3 message */
>> +	data = skb_put(skb, len);
>> +	memcpy(data, msg, len);
>> +
>> +	return tag_ops->inband_xmit(skb, dp->slave, ++chip->rmu.seq_no);
>> +}
>> +
>> +static int mv88e6xxx_rmu_send_wait(struct mv88e6xxx_chip *chip, int port,
>> +				   int request, const char *msg, int len)
>> +{
>> +	const struct dsa_port *dp;
>> +	int ret = 0;
>> +
>> +	dp = dsa_to_port(chip->ds, port);
>> +	if (!dp)
>> +		return 0;
>> +
>> +	mutex_lock(&chip->rmu.mutex);
>> +
>> +	chip->rmu.request_cmd = request;
>> +
>> +	ret = mv88e6xxx_rmu_tx(chip, port, msg, len);
>> +	if (ret == -ENODEV) {
>> +		/* Device not ready yet? Try again later */
>> +		ret = 0;
> 
> All the code paths in mv88e6xxx_rmu_tx() that return -ENODEV are
> fabricated reasons to have errors, IMO, and in a correct implementation
> we should never even get there. Please drop the special casing.
> 

Re-worked in the next version.

>> +		goto out;
>> +	}
>> +
>> +	if (ret) {
>> +		dev_dbg(chip->dev, "RMU: error transmitting request (%d)", ret);
> 
> Please have a normal log level for an error, dev_err().
> Also you can print the symbolic name for the error:
> 
> 		dev_err(chip->dev, "RMU: error transmitting request (%pe)",
> 			ERR_PTR(ret));
> 

Thanks.

>> +		goto out;
>> +	}
>> +
>> +	ret = wait_for_completion_timeout(&chip->rmu.completion,
>> +					  msecs_to_jiffies(MV88E6XXX_WAIT_POLL_TIME_MS));
>> +	if (ret == 0) {
>> +		dev_dbg(chip->dev,
>> +			"RMU: timeout waiting for request %d (%d) on dev:port %d:%d\n",
>> +			request, ret, chip->ds->index, port);
> 
> Again, please increase the log level of the error condition.
> Also, chip->ds->index is useless information, we have info about the
> switch via dev_dbg/dev_err.
> 

Ok, thanks.

>> +		ret = -ETIMEDOUT;
>> +	}
>> +
>> +out:
>> +	mutex_unlock(&chip->rmu.mutex);
>> +
>> +	return ret > 0 ? 0 : ret;
>> +}
>> +
>> +static int mv88e6xxx_rmu_get_id(struct mv88e6xxx_chip *chip, int port)
>> +{
>> +	const u8 get_id[8] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
>> +	int ret = -1;
>> +
>> +	if (chip->rmu.got_id)
>> +		return 0;
>> +
>> +	chip->rmu.netdev = dev_get_by_name(&init_net, "chan0");
> 
> What if I don't have a slave device called "chan0"? I can't use the RMU?
> 

Reworked in the next version.

>> +	if (!chip->rmu.netdev) {
>> +		dev_dbg(chip->dev, "RMU: unable to get interface");
>> +		return -ENODEV;
>> +	}
>> +
>> +	ret = mv88e6xxx_rmu_send_wait(chip, port, MV88E6XXX_RMU_REQ_GET_ID, get_id, 8);
>> +	if (ret) {
>> +		dev_dbg(chip->dev, "RMU: error for command GET_ID %d index %d\n", ret,
>> +			chip->ds->index);
>> +		return ret;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int mv88e6xxx_rmu_stats_get(struct mv88e6xxx_chip *chip, int port, uint64_t *data)
>> +{
>> +	u8 dump_mib[8] = { 0x00, 0x01, 0x00, 0x00, 0x10, 0x20, 0x00, 0x00 };
>> +	int ret;
>> +
>> +	if (!chip)
>> +		return 0;
> 
> How can "chip" be NULL?
> 

Yes, Andrew pointed that out. Thanks.

>> +
>> +	ret = mv88e6xxx_rmu_get_id(chip, port);
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* Send a GET_MIB command */
>> +	dump_mib[7] = port;
>> +	ret = mv88e6xxx_rmu_send_wait(chip, port, MV88E6XXX_RMU_REQ_DUMP_MIB, dump_mib, 8);
>> +	if (ret) {
>> +		dev_dbg(chip->dev, "RMU: error for command DUMP_MIB %d dev %d:%d\n", ret,
>> +			chip->ds->index, port);
>> +		return ret;
>> +	}
>> +
>> +	/* Update MIB for port */
>> +	if (chip->info->ops->stats_get_stats)
>> +		return chip->info->ops->stats_get_stats(chip, port, data);
>> +
>> +	return 0;
>> +}
>> +
>> +static struct mv88e6xxx_bus_ops mv88e6xxx_bus_ops = {
> 
> static const struct
> 
>> +	.get_rmon = mv88e6xxx_rmu_stats_get,
>> +};
>> +
>> +int mv88e6xxx_rmu_init(struct mv88e6xxx_chip *chip)
>> +{
>> +	dev_info(chip->dev, "RMU: setting up for switch@%d", chip->sw_addr);
> 
> Not a whole lotta useful information brought by this. Also,
> chip->sw_addr is already visible via dev_info(chip->dev).
> I would just drop this.
> 

Ok.

>> +
>> +	init_completion(&chip->rmu.completion);
>> +
>> +	mutex_init(&chip->rmu.mutex);
> 
> I would just move these to mv88e6xxx_rmu_setup(), and move
> mv88e6xxx_rmu_setup() to rmu.c.
> 
>> +
>> +	chip->rmu.ops = &mv88e6xxx_bus_ops;
> 
> By doing this (combined with the way in which chip->rmu.ops is actually
> tested for), you are saying that RMU is available since driver probe time.
> But it isn't. It's only available when the master goes up and is
> operational. So you're setting yourself up for lots of I/O failures in
> the beginning.
> 

Reworked in the next version.

>> +
>> +	return 0;
>> +}
>> +
>> +void mv88e6xxx_rmu_master_change(struct dsa_switch *ds, const struct net_device *master,
>> +				 bool operational)
>> +{
>> +	struct mv88e6xxx_chip *chip = ds->priv;
>> +
>> +	if (operational)
>> +		chip->rmu.ops = &mv88e6xxx_bus_ops;
>> +	else
>> +		chip->rmu.ops = NULL;
>> +}
> 
> There is a subtle but very important point to be careful about here,
> which is compatibility with multiple CPU ports. If there is a second DSA
> master whose state flaps from up to down, this should not affect the
> fact that you can still use RMU over the first DSA master. But in your
> case it does, so this is a case of how not to write code that accounts
> for that.
> 
> In fact, given this fact, I think your function prototypes for
> chip->info->ops->rmu_enable() are all wrong / not sufficiently
> reflective of what the hardware can do. If the hardware has a bit mask
> of ports on which RMU operations are possible, why hardcode using
> dsa_switch_upstream_port() and not look at which DSA masters/CPU ports
> are actually up? At least for the top-most switch. For downstream
> switches we can use dsa_switch_upstream_port(), I guess (even that can
> be refined, but I'm not aware of setups using multiple DSA links, where
> each DSA link ultimately goes to a different upstream switch).

This is also changed a la qca8k.