Message-ID: <Z2FAUuOh4jrA0uGu@lore-desk>
Date: Tue, 17 Dec 2024 10:11:46 +0100
From: Lorenzo Bianconi <lorenzo.bianconi@...hat.com>
To: Vladimir Oltean <olteanv@...il.com>
Cc: Lorenzo Bianconi <lorenzo@...nel.org>,
	Oleksij Rempel <linux@...pel-privat.de>, netdev@...r.kernel.org,
	andrew@...n.ch, davem@...emloft.net, edumazet@...gle.com,
	kuba@...nel.org, pabeni@...hat.com, horms@...nel.org, nbd@....name,
	sean.wang@...iatek.com, Mark-MC.Lee@...iatek.com,
	lorenzo.bianconi83@...il.com
Subject: Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha
 EN7581 SoC

> On Mon, Dec 16, 2024 at 11:28:18PM +0100, Lorenzo Bianconi wrote:
> > > ndo_setup_tc_conduit() does not have the same instruments to offload
> > > what port_setup_tc() can offload. It is not involved in all data paths
> > > that port_setup_tc() has to handle. Please ack this. So if port_setup_tc()
> > 
> > Can you please elaborate on this? Both (->ndo_setup_tc_conduit() and
> > ->port_setup_tc()) refer to the same DSA user port (please take a look the
> > callback signature).
> 
> I'd be just repeating what I've said a few times before. Your proposed
> ndo_setup_tc_conduit() appears to be configuring conduit resources
> (QDMA, GDM) for mt7530 user port tc offload, as if it is in complete and
> exclusive control of the user port data path. But as long as there are
> packets in the user port data path that bypass those conduit QoS resources
> (like for example mt7530 switch forwards packet from one port to another,
> bypassing the GDM1 in your drawing[1]), it isn't a good model. Forget
> about ndo_setup_tc_conduit(), it isn't a good tc command to run in the
> first place. The tc command you're trying to make to do what you want is
> supposed to _also_ include the mt7530 packets forwarded from one port to
> another in its QoS mix. It applies at the _egress_ of the mt7530 port.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23020f04932701d5c8363e60756f12b43b8ed752
> 
> Let me try to add some squiggles based on your diagram, to clarify what
> is my understanding and complaint.
> 
>                ┌───────┐                                   ┌───────┐
>                │ QDMA2 │                                   │ QDMA1 │
>                └───┬───┘                                   └───┬───┘
>                    │                                           │
>            ┌───────▼───────────────────────────────────────────▼────────┐
>            │                                                            │
>            │       P5                                          P0       │
>            │                                                            │
>            │                                                            │
>            │                                                            │    ┌──────┐
>            │                                                         P3 ├────► GDM3 │
>            │                                                            │
> ┌─────┐    │                                                            │
> │ PPE ◄────┤ P4                        PSE                              │
> └─────┘    │                                                            │
>            │                                                            │    ┌──────┐
>            │                                                         P9 ├────► GDM4 │
>            │                                                            │    └──────┘
>            │                                                            │
>            │        P2                                         P1       │
>            └─────────┬─────────────────────────────────────────┬────────┘
>                      │                                         │
>                  ┌───▼──┐                                   ┌──▼───┐
>                  │ GDM2 │                                   │ GDM1 │
>                  └──────┘                                   └──┬───┘
>                                                                │
>                                                 ┌──────────────▼───────────────┐
>                                                 │            CPU port          │
>                                                 │   ┌─────────┘                │
>                                                 │   │         MT7530           │
>                                                 │   │                          │
>                                                 │   ▼         x                │
>                                                 │   ┌─────┐ ┌─┘                │
>                                                 │  lan1  lan2  lan3  lan4      │
>                                                 └───│──────────────────────────┘
>                                                     ▼
> 
> When you add an offloaded Qdisc to the egress of lan1, the expectation
> is that packets from lan2 obey it too (offloaded tc goes hand in hand
> with offloaded bridge). Whereas, by using GDM1/QDMA resources, you are
> breaking that expectation, because packets from lan2 bridged by MT7530
> don't go to GDM1 (the "x").

ack, I get your point. I was assuming we would cover this case (traffic from
lan2 to lan1) by keeping the port_setup_tc() callback in dsa_user_setup_qdisc(),
since this traffic is not managed by the ndo_setup_tc_conduit() callback. If
that is not acceptable, I guess we will need to revisit the approach.

> 
> > > returns -EOPNOTSUPP, the entire dsa_user_setup_qdisc() should return
> > > -EOPNOTSUPP, UNLESS you install packet traps on all other offloaded data
> > > paths in the switch, such that all packets that egress the DSA user port
> > > are handled by ndo_setup_tc_conduit()'s instruments.
> > 
> > Uhm, do you mean we are changing the user expected result in this way?
> > It seems to me the only case we are actually changing is if port_setup_tc()
> > callback is not supported by the DSA switch driver while ndo_setup_tc_conduit()
> > one is supported by the mac chip. In this case the previous implementation
> > returns -EOPNOTSUPP while the proposed one does not report any error.
> > Do we really care about this case? If so, I guess we can rework
> > dsa_user_setup_qdisc().
> 
> See above, there's nothing to rework.
> 
> > > > nope, I am not saying the current Qdisc DSA infrastructure is wrong, it just
> > > > does not allow to exploit all hw capabilities available on EN7581 when the
> > > > traffic is routed from the WAN port to a given DSA switch port.
> > > 
> > > And I don't believe it should, in this way.
> > 
> > Can you please elaborate on this? IIUC it seems even Oleksij has a use-case
> > for this.
> 
> See above, I'm also waiting for Oleksij's answer but I don't expect you
> 2 to be talking about the same thing. If there's some common infrastructure
> to be shared, my understanding is it has nothing to do with ndo_setup_tc_conduit().

ack

> 
> > > We need something as the root Qdisc of the conduit which exposes its
> > > hardware capabilities. I just assumed that would be a simple (and single)
> > > ETS, you can correct me if I am wrong.
> > > 
> > > On conduit egress, what is the arbitration scheme between the traffic
> > > destined towards each DSA user port (channel, as the driver calls them)?
> > > How can this be best represented?
> > 
> > The EN7581 supports up to 32 different 'channels' (each of them support 8
> > different hw queues). You can define an ETS and/or TBF Qdisc for each channel.
> > My idea is to associate a channel to each DSA switch port, so the user can
> > define independent QoS policies for each DSA ports (e.g. shape at 100Mbps lan0,
> > apply ETS on lan1, ...) configuring the mac chip instead of the hw switch.
> > The kernel (if the traffic is not offloaded) or the PPE block (if the traffic
> > is offloaded) updates the channel and queue information in the DMA descriptor
> > (please take a look to [0] for the first case).
> 
> But you call it a MAC chip because between the GDM1 and the MT7530 there's
> an in-chip Ethernet MAC (GMII netlist), with a fixed packet rate, right?

By "mac chip" I mean the set of PSE/PPE and QDMA blocks in the diagram
(what is managed by the airoha_eth driver). There is no other chip between
GDM1 and the MT7530 switch (sorry for the confusion).

> I'm asking again, are the channels completely independent of one another,
> or are they consuming shared bandwidth in a way that with your proposal
> is just not visible? If there is a GMII between the GDM1 and the MT7530,
> how come the bandwidth between the channels is not shared in any way?

Channels are logically independent.
GDM1 is connected to the MT7530 switch via a fixed-speed link (10Gbps, similar
to what we have on other MediaTek chipsets, like the MT7988 [0]). The fixed link
speed is higher than the sum of the DSA port link speeds (on my development
boards I have 4 DSA ports @ 1Gbps).

> And if there is no GMII or similar MAC interface, we need to take 100
> steps back and discuss why was the DSA model chosen for this switch, and
> not a freeform switchdev driver where the conduit is not discrete?
> 
> I'm not sure what to associate these channels with. Would it be wrong to
> think of each channel as a separate DSA conduit? Because for example there
> is API to customize the user port <-> conduit assignment.
> 
> > > IIUC, in your patch set, you expose the conduit hardware QoS capabilities
> > > as if they can be perfectly virtualized among DSA user ports, and as if
> > > each DSA user port can have its own ETS root Qdisc, completely independent
> > > of each other, as if the packets do not serialize on the conduit <-> CPU
> > > port link, and as if that is not a bottleneck. Is that really the case?
> > 
> > correct
> 
> Very interesting but will need more than one word for an explanation :)

I mean that your paragraph above describes the hw architecture well.

> 
> > > If so (but please explain how), maybe you really need your own root Qdisc
> > > driver, with one class per DSA user port, and those classes have ETS
> > > attached to them.
> > 
> > Can you please clarify what do you mean with 'root Qdisc driver'?
> 
> Quite literally, an implementer of struct Qdisc_ops whose parent can
> only be TC_H_ROOT. I was implying you'd have to create an abstract
> software model of the QoS capabilities of the QDMA and the GDM port such
> that we all understand them, a netlink attribute scheme for configuring
> those QoS parameters, and then a QoS offload mechanism through which
> they are communicated to compatible hardware. But let's leave that aside
> until it becomes more clear what you have.
> 

ack, fine.

Regards,
Lorenzo

[0] https://github.com/openwrt/openwrt/blob/main/target/linux/mediatek/files-6.6/arch/arm64/boot/dts/mediatek/mt7988a.dtsi#L1531
