Message-ID: <20241216231311.odozs4eki7bbagwp@skbuf>
Date: Tue, 17 Dec 2024 01:13:11 +0200
From: Vladimir Oltean <olteanv@...il.com>
To: Lorenzo Bianconi <lorenzo@...nel.org>
Cc: Oleksij Rempel <linux@...pel-privat.de>, netdev@...r.kernel.org,
	andrew@...n.ch, davem@...emloft.net, edumazet@...gle.com,
	kuba@...nel.org, pabeni@...hat.com, horms@...nel.org, nbd@....name,
	sean.wang@...iatek.com, Mark-MC.Lee@...iatek.com,
	lorenzo.bianconi83@...il.com
Subject: Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha
 EN7581 SoC

On Mon, Dec 16, 2024 at 11:28:18PM +0100, Lorenzo Bianconi wrote:
> > ndo_setup_tc_conduit() does not have the same instruments to offload
> > what port_setup_tc() can offload. It is not involved in all data paths
> > that port_setup_tc() has to handle. Please ack this. So if port_setup_tc()
> 
> Can you please elaborate on this? Both (->ndo_setup_tc_conduit() and
> ->port_setup_tc()) refer to the same DSA user port (please take a look
> at the callback signature).

I'd just be repeating what I've said a few times before. Your proposed
ndo_setup_tc_conduit() appears to be configuring conduit resources
(QDMA, GDM) for mt7530 user port tc offload, as if it were in complete
and exclusive control of the user port data path. But as long as there
are packets in the user port data path that bypass those conduit QoS
resources (for example when the mt7530 switch forwards a packet from one
port to another, bypassing the GDM1 in your drawing[1]), it isn't a good
model. Forget about ndo_setup_tc_conduit(): the tc command you're trying
to bend to do what you want isn't a good one to run in the first place.
That command is supposed to _also_ include, in its QoS mix, the packets
the mt7530 forwards from one port to another. It applies at the _egress_
of the mt7530 port.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23020f04932701d5c8363e60756f12b43b8ed752

Let me try to add some squiggles based on your diagram, to clarify my
understanding and my complaint.

               ┌───────┐                                   ┌───────┐
               │ QDMA2 │                                   │ QDMA1 │
               └───┬───┘                                   └───┬───┘
                   │                                           │
           ┌───────▼───────────────────────────────────────────▼────────┐
           │                                                            │
           │       P5                                          P0       │
           │                                                            │
           │                                                            │
           │                                                            │    ┌──────┐
           │                                                         P3 ├────► GDM3 │
           │                                                            │
┌─────┐    │                                                            │
│ PPE ◄────┤ P4                        PSE                              │
└─────┘    │                                                            │
           │                                                            │    ┌──────┐
           │                                                         P9 ├────► GDM4 │
           │                                                            │    └──────┘
           │                                                            │
           │        P2                                         P1       │
           └─────────┬─────────────────────────────────────────┬────────┘
                     │                                         │
                 ┌───▼──┐                                   ┌──▼───┐
                 │ GDM2 │                                   │ GDM1 │
                 └──────┘                                   └──┬───┘
                                                               │
                                                ┌──────────────▼───────────────┐
                                                │            CPU port          │
                                                │   ┌─────────┘                │
                                                │   │         MT7530           │
                                                │   │                          │
                                                │   ▼         x                │
                                                │   ┌─────┐ ┌─┘                │
                                                │  lan1  lan2  lan3  lan4      │
                                                └───│──────────────────────────┘
                                                    ▼

When you add an offloaded Qdisc to the egress of lan1, the expectation
is that packets bridged in from lan2 obey it too (offloaded tc goes hand
in hand with an offloaded bridge). By using GDM1/QDMA resources, however,
you are breaking that expectation, because packets from lan2 bridged by
the MT7530 don't go through GDM1 (the "x").
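
For reference, this is roughly how the existing model delivers an ETS
offload to the switch driver at the user port itself, so that bridged
traffic is covered as well. Just a sketch of the in-tree callback shapes,
not code from your series; the "foo_port_setup_tc" name is made up:

#include <net/dsa.h>
#include <net/pkt_cls.h>

/* dsa_switch_ops :: port_setup_tc() runs in the switch driver, so the
 * scheduler it programs sits at the egress of the user port itself and
 * therefore also covers traffic autonomously bridged from other ports.
 */
static int foo_port_setup_tc(struct dsa_switch *ds, int port,
			     enum tc_setup_type type, void *type_data)
{
	struct tc_ets_qopt_offload *ets = type_data;

	if (type != TC_SETUP_QDISC_ETS)
		return -EOPNOTSUPP;

	switch (ets->command) {
	case TC_ETS_REPLACE:
		/* program bands/quanta from ets->replace_params onto the
		 * egress scheduler of switch port @port
		 */
		return 0;
	case TC_ETS_DESTROY:
		/* restore the default scheduler on @port */
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}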

> > returns -EOPNOTSUPP, the entire dsa_user_setup_qdisc() should return
> > -EOPNOTSUPP, UNLESS you install packet traps on all other offloaded data
> > paths in the switch, such that all packets that egress the DSA user port
> > are handled by ndo_setup_tc_conduit()'s instruments.
> 
> Uhm, do you mean we are changing the result the user expects in this way?
> It seems to me the only case we are actually changing is when the
> port_setup_tc() callback is not supported by the DSA switch driver while
> the ndo_setup_tc_conduit() one is supported by the MAC chip. In this case
> the previous implementation returns -EOPNOTSUPP while the proposed one
> does not report any error. Do we really care about this case? If so, I
> guess we can rework dsa_user_setup_qdisc().

See above, there's nothing to rework.
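
To spell out the behaviour I'm asking for, something along these lines
(a rough sketch only, reusing the dsa_user_setup_qdisc() name from your
RFC, not something I've compiled):

static int dsa_user_setup_qdisc(struct net_device *dev,
				enum tc_setup_type type, void *type_data)
{
	struct dsa_port *dp = dsa_user_to_port(dev);
	struct dsa_switch *ds = dp->ds;

	if (!ds->ops->port_setup_tc)
		return -EOPNOTSUPP;

	/* If the switch driver cannot offload this Qdisc at the egress of
	 * the user port, propagate -EOPNOTSUPP instead of silently falling
	 * back to conduit-only resources (unless packet traps force every
	 * packet egressing the user port through the conduit, which is not
	 * what the series does).
	 */
	return ds->ops->port_setup_tc(ds, dp->index, type, type_data);
}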

> > > nope, I am not saying the current Qdisc DSA infrastructure is wrong, it
> > > just does not allow exploiting all the hw capabilities available on the
> > > EN7581 when traffic is routed from the WAN port to a given DSA switch port.
> > 
> > And I don't believe it should, in this way.
> 
> Can you please elaborate on this? IIUC it seems even Oleksij has a use-case
> for this.

See above. I'm also waiting for Oleksij's answer, but I don't expect you
two to be talking about the same thing. If there's some common
infrastructure to be shared, my understanding is that it has nothing to
do with ndo_setup_tc_conduit().

> > We need something as the root Qdisc of the conduit which exposes its
> > hardware capabilities. I just assumed that would be a simple (and single)
> > ETS, you can correct me if I am wrong.
> > 
> > On conduit egress, what is the arbitration scheme between the traffic
> > destined towards each DSA user port (channel, as the driver calls them)?
> > How can this be best represented?
> 
> The EN7581 supports up to 32 different 'channels' (each of them supports 8
> different hw queues). You can define an ETS and/or TBF Qdisc for each channel.
> My idea is to associate a channel with each DSA switch port, so the user can
> define independent QoS policies for each DSA port (e.g. shape lan0 at 100Mbps,
> apply ETS on lan1, ...), configuring the MAC chip instead of the hw switch.
> The kernel (if the traffic is not offloaded) or the PPE block (if the traffic
> is offloaded) updates the channel and queue information in the DMA descriptor
> (please take a look at [0] for the first case).

But you call it a MAC chip because between the GDM1 and the MT7530 there
is an in-chip Ethernet MAC (GMII netlist), with a fixed packet rate,
right? I'm asking again: are the channels completely independent of one
another, or are they consuming shared bandwidth in a way that is simply
not visible with your proposal? If there is a GMII between the GDM1 and
the MT7530, how come the bandwidth between the channels is not shared in
any way? And if there is no GMII or similar MAC interface, we need to
take 100 steps back and discuss why the DSA model was chosen for this
switch rather than a freeform switchdev driver where the conduit is not
discrete.

I'm not sure what to associate these channels with. Would it be wrong to
think of each channel as a separate DSA conduit? After all, there is an
API to customize the user port <-> conduit assignment.

> > IIUC, in your patch set, you expose the conduit hardware QoS capabilities
> > as if they can be perfectly virtualized among DSA user ports, and as if
> > each DSA user port can have its own ETS root Qdisc, completely independent
> > of each other, as if the packets do not serialize on the conduit <-> CPU
> > port link, and as if that is not a bottleneck. Is that really the case?
> 
> correct

Very interesting, but it will need more than a one-word explanation :)

> > If so (but please explain how), maybe you really need your own root Qdisc
> > driver, with one class per DSA user port, and those classes have ETS
> > attached to them.
> 
> Can you please clarify what you mean by 'root Qdisc driver'?

Quite literally, an implementer of struct Qdisc_ops whose parent can
only be TC_H_ROOT. I was implying you'd have to create an abstract
software model of the QoS capabilities of the QDMA and the GDM port such
that we all understand them, a netlink attribute scheme for configuring
those QoS parameters, and then a QoS offload mechanism through which
they are communicated to compatible hardware. But let's leave that aside
until it becomes more clear what you have.
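
Something in this shape, i.e. a Qdisc that refuses to be attached
anywhere except the root, like mqprio/taprio do. This is only an
illustrative skeleton; the "airoha_qdma" name and the empty classful
logic are made up, not a proposal for the actual implementation:

#include <linux/module.h>
#include <linux/netlink.h>
#include <net/pkt_sched.h>
#include <net/sch_generic.h>

static int airoha_qdma_init(struct Qdisc *sch, struct nlattr *opt,
			    struct netlink_ext_ack *extack)
{
	if (sch->parent != TC_H_ROOT) {
		NL_SET_ERR_MSG_MOD(extack, "Can only be attached as root qdisc");
		return -EOPNOTSUPP;
	}

	/* Here: parse netlink options describing the per-channel QoS model
	 * of the QDMA/GDM, create one class per DSA user port (channel),
	 * and offload the parameters to the hardware via ndo_setup_tc().
	 */
	return 0;
}

static int airoha_qdma_enqueue(struct sk_buff *skb, struct Qdisc *sch,
			       struct sk_buff **to_free)
{
	/* Software fallback path: a real driver would classify the packet
	 * to a per-channel child class here instead of dropping it.
	 */
	return qdisc_drop(skb, sch, to_free);
}

static struct sk_buff *airoha_qdma_dequeue(struct Qdisc *sch)
{
	return NULL;
}

static struct Qdisc_ops airoha_qdma_qdisc_ops __read_mostly = {
	.id		= "airoha_qdma",
	.init		= airoha_qdma_init,
	.enqueue	= airoha_qdma_enqueue,
	.dequeue	= airoha_qdma_dequeue,
	.owner		= THIS_MODULE,
};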
