[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250910172251.072a8d36@kernel.org>
Date: Wed, 10 Sep 2025 17:22:51 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Łukasz Majewski <lukasz.majewski@...lbox.org>
Cc: Andrew Lunn <andrew+netdev@...n.ch>, davem@...emloft.net, Eric Dumazet
<edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Rob Herring
<robh@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>, Conor Dooley
<conor+dt@...nel.org>, Shawn Guo <shawnguo@...nel.org>, Sascha Hauer
<s.hauer@...gutronix.de>, Pengutronix Kernel Team <kernel@...gutronix.de>,
Fabio Estevam <festevam@...il.com>, Richard Cochran
<richardcochran@...il.com>, netdev@...r.kernel.org,
devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
imx@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org, Stefan Wahren
<wahrenst@....net>, Simon Horman <horms@...nel.org>
Subject: Re: [net-next v19 4/7] net: mtip: Add net_device_ops functions to
the L2 switch driver
On Wed, 10 Sep 2025 23:15:52 +0200 Łukasz Majewski wrote:
> > > I do use skb = buld_skb() which, "builds" the SKB around the memory
> > > page (from pool).
> > >
> > > Then, I "pass" this data (and swap it) to upper layer of the network
> > > stack.
> > >
> > > The same approach is used in the fec_main.c driver:
> > > https://elixir.bootlin.com/linux/v6.17-rc3/source/drivers/net/ethernet/freescale/fec_main.c#L1853
> > >
> >
> > I probably cut out too much context. I think I was quoting from Tx,
> > indeed on Rx this is not an issue.
>
> Ok. No adjustments needed then. Good :)
No, you were talking about build_skb() which is Rx.
This is the patch that adds Tx. Tx is wrong.
You can't modify the skb unless you call skb_cow().
Or just copy the data out to your local buffer.
> > > You may have port == 1 || port == 2 when you receive packet from
> > > ingres ports.
> > > You may also have port == 0xFF when you first time encounter the SA
> > > on the port and port == 0 when you send/receive data from the "host"
> > > interface.
> > >
> > > When port 1/2 is "detected" then the net dev for this particular
> > > port is used. In other cases the one for NAPI is used (which is one
> > > of those two - please see comment above).
> > >
> > > This was the approach from original NXP (Freescale) driver. It in
> > > some way prevents from "starvation" from net devices when L2 switch
> > > is disabled and I need to provide port separation.
> > >
> > > (port separation in fact is achieved by programming L2 switch
> > > registers and is realized in HW).
> >
> > But what if we have mixed traffic from port 1 and 2?
> > Does the current code correctly Rx the packets from port 1 on the
> > netdev from port 1 and packets from port 2 on the netdev from port 2?
>
> Yes, it does.
>
> In the mtip_rx_napi() you have call to mtip_switch_rx() which accepts
> pointer to port variable.
>
> It sets it according to the information provided by the switch IP block
> HW and also adjust the skb's ndev.
>
> I'm just wondering if the snippet from mtip_rx_napi():
> -------8<--------
> if ((port == 1 || port == 2) && fep->ndev[port - 1]
> mtip_switch_tx(fep->ndev[port - 1]);
> else
> mtip_switch_tx(napi->dev);
> ------->8--------
>
> could be replaced just with mtip_switch_tx(napi->dev);
> as TX via napi->dev shall be forward to both ports if required.
>
> I will check if this can be done in such a way.
Not napi->dev. You have to attribute sent packets to the right netdev.
> > > As I said - we do have only ONE queue, which corresponds to uDMA0
> > > when the switch is enabled. This single queue is responsible for
> > > handling transmission for both ports (this is how the HW is
> > > designed).
> >
> > Right but kernel has two SW queues.
>
> You mean a separate SW queues for each devices? This is not supported
> in the MTIP L2 switch driver. Maybe such high level SW queues
> management is available in the upper layers?
Not possible, each netdev has it's own private qdisc tree.
> > Which can be independently
> > stopped.
>
> It supports separate RX and TX HW queues (i.e. ring buffers for
> descriptors) for the uDMA0 when switch is enabled.
>
> When you want to send data (no matter from which lan[01] device) the
> same mtip_start_xmit() is called, the HW TX descriptor is setup and is
> passed via uDMA0 to L2 switch IP block.
>
> For next TX transmission (even from different port) we assign another
> descriptor from the ring buffer.
>
> > So my concerns is that for example port 1 is very busy so
> > the queue is full of packets for port 1, port 1's netdev's queue gets
> > stopped. Then port 2 tries to Tx, queue is shared, and is full, so
> > netdev 2's SW queue is also stopped. Then we complete the packets,
> > because packets were for port 1 we wake the queue for port 1. But
> > port 2 also got stopped, even tho it never put a packet on the ring..
> >
>
> As fair as I can tell - both ports call mtip_start_xmit(), their data
> is serialized to the TX queue (via descriptors).
>
> Queued descriptors are sent always at some point (or overridden if
> critical error encountered).
>
> > Long story short I think you need to always stop and start queues from
> > both netdevs.. There's just 2 so not too bad of a hack.
>
> Maybe there are some peculiarities in for example bridge code (or upper
> network stack layers in general), but I think, that I don't need any
> extra "queue" management for TX code of MTIP L2 switch.
I think I explained this enough times. Next version is v20.
If it's not significantly better than this one, I'm going to have
to ask you to stop posting this driver.
Powered by blists - more mailing lists