[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250911235547.477460e4@wsk>
Date: Thu, 11 Sep 2025 23:55:47 +0200
From: Łukasz Majewski <lukasz.majewski@...lbox.org>
To: Jakub Kicinski <kuba@...nel.org>, Andrew Lunn <andrew+netdev@...n.ch>
Cc: davem@...emloft.net, Eric Dumazet <edumazet@...gle.com>, Paolo Abeni
 <pabeni@...hat.com>, Rob Herring <robh@...nel.org>, Krzysztof Kozlowski
 <krzk+dt@...nel.org>, Conor Dooley <conor+dt@...nel.org>, Shawn Guo
 <shawnguo@...nel.org>, Sascha Hauer <s.hauer@...gutronix.de>, Pengutronix
 Kernel Team <kernel@...gutronix.de>, Fabio Estevam <festevam@...il.com>,
 Richard Cochran <richardcochran@...il.com>, netdev@...r.kernel.org,
 devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
 imx@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org, Stefan Wahren
 <wahrenst@....net>, Simon Horman <horms@...nel.org>, Vladimir Oltean
 <vladimir.oltean@....com>
Subject: Re: [net-next v19 4/7] net: mtip: Add net_device_ops functions to
 the L2 switch driver
Hi Jakub,
> On Wed, 10 Sep 2025 23:15:52 +0200 Łukasz Majewski wrote:
> > > > I do use skb = buld_skb() which, "builds" the SKB around the
> > > > memory page (from pool).
> > > > 
> > > > Then, I "pass" this data (and swap it) to upper layer of the
> > > > network stack.
> > > > 
> > > > The same approach is used in the fec_main.c driver:
> > > > https://elixir.bootlin.com/linux/v6.17-rc3/source/drivers/net/ethernet/freescale/fec_main.c#L1853
> > > >      
> > > 
> > > I probably cut out too much context. I think I was quoting from
> > > Tx, indeed on Rx this is not an issue.    
> > 
> > Ok. No adjustments needed then. Good :)  
> 
> No, you were talking about build_skb() which is Rx.
> This is the patch that adds Tx. Tx is wrong.
The same approach is taken in fec_main.c (@ fec_enet_txq_submit_skb()
function).
> You can't modify the skb unless you call skb_cow().
> Or just copy the data out to your local buffer.
In the mtip_start_xmit_port() I do assign the address to skb->data to
bufaddr pointer.
If alignment is wrong the we copy it to bounce buffer.
Then we do swap the buffer if needed.
Last step is to dma map this memory and assign the pointer to the
descriptor for L2 switch transmission.
> 
> > > > You may have port == 1 || port == 2 when you receive packet from
> > > > ingres ports.
> > > > You may also have port == 0xFF when you first time encounter
> > > > the SA on the port and port == 0 when you send/receive data
> > > > from the "host" interface.
> > > > 
> > > > When port 1/2 is "detected" then the net dev for this particular
> > > > port is used. In other cases the one for NAPI is used (which is
> > > > one of those two - please see comment above).
> > > > 
> > > > This was the approach from original NXP (Freescale) driver. It
> > > > in some way prevents from "starvation" from net devices when L2
> > > > switch is disabled and I need to provide port separation.
> > > > 
> > > > (port separation in fact is achieved by programming L2 switch
> > > > registers and is realized in HW).      
> > > 
> > > But what if we have mixed traffic from port 1 and 2?
> > > Does the current code correctly Rx the packets from port 1 on the
> > > netdev from port 1 and packets from port 2 on the netdev from
> > > port 2?    
> > 
> > Yes, it does.
> > 
> > In the mtip_rx_napi() you have call to mtip_switch_rx() which
> > accepts pointer to port variable.
> > 
> > It sets it according to the information provided by the switch IP
> > block HW and also adjust the skb's ndev.
> > 
> > I'm just wondering if the snippet from mtip_rx_napi():
> > -------8<--------
> > if ((port == 1 || port == 2) && fep->ndev[port - 1]
> > 	mtip_switch_tx(fep->ndev[port - 1]);
> > else
> > 	mtip_switch_tx(napi->dev);  
> > ------->8--------    
> > 
> > could be replaced just with mtip_switch_tx(napi->dev);
> > as TX via napi->dev shall be forward to both ports if required.
> > 
> > I will check if this can be done in such a way.  
> 
> Not napi->dev. You have to attribute sent packets to the right netdev.
And then we do have some issue to solve. To be more specific -
fec_main.c to avoid starvation just from fec_enet_rx_napi() calls
fec_enet_tx() with only one net device (which it supports).
I wanted to mimic such behaviour with L2 switch driver (at
mtip_rx_napi()), but then the question - which network device (from
available two) shall be assigned?
The net device passed to mtip_switch_tx() is only relevant for
"housekeeping/statistical data" as in fact we just provide another
descriptor to the HW to be sent.
Maybe I shall extract the net device pointer from the skb structure?
> 
> > > > As I said - we do have only ONE queue, which corresponds to
> > > > uDMA0 when the switch is enabled. This single queue is
> > > > responsible for handling transmission for both ports (this is
> > > > how the HW is designed).      
> > > 
> > > Right but kernel has two SW queues.    
> > 
> > You mean a separate SW queues for each devices? This is not
> > supported in the MTIP L2 switch driver. Maybe such high level SW
> > queues management is available in the upper layers?  
> 
> Not possible, each netdev has it's own private qdisc tree.
Please correct me if I'm wrong, but aren't packets from those queues
end up with calling ->ndo_start_xmit() function?
> 
> > > Which can be independently
> > > stopped.    
> > 
> > It supports separate RX and TX HW queues (i.e. ring buffers for
> > descriptors) for the uDMA0 when switch is enabled.
> > 
> > When you want to send data (no matter from which lan[01] device) the
> > same mtip_start_xmit() is called, the HW TX descriptor is setup and
> > is passed via uDMA0 to L2 switch IP block.
> > 
> > For next TX transmission (even from different port) we assign
> > another descriptor from the ring buffer.
> >   
> > > So my concerns is that for example port 1 is very busy so
> > > the queue is full of packets for port 1, port 1's netdev's queue
> > > gets stopped. Then port 2 tries to Tx, queue is shared, and is
> > > full, so netdev 2's SW queue is also stopped. Then we complete
> > > the packets, because packets were for port 1 we wake the queue
> > > for port 1. But port 2 also got stopped, even tho it never put a
> > > packet on the ring.. 
> > 
> > As fair as I can tell - both ports call mtip_start_xmit(), their
> > data is serialized to the TX queue (via descriptors).
> > 
> > Queued descriptors are sent always at some point (or overridden if
> > critical error encountered).
> >   
> > > Long story short I think you need to always stop and start queues
> > > from both netdevs.. There's just 2 so not too bad of a hack.    
> > 
> > Maybe there are some peculiarities in for example bridge code (or
> > upper network stack layers in general), but I think, that I don't
> > need any extra "queue" management for TX code of MTIP L2 switch.  
> 
> I think I explained this enough times. Next version is v20.
> If it's not significantly better than this one, I'm going to have 
> to ask you to stop posting this driver.
I don't know how to reply to this comment, really. 
I've spent many hours of my spare time to upstream this driver.
I'm just disappointed (and maybe I will not say more because of high
level of my frustration).
Could you point me to the driver example which provides such queues
management for switchdev driver? Just to show what you expect from me.
One example.
-- 
Best regards,
Łukasz Majewski
Powered by blists - more mailing lists
 
