Message-ID: <20201105155150.qc44olbqyxihislh@skbuf>
Date:   Thu, 5 Nov 2020 17:51:50 +0200
From:   Ioana Ciornei <ciorneiioana@...il.com>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Ioana Ciornei <ciorneiioana@...il.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        Ioana Ciornei <ioana.ciornei@....com>
Subject: Re: [RFC 6/9] staging: dpaa2-switch: add .ndo_start_xmit() callback

On Thu, Nov 05, 2020 at 02:45:12PM +0100, Andrew Lunn wrote:
> > > Where is the TX confirm which uses this stored pointer? I don't see it
> > > in this file.
> > > 
> > 
> > The Tx confirm - dpaa2_switch_tx_conf() - is added in patch 5/9.
> 
> Not so obvious. Could it be moved here?
> 

Sure, I'll move it here so that we have both Tx and Tx confirmation in
the same patch.

> > > It can be expensive to store a pointer like this in buffers used for
> > > DMA.
> > 
> > Yes, it is. But the hardware does not give us any other indication that
> > a packet was actually sent so that we can move ahead with consuming the
> > initial skb.
> > 
> > > It has to be flushed out of the cache here as part of the
> > > send. Then the TX complete needs to invalidate and then read it back
> > > into the cache. Or you use coherent memory which is just slow.
> > > 
> > > It can be cheaper to keep a parallel ring in cacheable memory which
> > > never gets flushed.
> > 
> > I'm afraid I don't really understand your suggestion. In this parallel
> > ring I would keep the skb pointers of all frames which are in flight?
> > Then, when a packet is received on the Tx confirmation queue, I would
> > have to loop over the parallel ring and somehow determine which skb
> > this packet was initially associated with. Isn't this even more expensive?
> 
> I don't know this particular hardware, so I will talk in general
> terms. Generally, you have a transmit ring. You add new frames to be
> sent to the beginning of the ring, and you take off completed frames
> from the end of the ring. This is kept in 'expensive' memory, in that
> either it is coherent, or you need to do flushed/invalidates.
> 
> It is expected that the hardware keeps to ring order. It does not pick
> and choose which frames it sends, it does them in order. That means
> completion also happens in ring order. So the driver can keep a simple
> linear array the size of the ring, in cacheable memory, with pointers
> to the skbuf. And it just needs a counting index to know which one
> just completed.

I agree with all of the above in a general sense.
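
Just to make sure we are talking about the same thing, below is a rough
sketch of how I picture it. All the names are made up for illustration,
this is not actual dpaa2 code, and it assumes a fixed ring size and that
completions really do come back in ring order.

/*
 * Shadow ring kept entirely in regular cacheable memory. Only the
 * hardware descriptor lives in DMA-able memory; the skb pointer never
 * has to be flushed or invalidated.
 */
#include <linux/skbuff.h>
#include <linux/types.h>

#define SHADOW_RING_SIZE	1024	/* assumed, not the real queue depth */

struct shadow_ring {
	struct sk_buff *skb[SHADOW_RING_SIZE];
	u16 next_to_use;	/* producer index, bumped on xmit */
	u16 next_to_clean;	/* consumer index, bumped on Tx conf */
};

/* Tx path: remember the skb in cacheable memory only */
static void shadow_ring_push(struct shadow_ring *r, struct sk_buff *skb)
{
	r->skb[r->next_to_use] = skb;
	r->next_to_use = (r->next_to_use + 1) % SHADOW_RING_SIZE;
}

/* Tx confirmation path: completions are assumed to arrive in ring order */
static struct sk_buff *shadow_ring_pop(struct shadow_ring *r)
{
	struct sk_buff *skb = r->skb[r->next_to_clean];

	r->skb[r->next_to_clean] = NULL;
	r->next_to_clean = (r->next_to_clean + 1) % SHADOW_RING_SIZE;

	return skb;
}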

> 
> Now, your hardware is more complex. You have one queue feeding
> multiple switch ports.

Not really. I have one Tx queue for each switch port and just one Tx
confirmation queue for all of them.

> Maybe it does not keep to ring order?

If the driver enqueues frames #1, #2, #3 in this exact order on a switch
port then the frames will arrive in the same order on the Tx
confirmation queue irrespective of any other traffic sent on other
switch ports.

> If you
> have one port running at 10M/Half, and another at 10G/Full, does it
> leave frames for the 10M/Half port in the ring when its egress queue is
> full? That is probably a bad idea, since the 10G/Full port could then
> starve for lack of free slots in the ring? So my guess would be, the
> frames get dropped. And so ring order is maintained.
> 
> If you are paranoid that it could get out of sync, keep an array of
> tuples: the address of the frame descriptor and the skbuf. If the fd
> address does not match what you expect, then do a linear search of the
> fd address, and increment a counter that something odd has happened.
> 

The problem with this, I think, would be two Tx softirqs on two
different cores wanting to send a frame on the same switch port. In
order to update the shadow ring, there would have to be some kind of
locking on the accesses to it, which might invalidate any attempt to
make this more efficient; see the sketch below.
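
To make the concern concrete, here is how I picture the tuple variant,
reusing the made-up names and SHADOW_RING_SIZE from the sketch above
(again, not actual dpaa2 code, just an illustration). The spinlock is
the part that worries me: two cores transmitting on the same port would
contend on it for every frame.

#include <linux/spinlock.h>

struct shadow_entry {
	dma_addr_t fd_addr;		/* address of the frame descriptor */
	struct sk_buff *skb;
};

struct shadow_tuple_ring {
	struct shadow_entry entry[SHADOW_RING_SIZE];
	u16 next_to_use;
	u16 next_to_clean;
	unsigned long out_of_order;	/* "something odd happened" counter */
	spinlock_t lock;		/* spin_lock_init()'d at probe time */
};

/* Tx path: two cores hitting the same port contend on the lock here */
static void shadow_tuple_push(struct shadow_tuple_ring *r,
			      dma_addr_t fd_addr, struct sk_buff *skb)
{
	spin_lock(&r->lock);
	r->entry[r->next_to_use].fd_addr = fd_addr;
	r->entry[r->next_to_use].skb = skb;
	r->next_to_use = (r->next_to_use + 1) % SHADOW_RING_SIZE;
	spin_unlock(&r->lock);
}

/* Tx confirmation path: the fast path assumes ring order is kept */
static struct sk_buff *shadow_tuple_clean(struct shadow_tuple_ring *r,
					  dma_addr_t fd_addr)
{
	struct sk_buff *skb = NULL;
	unsigned int i, idx;

	spin_lock(&r->lock);

	idx = r->next_to_clean;
	if (r->entry[idx].fd_addr != fd_addr) {
		/* Out of sync: fall back to a linear search */
		r->out_of_order++;
		for (i = 0; i < SHADOW_RING_SIZE; i++) {
			if (r->entry[i].fd_addr == fd_addr) {
				idx = i;
				break;
			}
		}
		if (i == SHADOW_RING_SIZE)
			goto out;	/* unknown fd, nothing to free */
	}

	skb = r->entry[idx].skb;
	r->entry[idx].skb = NULL;
	r->next_to_clean = (idx + 1) % SHADOW_RING_SIZE;

out:
	spin_unlock(&r->lock);

	return skb;
}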

This might not be a problem for the dpaa2-switch since it does not
enable NETIF_F_LLTX but it might be for dpaa2-eth.

Also, as the architecture is defined now, the driver does not see the
Tx queues as having a fixed size, so it cannot infer the size to use
for the shadow ring.

I will have to dig a little bit more into this area to understand
exactly why the decision to use skb backpointers was made in the first
place (I am not really talking about dpaa2-switch here; dpaa2-eth has
the exact same behavior and has been around for some time now).

Ioana
