Message-ID: <CAKgT0UcRQEeDPOkXGyy3yoqmO5hh8fdK+1wCu8+hyjmSKPuOCw@mail.gmail.com>
Date: Fri, 2 Mar 2018 15:24:06 -0800
From: Alexander Duyck <alexander.duyck@...il.com>
To: Jakub Kicinski <kubakici@...pl>
Cc: Edward Cree <ecree@...arflare.com>,
linux-net-drivers@...arflare.com,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
"John W. Linville" <linville@...driver.com>,
Or Gerlitz <gerlitz.or@...il.com>,
Alexander Duyck <alexander.h.duyck@...el.com>
Subject: Re: [PATCH RESEND net-next 0/2] ntuple filters with RSS
On Fri, Mar 2, 2018 at 10:55 AM, Jakub Kicinski <kubakici@...pl> wrote:
> On Fri, 2 Mar 2018 15:24:29 +0000, Edward Cree wrote:
>> On Tue, Feb 27, 2018 at 3:47 PM, Jakub Kicinski <kubakici@...pl> wrote:
>>
>> > Please, let's stop extending ethtool_rx_flow APIs. I bit my tongue
>> > when Intel was adding their "redirection to VF" based on ethtool ntuples
>> > and look now they're adding the same functionality with flower :| And
>> > wonder how to handle two interfaces doing the same thing.
>> Since sfc only supports ethtool NFC interfaces (we have no flower support,
>> and I also wonder how one is to support both of those interfaces without
>> producing an ugly mess), I'd much rather put this in ethtool than have to
>> implement all of flower just so we can have this extension.
>
> "Just this one extension" is exactly the attitude that can lead to
> messy APIs :(
>
>> I guess part of the question is, which other drivers besides us would want
>> to implement something like this, and what are their requirements?
>
> I think every vendor is trying to come up with ways to make their HW
> work with containers better these days.
>
>> > On the use case itself, I wonder how much sense that makes. Can your
>> > hardware not tag the packet as well so you could then mux it to
>> > something like macvlan offload?
>> In practice the only way our hardware can "tag the packet" is by the
>> selection of RX queue. So you could for instance give a container its
>> own RX queues (rather than just using the existing RX queues on the
>> appropriate CPUs), and maybe in future hook those queues up to l2fwd
>> offload somehow.
>> But that seems like a separate job (offloading the macvlan switching) to
>> what this series is about (making the RX processing happen on the right
>> CPUs). Is software macvlan switching really noticeably slow, anyway?
>
> OK, thanks for clarifying.
>
>> Besides, more powerful filtering than just MAC addr might be needed, if,
>> for instance, the container network is encapsulated. In that case
>> something like a UDP 4-tuple filter might be necessary (or, indeed, a
>> filter looking at the VNID (VxLAN VNI) - which our hardware can do but
>> ethtool doesn't currently have a way to specify). AFAICT l2-fwd-offload
>> can only be used for straight MAC addr, not for overlay networks like
>> VxLAN or FOU? At least, existing ndo_dfwd_add_station() implementations
>> don't seem to check that dev is a macvlan... Does it even support
>> VLAN filters? fm10k implementation doesn't seem to.
>
> Exactly! One can come up with many protocol combinations which flower
> already has APIs for... ethtool is not the place for it.
>
>> Anyway, like I say, filtering traffic onto its own queues seems to be
>> orthogonal, or at least separate, to binding those queues into an
>> upperdev for demux offload.
>
> It is, I was just trying to broaden the scope to more capable HW so we
> design APIs that would serve all.
>
>> On 28/02/18 01:24, Alexander Duyck wrote:
>>
>> > We did something like this for i40e. Basically we required creating
>> > the queue groups using mqprio to keep them symmetric on Tx and Rx, and
>> > then allowed for TC ingress filters to redirect traffic to those queue
>> > groups.
>> >
>> > - Alex
>> If we're not doing macvlan offload, I'm not sure what, if anything, the
>> TX side would buy us. So for now it seems to make sense for TX just to
>> use the TXQ associated with the CPU from which the TX originates, which
>> I believe already happens automatically.
>
> I don't think that's what Alex was referring to. Please see
> commit e284fc280473 ("i40e: Add and delete cloud filter") for
> instance :)
Right. And as far as the Tx queue association goes, right now we are
basing things off of skb->priority, which is easily controlled via
cgroups. So in theory you could associate a given set of cgroups with a
specific set of Tx queues using this approach.
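For reference, a minimal sketch of where skb->priority comes from on the
Tx side: the cgroup route writes per-interface priorities into the
net_prio cgroup's net_prio.ifpriomap, and the per-socket analog is the
SO_PRIORITY sockopt shown below. The priority value is made up for
illustration and is not taken from the i40e work; with mqprio configured,
the stack maps that priority to a traffic class and hence a Tx queue group.

/* Illustrative sketch only: per-socket analog of the net_prio cgroup
 * knob.  SO_PRIORITY tags outgoing skbs with skb->priority, which
 * mqprio's prio->tc map then uses to pick the Tx queue group.
 * The priority value here is made up.
 */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int prio = 5;	/* maps to a traffic class via mqprio's "map" argument */

	if (fd < 0 || setsockopt(fd, SOL_SOCKET, SO_PRIORITY,
				 &prio, sizeof(prio)) < 0) {
		perror("SO_PRIORITY");
		return 1;
	}
	/* traffic sent on fd now lands on the queue group for that TC */
	return 0;
}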
Most of the filtering that Jakub pointed out is applied to the Rx side
to make sure the packets come in on the right queue set.
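And for reference, a minimal sketch of what today's ethtool ntuple
steering looks like at the ioctl level (ETHTOOL_SRXCLSRLINS), i.e. the
interface Ed's series extends with an RSS context. The device name,
address, port and queue index below are made up; it is roughly what
"ethtool -N eth0 flow-type udp4 dst-ip 192.0.2.1 dst-port 4789 action 3"
does under the hood.

/* Illustrative sketch only: insert a UDP 4-tuple rule steering matching
 * packets to a single Rx queue via the ethtool ioctl.  Requires
 * CAP_NET_ADMIN; values are made up.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>

int main(void)
{
	struct ethtool_rxnfc nfc = { .cmd = ETHTOOL_SRXCLSRLINS };
	struct ifreq ifr = { 0 };
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	nfc.fs.flow_type = UDP_V4_FLOW;
	nfc.fs.h_u.udp_ip4_spec.ip4dst = inet_addr("192.0.2.1");
	nfc.fs.m_u.udp_ip4_spec.ip4dst = 0xffffffff;	/* exact match */
	nfc.fs.h_u.udp_ip4_spec.pdst = htons(4789);	/* e.g. VxLAN port */
	nfc.fs.m_u.udp_ip4_spec.pdst = 0xffff;
	nfc.fs.ring_cookie = 3;			/* deliver to Rx queue 3 */
	nfc.fs.location = RX_CLS_LOC_ANY;	/* let the driver pick a slot,
						 * if it supports that */

	strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
	ifr.ifr_data = (void *)&nfc;
	if (fd < 0 || ioctl(fd, SIOCETHTOOL, &ifr) < 0)
		perror("ETHTOOL_SRXCLSRLINS");
	close(fd);
	return 0;
}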
- Alex