Open Source and information security mailing list archives
 
Date:   Fri, 30 Jul 2021 06:17:18 -0400
From:   Jamal Hadi Salim <jhs@...atatu.com>
To:     Simon Horman <simon.horman@...igine.com>
Cc:     Vlad Buslov <vladbu@...dia.com>,
        David Miller <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...lanox.com>, netdev@...r.kernel.org,
        oss-drivers@...igine.com, Baowen Zheng <baowen.zheng@...igine.com>,
        Louis Peens <louis.peens@...igine.com>,
        Ido Schimmel <idosch@...dia.com>,
        Jiri Pirko <jiri@...nulli.us>, Roopa Prabhu <roopa@...dia.com>
Subject: Re: [PATCH net-next 1/3] flow_offload: allow user to offload tc
 action to net device

On 2021-07-28 10:46 a.m., Simon Horman wrote:
> On Wed, Jul 28, 2021 at 09:51:00AM -0400, Jamal Hadi Salim wrote:
>> On 2021-07-28 3:46 a.m., Simon Horman wrote:
>>> On Tue, Jul 27, 2021 at 07:47:43PM +0300, Vlad Buslov wrote:
>>>> On Tue 27 Jul 2021 at 19:13, Jamal Hadi Salim <jhs@...atatu.com> wrote:
>>>>> On 2021-07-27 10:38 a.m., Vlad Buslov wrote:
>>>>>> On Tue 27 Jul 2021 at 16:04, Simon Horman <simon.horman@...igine.com> wrote:
>>
>> [..]
>>
>>>>>> I think we have the same issue with filters - they might not be in
>>>>>> hardware after driver callback returned "success" (due to neigh state
>>>>>> being invalid for tunnel_key encap, for example).
>>>>>
>>>>> Sounds like we need another state for this. Otherwise, how do you debug
>>>>> that something is sitting in the driver and not in hardware after you
>>>>> issued a command to offload it? How do I tell today?
>>>>> Also, knowing the reason why something is sitting in the driver would be
>>>>> helpful.
>>>>
>>>> It is not about just adding another state. The issue is that there is no
>>>> way for drivers to change the state of software filter dynamically.
>>>
>>> I think it might be worth considering enhancing things at some point.
>>> But I agree that it's more than a matter of adding an extra flag. And
>>> I think it's reasonable to implement something similar to the classifier's
>>> current offload handling of IN_HW now and consider enhancements separately.
>>
>> Debuggability is very important. If we have such gotchas we need to have
>> the admin at least be able to tell when the driver returns "success"
>> but the request is still sitting in the driver for whatever reason.
>> At minimum there needs to be some indicator somewhere which says
>> "in progress" or "waiting for resolution", etc.
>> If the control plane (user space app) starts making other decisions
>> based on the assumption that the filter was successfully installed, i.e.
>> packets are being treated in the hardware, then there could be
>> consequences when this assumption is wrong.
>>
>> So if I understood the challenge correctly, it is: how do you relay
>> this info back so it is reflected in the filter details? Yes, that
>> would require some mechanism to exist, and possibly mapping state
>> between what's in the driver and what's in the cls layer.
>> If I am not mistaken, the switchdev folks handle this asynchronicity?
>> +Cc Ido, Jiri, Roopa
>>
>> And it should be noted: yes, the filters have this
>> pre-existing condition, but that doesn't mean that, given the
>> opportunity to do actions, we should replicate what they do.
> 
> I'd prefer symmetry between the use of IN_HW for filters and actions,
> which I believe is what Vlad has suggested.
> 

It is still not clear to me what it means from a command line PoV.
How do I add a rule, and when I dump it what does it show?
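To make the question concrete, here is a sketch of what the flow looks like for filters today, where the dump does carry an offload marker (the device name and an offload-capable driver are assumptions):

```shell
# Hypothetical example: eth0 and hw offload support are assumed.
# Add an ingress flower filter that must be offloaded (skip_sw):
tc qdisc add dev eth0 ingress
tc filter add dev eth0 ingress protocol ip flower skip_sw \
    dst_ip 192.0.2.1 action drop
# Dumping the filter shows its offload status; recent kernels print
# "in_hw" (with an "in_hw_count") or "not_in_hw" per filter:
tc -s filter show dev eth0 ingress
```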

> If we wish to enhance things - f.e. for debugging, which I
> agree is important - then I think that is a separate topic.
> 

My only concern is not to repeat mistakes that are in filters
just for the sake of symmetry. For example, the fact that something
went wrong with insertion, or insertion is still in progress,
yet you get an indication that all went well.
Looking at the Mellanox (NIC) drivers, it does seem that in the normal
case the insertion into hw is synchronous (for anything that is not sw
only). I didn't quite see what Vlad was referring to.
We have spent literally hours debugging issues where rules were being
offloaded, thinking it was the driver, so any extra info helps.
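As an illustration of the kind of extra info that helps when debugging, here is a sketch (the action index mirrors the example later in this thread and is hypothetical):

```shell
# Dump all gact action instances with their stats; the packet/byte
# counters at least tell you whether traffic is hitting the rule:
tc -s actions ls action gact
# A specific shared action can be inspected by index:
tc actions get action gact index 15
```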


>>>>
>>>> No. How would adding more flags improve h/w update rate? I was just
>>>> thinking that it is strange that users that are not interested in
>>>> offloads would suddenly have higher memory usage for their actions just
>>>> because they happen to have offload-capable driver loaded. But it is not
>>>> a major concern for me.
>>>
>>> In that case can we rely on the global tc-offload on/off flag
>>> provided by ethtool? (I understand its not the same, but perhaps
>>> it is sufficient in practice.)
>>>
>>
>> ok.
>> So: I think I have seen what is probably the spamming referred to,
>> with the Intel (800?) driver ;-> Basically the driver was reacting to
>> all filters regardless of the need to offload or not.
>> I thought it was an oversight on their part and the driver needed
>> fixing. Are we invoking the offload regardless of whether h/w offload
>> is requested? In my naive view - at least when I looked at the Intel
>> code - it didn't seem hard to avoid the spamming.
> 
> There is a per-netdev (not global as I wrote above) flag to enable and
> disable offload. And there is per-classifier skip_hw flag. I can dig
> through the code as easily as you can but I'd be surprised if the
> driver is seeing offload requests if either of those settings
> are in effect.
> 

For sure it was the 800 (ice driver, if you want to dig into the code).
I think the ethtool flag was turned on, but not skip_sw in the policy. I
don't have the card installed in any board right now - but I will go and
dig into the logs (because it spews the message into the logs).
Again, not clear if this is what Vlad was calling spam.
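For reference, the per-netdev flag being discussed is the hw-tc-offload ethtool feature; a sketch of checking and toggling it (eth0 is an assumed device name):

```shell
# Show the current state of the tc offload feature bit:
ethtool -k eth0 | grep hw-tc-offload
# Turn it off, so the driver should stop seeing offload requests:
ethtool -K eth0 hw-tc-offload off
```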

>>>>> I was looking at it more as a (currently missing) feature improvement.
>>>>> We already have a use case that is implemented by s/w today. The feature
>>>>> mimics it in h/w.
>>>>>
>>>>> At minimum, all existing NICs should be able to support the counters
>>>>> as mapped to simple actions like drop. I understand, for example, if
>>>>> some can't support separately offloading tunnels.
>>>>> So the syntax is something along the lines of:
>>>>>
>>>>> tc actions add action drop index 15 skip_sw
>>>>> tc filter add dev ...parent ... protocol ip prio X ..\
>>>>> u32/flower skip_sw match ... flowid 1:10 action gact index 15
>>>>>
>>>>> You get an error if counter index 15 is not offloaded or
>>>>> if skip_sw was left out.
>>>>>
>>>>> And then later on, if you support sharing of actions:
>>>>> tc filter add dev ...parent ... protocol ip prio X2 ..\
>>>>> u32/flower skip_sw match ... flowid 1:10 action gact index 15
>>>
>>> Right, I understand that makes sense and is internally consistent.
>>> But I think that in practice it only makes a difference "Approach B"
>>> implementations, none of which currently exist.
>>>
>>
>> At minimum:
>> Shouldn't counters (easily correlated to basic actions like drop or
>> accept) fit the scenario of:
>> tc actions add action drop index 15 skip_sw
>> tc filter add dev ...parent ... protocol ip prio X .. \
>> u32/flower skip_sw match ... flowid 1:10 action gact index 15
>>
>> ?
>>
>>> I would suggest we add this when the need arises, rather than
>>> speculatively without hw/driver support. It's not precluded by the
>>> current model AFAIK.
>>>
>>
>> We are going to work on a driver that would have the "B" approach.
>> I am hoping - whatever the consensus here - it doesn't require
>> surgery afterwards to make that work.
> 
> You should be able to build on the work proposed here to add what you
> suggest into the framework to meet these requirements for your driver work.
> 

Then we are good. Are these the same patches you have here?

cheers,
jamal
