lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170215102821.12f419e7@cakuba.netronome.com>
Date:   Wed, 15 Feb 2017 10:28:21 -0800
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     Or Gerlitz <gerlitz.or@...il.com>
Cc:     David Miller <davem@...emloft.net>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Jiri Pirko <jiri@...lanox.com>,
        John Fastabend <john.r.fastabend@...el.com>,
        Roi Dayan <roid@...lanox.com>,
        Linux Netdev List <netdev@...r.kernel.org>,
        Hadar Hen Zion <hadarh@...lanox.com>,
        Amir Vadai <amir@...ai.me>, Ido Schimmel <idosch@...lanox.com>
Subject: Re: [PATCH net-next V3 3/7] net/sched: Reflect HW offload status

On Wed, 15 Feb 2017 18:19:17 +0200, Or Gerlitz wrote:
> On Wed, Feb 15, 2017 at 10:52 AM, Or Gerlitz <ogerlitz@...lanox.com> wrote:
> > Currently there is no way of querying whether a filter is
> > offloaded to HW or not when using "both" policy (where none
> > of skip_sw or skip_hw flags are set by user-space).  
> 
> > Add two new flags, "in hw" and "not in hw" such that user
> > space can determine if a filter is actually offloaded to
> > hw or not. The "in hw" UAPI semantics was chosen so it's
> > similar to the "skip hw" flag logic.  
> 
> To make things a bit more clear, the semantics of the "in hw"
> thing relates to the time of dumping the rule.
> 
> Currently in all of the offloading drivers/cases, when the driver returns
> success for the ndo_setup_tc call, the flow is offloaded to hw.
> 
> But moving fwd that might change, a flow might be not offloaded to HW
> on some window of time.
> 
> The coming up example, is support for neigh updates w.r.t to IP tunnel
> encapsulation offloads in mlx5 SRIOV switchdev mode.
> 
> Today we offload tunnel encap flow only if the kernel has valid neigh
> to the tunnel destination. Under the works is a code to offload/un-offload
> the flow to/from HW when the neigh becomes valid/invalid or goes through
> hardware address change etc.
> 
> So what I basically suggest here is to enhance that future mlx5 series
> with patches
> under which the dump code of all the classifiers will invoke the
> tc_setup_ndo with
> a fourth sub command (today there are add/del/stats) which will return
> the actual
> "in hw" status.
> 
> This is aligned with the general architecture/approach in the kernel
> for switchdev and other
> offloads.
> 
> Note that this future change doesn't change the UAPI, it will still
> have two values, "in hw"
> and "not in hw". The values with this series are the actual values and
> later with that change,
> they will keep being the actual values, just the kernel method to
> retrieve them will be different.

Thanks for explaining this, I was actually hoping to take this in
slightly different direction.

What worries me is that the moment we started offloading packet
modification we run at the risk of modifying packets twice.  This used
to be a problem only for eBPF but now mlx5 can also offload things like
ttl decrement.  For filters which have no skip_* flags set and get
offloaded if packet doesn't get redirected or modified significantly it
will match the filter both in HW and on the host and therefore have the
actions applied twice.  And it will get counted twice.  (It seems nobody
ever raised this so perhaps I'm mistaken in thinking that this can
happen?)

Back to your patch set, I was hoping we will be able to use the new
IN_HW flag to skip filters in software even if they don't have skip_sw
set.  If we need to eject actions from HW based on external events, that
obviously complicates things.  Three trivial solutions to the problem
I could think of are:
 - pass a pointer to filter's flags down to the driver and let it do
   atomic bitops? on it to set/clear the IN_HW status;
 - add a way of driver calling back into TC;
 - use one of recently freed skb TC bits to mark packets which were
   supposed to be processed in HW by could not as needing software
   fallback (I think this could work for you without parsing the packet
   in the driver, you could replace the tunnel action with mark action
   and leave the matching rule in HW classifier/TCAM; for BPF I have a
   descriptor flag telling me if offloaded BPF completed successfully).

FWIW this patch set looks good to me!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ