Message-ID: <CAK+XE=mjARd+DodNg9Sn4C+gg6dMTmvdNrKtEYhsLGVqtGrysw@mail.gmail.com>
Date:   Thu, 3 Oct 2019 17:59:50 +0100
From:   John Hurley <john.hurley@...ronome.com>
To:     Vlad Buslov <vladbu@...lanox.com>
Cc:     Jiri Pirko <jiri@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "simon.horman@...ronome.com" <simon.horman@...ronome.com>,
        "jakub.kicinski@...ronome.com" <jakub.kicinski@...ronome.com>,
        "oss-drivers@...ronome.com" <oss-drivers@...ronome.com>
Subject: Re: [RFC net-next 0/2] prevent sync issues with hw offload of flower

On Thu, Oct 3, 2019 at 5:26 PM Vlad Buslov <vladbu@...lanox.com> wrote:
>
>
> On Thu 03 Oct 2019 at 02:14, John Hurley <john.hurley@...ronome.com> wrote:
> > Hi,
> >
> > Putting this out as an RFC built on net-next. It fixes some issues
> > discovered in testing when using the TC API of OvS to generate flower
> > rules and subsequently offload them to HW. Rules seen may contain the
> > same match fields as existing rules, or may be rule modifications run
> > as a delete plus an add. We're seeing race conditions whereby the rules
> > present in kernel flower are out of sync with those offloaded. Note
> > that there are some issues that will need to be fixed in the RFC before
> > it becomes a patch, such as potential races between releasing locks and
> > re-taking them. However, I'm putting this out for comments or potential
> > alternative solutions.
> >
> > The main cause of the races seems to be in the chain table of cls_api.
> > If a tcf_proto is destroyed then it is removed from its chain. If a new
> > filter is then added to the same chain with the same priority and
> > protocol, a new tcf_proto will be created - this may happen before the
> > first is fully removed and the hw offload message sent to the driver.
> > In cls_flower this means that the fl_ht_insert_unique() function can
> > pass, as its hashtable is associated with the tcf_proto. We are then in
> > a position where the 'delete' and the 'add' are in a race to get
> > offloaded. We also noticed that doing an offload add, then checking if
> > a tcf_proto is concurrently deleting, then removing the offload if it
> > is, can extend the out-of-order messages. Drivers do not expect to get
> > duplicate rules. However, in the kernel TC datapath they are not
> > duplicates, so we can get out of sync here.
> >
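To make the hashtable point concrete, this is roughly why the uniqueness
check cannot catch the duplicate (a simplified model with invented names,
not the real cls_flower structures): each classifier instance owns its
own table, so it can only ever see its own filters.

#include <linux/rhashtable.h>

/* Simplified stand-ins for the real cls_flower structures. */
struct demo_filter {
	struct rhash_head node;
	u32 key;			/* stand-in for the flower match key */
};

static const struct rhashtable_params demo_ht_params = {
	.head_offset = offsetof(struct demo_filter, node),
	.key_offset  = offsetof(struct demo_filter, key),
	.key_len     = sizeof(u32),
};

struct demo_head {
	struct rhashtable ht;		/* one table per tcf_proto instance */
};

/* Plays the role of fl_ht_insert_unique(): the lookup only sees filters
 * already in *this* head's table, so an identical rule still being torn
 * down under the old tcf_proto is invisible here and the insert passes. */
static int demo_insert_unique(struct demo_head *head, struct demo_filter *f)
{
	return rhashtable_lookup_insert_fast(&head->ht, &f->node,
					     demo_ht_params);
}
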
> > The RFC fixes this by adding a pre_destroy hook to cls_api that is
> > called when a tcf_proto is signaled to be destroyed but before it is
> > removed from its chain (being in the chain is essentially the lock that
> > prevents duplicates in flower). Flower then uses this new hook to send
> > the hw delete messages from tcf_proto destroys, preventing them racing
> > with duplicate adds. It also moves the check for 'deleting' to before
> > sending the hw add message.
> >
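As an aside, the shape of the new op is roughly the following (trimmed
down, and the signature here is a sketch rather than verbatim from
patch 1):

struct tcf_proto_ops {
	/* ... existing ops trimmed ... */

	/* Existing teardown, run once the proto is unlinked and its last
	 * reference has been put. */
	void	(*destroy)(struct tcf_proto *tp, bool rtnl_held,
			   struct netlink_ext_ack *extack);

	/* New hook: called while the tcf_proto is still linked in its
	 * chain, so flower can push its hw delete messages before a
	 * duplicate add for the same prio/protocol can be created under
	 * a new proto. */
	void	(*pre_destroy)(struct tcf_proto *tp, bool rtnl_held);
};
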
> > John Hurley (2):
> >   net: sched: add tp_op for pre_destroy
> >   net: sched: fix tp destroy race conditions in flower
> >
> >  include/net/sch_generic.h |  3 +++
> >  net/sched/cls_api.c       | 29 ++++++++++++++++++++++++-
> >  net/sched/cls_flower.c    | 55 ++++++++++++++++++++++++++---------------------
> >  3 files changed, 61 insertions(+), 26 deletions(-)
>
> Hi John,
>
> Thanks for working on this!
>
> Are there any other sources of the race conditions described in this
> email? When you describe the tcf_proto deletion you say "main cause" but
> don't name any others. If tcf_proto is the only problematic part,

Hi Vlad,
Thanks for the input.
The tcf_proto deletion was the cause in the tests we ran. That's not to
say there aren't others that I just wasn't seeing in my analysis.

> then it might be worth looking into alternative ways to force concurrent
> users to wait for proto deletion/destruction to be properly finished.
> Maybe having some table that maps chain id + prio to a completion would
> be a simpler approach? With such infra tcf_proto_create() could wait for
> a previous proto with the same prio and chain to be fully destroyed
> (including offloads) before creating a new one.
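
If I'm reading the suggestion right, it would be something along these
lines (a very rough sketch with invented names):

#include <linux/completion.h>
#include <linux/list.h>
#include <linux/types.h>

/* One entry per in-flight proto destruction, keyed on chain + prio. */
struct proto_destroy_entry {
	struct list_head list;
	u32 chain_index;
	u32 prio;
	struct completion done;
};

/* tcf_proto_create() side: before creating a proto for the same chain
 * and prio, wait for any previous instance to be fully destroyed. */
static void wait_for_prev_proto(struct proto_destroy_entry *e)
{
	wait_for_completion(&e->done);
}

/* destroy side: only signalled once the old proto's teardown, including
 * the offload removal, has finished. */
static void prev_proto_destroyed(struct proto_destroy_entry *e)
{
	complete(&e->done);
}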

I think a problem with this is that the chain removal functions call
tcf_proto_put() (which only calls destroy when the refcount hits zero -
see the rough sketch below), so if other concurrent processes (like a
dump) hold references to the tcf_proto then the hw offload delete may
not have been sent even by the time the chain deletion function has
finished. We would need to make sure this was tracked - say, signalled
after the tcf_proto_destroy function has completed.
How would you suggest doing the wait? With a replay flag, as happens in
some other places?
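
For clarity, the put path I mean is essentially this (heavily
simplified, not the actual cls_api code):

#include <linux/refcount.h>

/* Minimal stand-in for struct tcf_proto (invented names). */
struct demo_proto {
	refcount_t refcnt;
};

static void demo_proto_destroy(struct demo_proto *tp)
{
	/* the hw offload delete would be sent from in here */
}

/* destroy - and with it the hw delete - only runs on the final put,
 * which can be long after chain removal has returned if a dump or some
 * other concurrent user still holds a reference. */
static void demo_proto_put(struct demo_proto *tp)
{
	if (refcount_dec_and_test(&tp->refcnt))
		demo_proto_destroy(tp);
}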

To me it seems the main problem is that the tcf_proto being in a chain
almost acts as the lock that prevents duplicate filters getting to the
driver. We need some mechanism to ensure a delete has made it to HW
before we release this 'lock'.

>
> Regards,
> Vlad
