netdev - Re: [RFC net-next v2 1/1] net: sched: prevent duplicate flower rules from tcf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAK+XE=kV9xodyQj0URF+9zN=fO4hf=B7VzHwewNtVLVquC4udQ@mail.gmail.com>
Date:   Fri, 1 Nov 2019 15:27:58 +0000
From:   John Hurley <john.hurley@...ronome.com>
To:     Vlad Buslov <vladbu@...lanox.com>
Cc:     Jiri Pirko <jiri@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "simon.horman@...ronome.com" <simon.horman@...ronome.com>,
        "jakub.kicinski@...ronome.com" <jakub.kicinski@...ronome.com>,
        "louis.peens@...ronome.com" <louis.peens@...ronome.com>,
        "oss-drivers@...ronome.com" <oss-drivers@...ronome.com>
Subject: Re: [RFC net-next v2 1/1] net: sched: prevent duplicate flower rules
 from tcf_proto destroy race

On Fri, Nov 1, 2019 at 3:08 PM Vlad Buslov <vladbu@...lanox.com> wrote:
>
>
> On Fri 01 Nov 2019 at 16:54, John Hurley <john.hurley@...ronome.com> wrote:
> > On Fri, Nov 1, 2019 at 1:01 PM Vlad Buslov <vladbu@...lanox.com> wrote:
> >>
> >>
> >> On Thu 31 Oct 2019 at 21:08, John Hurley <john.hurley@...ronome.com> wrote:
> >> > When a new filter is added to cls_api, the function
> >> > tcf_chain_tp_insert_unique() looks up the protocol/priority/chain to
> >> > determine if the tcf_proto is duplicated in the chain's hashtable. It then
> >> > creates a new entry or continues with an existing one. In cls_flower, this
> >> > allows the function fl_ht_insert_unque to determine if a filter is a
> >> > duplicate and reject appropriately, meaning that the duplicate will not be
> >> > passed to drivers via the offload hooks. However, when a tcf_proto is
> >> > destroyed it is removed from its chain before a hardware remove hook is
> >> > hit. This can lead to a race whereby the driver has not received the
> >> > remove message but duplicate flows can be accepted. This, in turn, can
> >> > lead to the offload driver receiving incorrect duplicate flows and out of
> >> > order add/delete messages.
> >> >
> >> > Prevent duplicates by utilising an approach suggested by Vlad Buslov. A
> >> > hash table per block stores each unique chain/protocol/prio being
> >> > destroyed. This entry is only removed when the full destroy (and hardware
> >> > offload) has completed. If a new flow is being added with the same
> >> > identiers as a tc_proto being detroyed, then the add request is replayed
> >> > until the destroy is complete.
> >> >
> >> > v1->v2:
> >> > - put tcf_proto into block->proto_destroy_ht rather than creating new key
> >> >   (Vlad Buslov)
> >> > - add error log for cases when hash entry fails. Previously it failed
> >> >   silently with no indication. (Vlad Buslov)
> >> >
> >> > Fixes: 8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution")
> >> > Signed-off-by: John Hurley <john.hurley@...ronome.com>
> >> > Reviewed-by: Simon Horman <simon.horman@...ronome.com>
> >> > Reported-by: Louis Peens <louis.peens@...ronome.com>
> >> > ---
> >>
> >> Hi John,
> >>
> >> Patch looks good! However, I think we can simplify it even more and
> >> remove duplication of data in tcf_proto (hashtable key contains copy of
> >> data that is already available in the struct itself) and remove all
> >> dynamic memory allocations. I have refactored your patch accordingly
> >> (attached). WDYT?
> >>
> >
> > Hi Vlad,
> > Thanks for the review/modifications.
> > The changes look good to me but I can fire it through our test setup to be sure.
> >
> > The only thing I'm a bit concerned about here is the size of the
> > static allocation.
> > We are now defining quite a large hash table per block (65536 buckets).
> > It's hard to know exactly how many elements will be in this at the one time.
> > With flushes of large chains it may well become quite full, but I'd
> > expect that it will be empty a majority of the time.
> > Resizable hash seems a bit more appropriate here.
> > Do you have any opinions on this?
> >
> > John
>
> Yeah, I agree that 65536 buckets is quite an overkill for this. I think
> cls API assumes that there is not too many tp instances because they are
> attached to chain through regular linked list and each lookup is a
> linear search. With this I assume that proto_destroy_ht size can be
> safely reduced to 256 buckets, if not less. Resizable table is an
> option, but that also sounds like an overkill to me since linear search
> over list of chains attached to block or over list of tp's attached to
> chain will be the main bottleneck, if user creates hundreds of thousands
> of proto instances.
>

Yes, it's a valid point about the tp/chain lookup being the bottleneck
in cases of high usage.
Let's reduce the buckets to say 128 and avoid the need for rhash.
Thanks


> >
> >> [...]
> >>