Message-Id: <1570058072-12004-1-git-send-email-john.hurley@netronome.com>
Date:   Thu,  3 Oct 2019 00:14:30 +0100
From:   John Hurley <john.hurley@...ronome.com>
To:     vladbu@...lanox.com
Cc:     jiri@...lanox.com, netdev@...r.kernel.org,
        simon.horman@...ronome.com, jakub.kicinski@...ronome.com,
        oss-drivers@...ronome.com, John Hurley <john.hurley@...ronome.com>
Subject: [RFC net-next 0/2] prevent sync issues with hw offload of flower

Hi,

Putting this out as an RFC built on net-next. It fixes some issues
discovered in testing when using the TC API of OvS to generate flower
rules and subsequently offloading them to HW. Rules seen contain the same
match fields or may be rule modifications run as a delete plus an add.
We're seeing race conditions whereby the rules present in kernel flower
are out of sync with those offloaded. Note that there are some issues
that will need to be fixed in the RFC before it becomes a patch, such as
potential races between releasing locks and re-taking them. However, I'm
putting this out for comments or potential alternative solutions.

The main cause of the races seems to be in the chain table of cls_api. If
a tcf_proto is destroyed then it is removed from its chain. If a new
filter is then added to the same chain with the same priority and protocol
a new tcf_proto will be created - this may happen before the first is
fully removed and the hw offload message sent to the driver. In cls_flower
this means that the fl_ht_insert_unique() function can pass as its
hashtable is associated with the tcf_proto. We are then in a position
where the 'delete' and the 'add' are in a race to get offloaded. We also
noticed that doing an offload add, then checking if a tcf_proto is
concurrently deleting, then removing the offload if it is, can extend the
window for out of order messages. Drivers do not expect to get duplicate
rules. However, in the kernel TC datapath they are not duplicates, so we
can get out of sync here.
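
To make the interleaving above concrete, here is a minimal userspace sketch.
It is not kernel code: every toy_* name is hypothetical, and the driver
rejecting a duplicate add is an assumption made for illustration only. The
uniqueness check is per proto, so the second insert passes, the 'add' reaches
hw before the pending 'delete', and hw ends up without a rule that software
still holds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MAX_KEYS 8

/* Hypothetical stand-in for a tcf_proto: each instance owns its own key table. */
struct toy_proto {
    int keys[TOY_MAX_KEYS];
    int nkeys;
};

/* Hypothetical stand-in for the driver's rule table, indexed by key. */
struct toy_hw {
    bool installed[TOY_MAX_KEYS];
};

/* Uniqueness is only checked within one proto, as with a per-tp hashtable. */
static int toy_insert_unique(struct toy_proto *tp, int key)
{
    for (int i = 0; i < tp->nkeys; i++)
        if (tp->keys[i] == key)
            return -EEXIST;
    if (tp->nkeys == TOY_MAX_KEYS)
        return -ENOSPC;
    tp->keys[tp->nkeys++] = key;
    return 0;
}

/* Assumption: the driver refuses a rule it already has. */
static int toy_hw_add(struct toy_hw *hw, int key)
{
    if (hw->installed[key])
        return -EEXIST;
    hw->installed[key] = true;
    return 0;
}

static void toy_hw_del(struct toy_hw *hw, int key)
{
    hw->installed[key] = false;
}

/* Replays the interleaving; returns true if sw and hw end up out of sync. */
static bool toy_replay_race(void)
{
    struct toy_proto old_tp = {0}, new_tp = {0};
    struct toy_hw hw = {0};
    const int key = 3;

    toy_insert_unique(&old_tp, key);
    toy_hw_add(&hw, key);

    /* old_tp is unlinked from its chain here; its hw delete is still pending. */

    /* A new proto with the same prio/protocol: the per-proto check passes. */
    if (toy_insert_unique(&new_tp, key) == 0)
        toy_hw_add(&hw, key);   /* 'add' wins the race and is rejected */

    toy_hw_del(&hw, key);       /* late 'delete' from old_tp removes the rule */

    /* Software still holds the rule in new_tp, hw does not. */
    return new_tp.nkeys == 1 && !hw.installed[key];
}
```

With this ordering toy_replay_race() reports a mismatch; sending the delete
before the new proto can be created would leave both sides in agreement.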

The RFC fixes this by adding a pre_destroy hook to cls_api that is called
when a tcf_proto is signaled to be destroyed but before it is removed from
its chain (which is essentially the lock for allowing duplicates in
flower). Flower then uses this new hook to send the hw delete messages
from tcf_proto destroys, preventing them racing with duplicate adds. It
also moves the check for 'deleting' to before sending the hw add
message.
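
The ordering the hook is meant to enforce can be sketched in userspace as
follows. This is only an illustration of the idea, not the code in the
patches: all demo_* names are hypothetical, and the real tcf_proto_ops
layout and locking are not modelled. The point is that the classifier gets
to send its hw deletes while the proto is still linked in its chain, so no
duplicate can be created in between.

```c
#include <assert.h>
#include <stddef.h>

enum demo_event { DEMO_HW_DEL = 1, DEMO_UNLINK, DEMO_FREE };

/* Records the order in which teardown steps happen. */
struct demo_log {
    enum demo_event ev[8];
    int n;
};

struct demo_proto;

/* Hypothetical ops table; pre_destroy is optional. */
struct demo_proto_ops {
    void (*pre_destroy)(struct demo_proto *tp);
    void (*destroy)(struct demo_proto *tp);
};

struct demo_proto {
    const struct demo_proto_ops *ops;
    struct demo_proto **chain_slot; /* where the chain points at this proto */
    struct demo_log *log;
};

static void demo_record(struct demo_log *log, enum demo_event ev)
{
    if (log->n < 8)
        log->ev[log->n++] = ev;
}

/* Flower-like classifier: send hw deletes while still linked in the chain. */
static void demo_flower_pre_destroy(struct demo_proto *tp)
{
    demo_record(tp->log, DEMO_HW_DEL);
}

static void demo_flower_destroy(struct demo_proto *tp)
{
    demo_record(tp->log, DEMO_FREE);
}

static const struct demo_proto_ops demo_flower_ops = {
    .pre_destroy = demo_flower_pre_destroy,
    .destroy = demo_flower_destroy,
};

/* Core teardown: hook first, then unlink, then the classifier's destroy. */
static void demo_proto_teardown(struct demo_proto *tp)
{
    if (tp->ops->pre_destroy)
        tp->ops->pre_destroy(tp);
    *tp->chain_slot = NULL;
    demo_record(tp->log, DEMO_UNLINK);
    tp->ops->destroy(tp);
}

/* Runs one teardown and returns the recorded event order. */
static struct demo_log demo_run(void)
{
    struct demo_log log = {0};
    struct demo_proto *slot = NULL;
    struct demo_proto tp = {
        .ops = &demo_flower_ops,
        .chain_slot = &slot,
        .log = &log,
    };

    slot = &tp;
    demo_proto_teardown(&tp);
    return log;
}
```

demo_run() records the hw delete ahead of the unlink, which is the property
the RFC relies on; a classifier that leaves pre_destroy unset simply skips
the first step.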

John Hurley (2):
  net: sched: add tp_op for pre_destroy
  net: sched: fix tp destroy race conditions in flower

 include/net/sch_generic.h |  3 +++
 net/sched/cls_api.c       | 29 ++++++++++++++++++++++++-
 net/sched/cls_flower.c    | 55 ++++++++++++++++++++++++++---------------------
 3 files changed, 61 insertions(+), 26 deletions(-)

-- 
2.7.4
