lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 9 Mar 2017 11:46:00 -0800
From:   Joe Stringer <joe@....org>
To:     Andy Zhou <azhou@....org>
Cc:     netdev <netdev@...r.kernel.org>, pravin shelar <pshelar@....org>
Subject: Re: [RFC net-next sample action optimization 3/3] openvswitch:
 Optimize sample action for the clone use cases

On 7 March 2017 at 16:15, Andy Zhou <azhou@....org> wrote:
> With the introduction of open flow 'clone' action, the OVS user space
> can now translate the 'clone' action into kernel datapath 'sample'
> action, with 100% probability, to ensure that the clone semantics,
> which is that the packet seen by the clone action is the same as the
> packet seen by the action after clone, is faithfully carried out
> in the datapath.
>
> While the sample action in the datpath has the matching semantics,
> its implementation is only optimized for its original use.
> Specifically, there are two limitation: First, there is a 3 level of
> nesting restriction, enforced at the flow downloading time. This
> limit turns out to be too restrictive for the 'clone' use case.
> Second, the implementation avoid recursive call only if the sample
> action list has a single userspace action.
>
> The optimization implemented in this series removes the static
> nesting limit check, instead, implement the run time recursion limit
> check, and recursion avoidance similar to that of the 'recirc' action.
> This optimization solve both #1 and #2 issues above.
>
> Another optimization implemented is to avoid coping flow key as

*copying

> long as the actions enclosed does not change the flow key. The
> detection is performed only once at the flow downloading time.
>
> The third optimization implemented is to rewrite the action list
> at flow downloading time in order to save the fast path from parsing
> the sample action list in its original form repeatedly.

Whenever there is an enumeration (1, 2, 3; ..another..., third thing
implemented) in a commit message, I have to ask whether each "another
change..." should be a separate patch. It certainly makes it easier to
review.

I ran this through the OVS kernel tests and it's working correctly
from that point of view, but I didn't get a chance to dig in and
ensure for example, runtime behaviour of several nested
sample(actions(sample(actions(sample(actions(output)))))) handles
reasonably when it runs out of stack and deferred actions space. At a
high level though, this seems pretty straightforward.

Several comments below, thanks.

>
> Signed-off-by: Andy Zhou <azhou@....org>
> ---
>  net/openvswitch/actions.c      | 106 ++++++++++++++++++------------------
>  net/openvswitch/datapath.h     |   7 +++
>  net/openvswitch/flow_netlink.c | 118 ++++++++++++++++++++++++++++-------------
>  3 files changed, 140 insertions(+), 91 deletions(-)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 259aea9..2e8c372 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>  }
>
>  static int sample(struct datapath *dp, struct sk_buff *skb,
> -                 struct sw_flow_key *key, const struct nlattr *attr,
> -                 const struct nlattr *actions, int actions_len)
> +                 struct sw_flow_key *key, const struct nlattr *attr)
>  {
> -       const struct nlattr *acts_list = NULL;
> -       const struct nlattr *a;
> -       int rem;
> -       u32 cutlen = 0;
> -
> -       for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
> -                a = nla_next(a, &rem)) {
> -               u32 probability;
> -
> -               switch (nla_type(a)) {
> -               case OVS_SAMPLE_ATTR_PROBABILITY:
> -                       probability = nla_get_u32(a);
> -                       if (!probability || prandom_u32() > probability)
> -                               return 0;
> -                       break;
> -
> -               case OVS_SAMPLE_ATTR_ACTIONS:
> -                       acts_list = a;
> -                       break;
> -               }
> -       }
> +       struct nlattr *actions;
> +       struct nlattr *sample_arg;
> +       struct sw_flow_key *orig = key;
> +       int rem = nla_len(attr);
> +       int err = 0;
> +       const struct sample_arg *arg;
>
> -       rem = nla_len(acts_list);
> -       a = nla_data(acts_list);
> +       /* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */

Is it? This is the only reference to OVS_SAMPLE_ATTR_AUX I can see.

> +       sample_arg = nla_data(attr);

We could do this in the parent call, like several other actions do.

<snip>

> @@ -1246,9 +1227,24 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>                         err = execute_masked_set_action(skb, key, nla_data(a));
>                         break;
>
> -               case OVS_ACTION_ATTR_SAMPLE:
> -                       err = sample(dp, skb, key, a, attr, len);
> +               case OVS_ACTION_ATTR_SAMPLE: {
> +                       bool last = nla_is_last(a, rem);
> +                       struct sk_buff *clone_skb;
> +
> +                       clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC);
> +
> +                       if (!clone_skb)
> +                               /* Out of memory, skip this sample action.
> +                                */

Should we ratelimited complain in this case?

> +                               break;
> +
> +                       err = sample(dp, clone_skb, key, a);
> +                       if (last)
> +                               return err;

Given that sample() doesn't guarantee it will free 'clone_skb', I
don't think this is safe.

> +
> +                       pr_err("executing sample");

Useful for debugging, but don't forget to drop for next version.

>                         break;
> +                       }

This bracket is typically aligned with the character of the beginning
of the 'case' that it applies to.

>
>                 case OVS_ACTION_ATTR_CT:
>                         if (!is_flow_key_valid(key)) {
> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> index 1c6e937..d7a6e803 100644
> --- a/net/openvswitch/datapath.h
> +++ b/net/openvswitch/datapath.h
> @@ -220,4 +220,11 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>         if (logging_allowed && net_ratelimit())                 \
>                 pr_info("netlink: " fmt "\n", ##__VA_ARGS__);   \
>  } while (0)
> +
> +#define OVS_SAMPLE_ATTR_ARG  (OVS_ACTION_ATTR_MAX + 1)

This should be defined in include/uapi/linux/openvswitch.h, protected
by #ifdef __KERNEL__ like the other internal formats. While I don't
think it actually conflicts with any other internal action format,
these should be in one place so we don't end up accidentally using the
same enum twice.

> +struct sample_arg {
> +       bool exec;
> +       u32  probability;
> +};

Perhaps we should move this to either net/openvswitch/flow_netlink.h
or include/linux/openvswitch.h since it's only for consumption
internally by the kernel module?

> +
>  #endif /* datapath.h */
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 6f5fa50..0e7cf13 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 2007-2014 Nicira, Inc.
> + * Copyright (c) 2007-2017 Nicira, Inc.
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of version 2 of the GNU General Public
> @@ -59,6 +59,39 @@ struct ovs_len_tbl {
>  #define OVS_ATTR_NESTED -1
>  #define OVS_ATTR_VARIABLE -2
>
> +static bool actions_may_change_flow(const struct nlattr *actions)
> +{
> +       struct nlattr *nla;
> +       int rem;
> +
> +       nla_for_each_nested(nla, actions, rem) {
> +               u16 action = nla_type(nla);
> +
> +               switch (action) {
> +               case OVS_ACTION_ATTR_OUTPUT:
> +               case OVS_ACTION_ATTR_RECIRC:
> +               case OVS_ACTION_ATTR_USERSPACE:
> +               case OVS_ACTION_ATTR_SAMPLE:
> +               case OVS_ACTION_ATTR_TRUNC:

Trunc doesn't change the flow, but if there's something like
sample(output,trunc),output() then I wouldn't expect the final output
to be truncated.

<snip>

> @@ -2621,39 +2670,36 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>         return err;
>  }
>
> -static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
> +static int sample_action_to_attr(const struct nlattr *attr,
> +                                struct sk_buff *skb)
>  {
> -       const struct nlattr *a;
> -       struct nlattr *start;
> -       int err = 0, rem;
> +       struct nlattr *start, *ac_start, *sample_arg;
> +       int err = 0, rem = nla_len(attr);
> +       const struct sample_arg *arg;
> +       struct nlattr *actions;
>
>         start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE);
>         if (!start)
>                 return -EMSGSIZE;
>
> -       nla_for_each_nested(a, attr, rem) {
> -               int type = nla_type(a);
> -               struct nlattr *st_sample;
> +       sample_arg = nla_data(attr);
> +       arg = nla_data(sample_arg);
> +       actions = nla_next(sample_arg, &rem);
>
> -               switch (type) {
> -               case OVS_SAMPLE_ATTR_PROBABILITY:
> -                       if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY,
> -                                   sizeof(u32), nla_data(a)))
> -                               return -EMSGSIZE;
> -                       break;
> -               case OVS_SAMPLE_ATTR_ACTIONS:
> -                       st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
> -                       if (!st_sample)
> -                               return -EMSGSIZE;
> -                       err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb);
> -                       if (err)
> -                               return err;
> -                       nla_nest_end(skb, st_sample);
> -                       break;
> -               }
> -       }
> +       if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PROBABILITY, arg->probability))
> +               return -EMSGSIZE;
>
> +       ac_start = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
> +       if (!ac_start)
> +               return -EMSGSIZE;
> +
> +       err = ovs_nla_put_actions(actions, rem, skb);
> +       if (err)
> +               return err;

Shouldn't we nla_nest_cancel() when something fails in the middle of
nesting nla attributes? For example I believe that upcall will attempt
to serialize actions, but if actions don't fit in the buffer then
it'll just skip the actions and forward everything else.

> +
> +       nla_nest_end(skb, ac_start);
>         nla_nest_end(skb, start);
> +
>         return err;
>  }
>
> --
> 1.8.3.1
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ