lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABKoBm3fQqdVugW9n7Bkj4t21AESgV9=k2yMpZCikVxAXmffMg@mail.gmail.com>
Date:   Fri, 10 Mar 2017 14:07:09 -0800
From:   Andy Zhou <azhou@....org>
To:     Joe Stringer <joe@....org>
Cc:     netdev <netdev@...r.kernel.org>, pravin shelar <pshelar@....org>
Subject: Re: [RFC net-next sample action optimization 3/3] openvswitch:
 Optimize sample action for the clone use cases

On Thu, Mar 9, 2017 at 11:46 AM, Joe Stringer <joe@....org> wrote:
> On 7 March 2017 at 16:15, Andy Zhou <azhou@....org> wrote:
>> With the introduction of open flow 'clone' action, the OVS user space
>> can now translate the 'clone' action into kernel datapath 'sample'
>> action, with 100% probability, to ensure that the clone semantics,
>> which is that the packet seen by the clone action is the same as the
>> packet seen by the action after clone, is faithfully carried out
>> in the datapath.
>>
>> While the sample action in the datpath has the matching semantics,
>> its implementation is only optimized for its original use.
>> Specifically, there are two limitation: First, there is a 3 level of
>> nesting restriction, enforced at the flow downloading time. This
>> limit turns out to be too restrictive for the 'clone' use case.
>> Second, the implementation avoid recursive call only if the sample
>> action list has a single userspace action.
>>
>> The optimization implemented in this series removes the static
>> nesting limit check, instead, implement the run time recursion limit
>> check, and recursion avoidance similar to that of the 'recirc' action.
>> This optimization solve both #1 and #2 issues above.
>>
>> Another optimization implemented is to avoid coping flow key as
>
> *copying
>
>> long as the actions enclosed does not change the flow key. The
>> detection is performed only once at the flow downloading time.
>>
>> The third optimization implemented is to rewrite the action list
>> at flow downloading time in order to save the fast path from parsing
>> the sample action list in its original form repeatedly.
>
> Whenever there is an enumeration (1, 2, 3; ..another..., third thing
> implemented) in a commit message, I have to ask whether each "another
> change..." should be a separate patch. It certainly makes it easier to
> review.
>
They are all part of the same implementation. Spliting them probably won't
make much sense. I think I will drop #2 and #3 in the commit message since
#1 is the main optimization.

> I ran this through the OVS kernel tests and it's working correctly
> from that point of view, but I didn't get a chance to dig in and
> ensure for example, runtime behaviour of several nested
> sample(actions(sample(actions(sample(actions(output)))))) handles
> reasonably when it runs out of stack and deferred actions space. At a
> high level though, this seems pretty straightforward.
>
> Several comments below, thanks.
>
>>
>> Signed-off-by: Andy Zhou <azhou@....org>
>> ---
>>  net/openvswitch/actions.c      | 106 ++++++++++++++++++------------------
>>  net/openvswitch/datapath.h     |   7 +++
>>  net/openvswitch/flow_netlink.c | 118 ++++++++++++++++++++++++++++-------------
>>  3 files changed, 140 insertions(+), 91 deletions(-)
>>
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 259aea9..2e8c372 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>>  }
>>
>>  static int sample(struct datapath *dp, struct sk_buff *skb,
>> -                 struct sw_flow_key *key, const struct nlattr *attr,
>> -                 const struct nlattr *actions, int actions_len)
>> +                 struct sw_flow_key *key, const struct nlattr *attr)
>>  {
>> -       const struct nlattr *acts_list = NULL;
>> -       const struct nlattr *a;
>> -       int rem;
>> -       u32 cutlen = 0;
>> -
>> -       for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
>> -                a = nla_next(a, &rem)) {
>> -               u32 probability;
>> -
>> -               switch (nla_type(a)) {
>> -               case OVS_SAMPLE_ATTR_PROBABILITY:
>> -                       probability = nla_get_u32(a);
>> -                       if (!probability || prandom_u32() > probability)
>> -                               return 0;
>> -                       break;
>> -
>> -               case OVS_SAMPLE_ATTR_ACTIONS:
>> -                       acts_list = a;
>> -                       break;
>> -               }
>> -       }
>> +       struct nlattr *actions;
>> +       struct nlattr *sample_arg;
>> +       struct sw_flow_key *orig = key;
>> +       int rem = nla_len(attr);
>> +       int err = 0;
>> +       const struct sample_arg *arg;
>>
>> -       rem = nla_len(acts_list);
>> -       a = nla_data(acts_list);
>> +       /* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */
>
> Is it? This is the only reference to OVS_SAMPLE_ATTR_AUX I can see.
>
>> +       sample_arg = nla_data(attr);
>
> We could do this in the parent call, like several other actions do.

What do you mean?

>
> <snip>
>
>> @@ -1246,9 +1227,24 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>                         err = execute_masked_set_action(skb, key, nla_data(a));
>>                         break;
>>
>> -               case OVS_ACTION_ATTR_SAMPLE:
>> -                       err = sample(dp, skb, key, a, attr, len);
>> +               case OVS_ACTION_ATTR_SAMPLE: {
>> +                       bool last = nla_is_last(a, rem);
>> +                       struct sk_buff *clone_skb;
>> +
>> +                       clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC);
>> +
>> +                       if (!clone_skb)
>> +                               /* Out of memory, skip this sample action.
>> +                                */
>
> Should we ratelimited complain in this case?

The DP code has never logged against skb_clone() failures.

>
>> +                               break;
>> +
>> +                       err = sample(dp, clone_skb, key, a);
>> +                       if (last)
>> +                               return err;
>
> Given that sample() doesn't guarantee it will free 'clone_skb', I
> don't think this is safe.

Good point, I will move to clone into the sample() function, and only clone
after the probability test passes.

>
>> +
>> +                       pr_err("executing sample");
>
> Useful for debugging, but don't forget to drop for next version.
Sure
>
>>                         break;
>> +                       }
>
> This bracket is typically aligned with the character of the beginning
> of the 'case' that it applies to.
>
>>
>>                 case OVS_ACTION_ATTR_CT:
>>                         if (!is_flow_key_valid(key)) {
>> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
>> index 1c6e937..d7a6e803 100644
>> --- a/net/openvswitch/datapath.h
>> +++ b/net/openvswitch/datapath.h
>> @@ -220,4 +220,11 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>         if (logging_allowed && net_ratelimit())                 \
>>                 pr_info("netlink: " fmt "\n", ##__VA_ARGS__);   \
>>  } while (0)
>> +
>> +#define OVS_SAMPLE_ATTR_ARG  (OVS_ACTION_ATTR_MAX + 1)
>
> This should be defined in include/uapi/linux/openvswitch.h, protected
> by #ifdef __KERNEL__ like the other internal formats. While I don't
> think it actually conflicts with any other internal action format,
> these should be in one place so we don't end up accidentally using the
> same enum twice.
>
O.K.

>> +struct sample_arg {
>> +       bool exec;
>> +       u32  probability;
>> +};
>
> Perhaps we should move this to either net/openvswitch/flow_netlink.h
> or include/linux/openvswitch.h since it's only for consumption
> internally by the kernel module?

O.K.  I will move it into openvswitch.h
>
>> +
>>  #endif /* datapath.h */
>> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
>> index 6f5fa50..0e7cf13 100644
>> --- a/net/openvswitch/flow_netlink.c
>> +++ b/net/openvswitch/flow_netlink.c
>> @@ -1,5 +1,5 @@
>>  /*
>> - * Copyright (c) 2007-2014 Nicira, Inc.
>> + * Copyright (c) 2007-2017 Nicira, Inc.
>>   *
>>   * This program is free software; you can redistribute it and/or
>>   * modify it under the terms of version 2 of the GNU General Public
>> @@ -59,6 +59,39 @@ struct ovs_len_tbl {
>>  #define OVS_ATTR_NESTED -1
>>  #define OVS_ATTR_VARIABLE -2
>>
>> +static bool actions_may_change_flow(const struct nlattr *actions)
>> +{
>> +       struct nlattr *nla;
>> +       int rem;
>> +
>> +       nla_for_each_nested(nla, actions, rem) {
>> +               u16 action = nla_type(nla);
>> +
>> +               switch (action) {
>> +               case OVS_ACTION_ATTR_OUTPUT:
>> +               case OVS_ACTION_ATTR_RECIRC:
>> +               case OVS_ACTION_ATTR_USERSPACE:
>> +               case OVS_ACTION_ATTR_SAMPLE:
>> +               case OVS_ACTION_ATTR_TRUNC:
>
> Trunc doesn't change the flow, but if there's something like
> sample(output,trunc),output() then I wouldn't expect the final output
> to be truncated.

It should not, since skb is not seen by the 2nd output().
>
> <snip>
>
>> @@ -2621,39 +2670,36 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>>         return err;
>>  }
>>
>> -static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
>> +static int sample_action_to_attr(const struct nlattr *attr,
>> +                                struct sk_buff *skb)
>>  {
>> -       const struct nlattr *a;
>> -       struct nlattr *start;
>> -       int err = 0, rem;
>> +       struct nlattr *start, *ac_start, *sample_arg;
>> +       int err = 0, rem = nla_len(attr);
>> +       const struct sample_arg *arg;
>> +       struct nlattr *actions;
>>
>>         start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE);
>>         if (!start)
>>                 return -EMSGSIZE;
>>
>> -       nla_for_each_nested(a, attr, rem) {
>> -               int type = nla_type(a);
>> -               struct nlattr *st_sample;
>> +       sample_arg = nla_data(attr);
>> +       arg = nla_data(sample_arg);
>> +       actions = nla_next(sample_arg, &rem);
>>
>> -               switch (type) {
>> -               case OVS_SAMPLE_ATTR_PROBABILITY:
>> -                       if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY,
>> -                                   sizeof(u32), nla_data(a)))
>> -                               return -EMSGSIZE;
>> -                       break;
>> -               case OVS_SAMPLE_ATTR_ACTIONS:
>> -                       st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
>> -                       if (!st_sample)
>> -                               return -EMSGSIZE;
>> -                       err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb);
>> -                       if (err)
>> -                               return err;
>> -                       nla_nest_end(skb, st_sample);
>> -                       break;
>> -               }
>> -       }
>> +       if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PROBABILITY, arg->probability))
>> +               return -EMSGSIZE;
>>
>> +       ac_start = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
>> +       if (!ac_start)
>> +               return -EMSGSIZE;
>> +
>> +       err = ovs_nla_put_actions(actions, rem, skb);
>> +       if (err)
>> +               return err;
>
> Shouldn't we nla_nest_cancel() when something fails in the middle of
> nesting nla attributes? For example I believe that upcall will attempt
> to serialize actions, but if actions don't fit in the buffer then
> it'll just skip the actions and forward everything else.

Good catch.  Will fix.
>
>> +
>> +       nla_nest_end(skb, ac_start);
>>         nla_nest_end(skb, start);
>> +
>>         return err;
>>  }
>>
>> --
>> 1.8.3.1
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ