[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABKoBm3fQqdVugW9n7Bkj4t21AESgV9=k2yMpZCikVxAXmffMg@mail.gmail.com>
Date: Fri, 10 Mar 2017 14:07:09 -0800
From: Andy Zhou <azhou@....org>
To: Joe Stringer <joe@....org>
Cc: netdev <netdev@...r.kernel.org>, pravin shelar <pshelar@....org>
Subject: Re: [RFC net-next sample action optimization 3/3] openvswitch:
Optimize sample action for the clone use cases
On Thu, Mar 9, 2017 at 11:46 AM, Joe Stringer <joe@....org> wrote:
> On 7 March 2017 at 16:15, Andy Zhou <azhou@....org> wrote:
>> With the introduction of open flow 'clone' action, the OVS user space
>> can now translate the 'clone' action into kernel datapath 'sample'
>> action, with 100% probability, to ensure that the clone semantics,
>> which is that the packet seen by the clone action is the same as the
>> packet seen by the action after clone, is faithfully carried out
>> in the datapath.
>>
>> While the sample action in the datpath has the matching semantics,
>> its implementation is only optimized for its original use.
>> Specifically, there are two limitation: First, there is a 3 level of
>> nesting restriction, enforced at the flow downloading time. This
>> limit turns out to be too restrictive for the 'clone' use case.
>> Second, the implementation avoid recursive call only if the sample
>> action list has a single userspace action.
>>
>> The optimization implemented in this series removes the static
>> nesting limit check, instead, implement the run time recursion limit
>> check, and recursion avoidance similar to that of the 'recirc' action.
>> This optimization solve both #1 and #2 issues above.
>>
>> Another optimization implemented is to avoid coping flow key as
>
> *copying
>
>> long as the actions enclosed does not change the flow key. The
>> detection is performed only once at the flow downloading time.
>>
>> The third optimization implemented is to rewrite the action list
>> at flow downloading time in order to save the fast path from parsing
>> the sample action list in its original form repeatedly.
>
> Whenever there is an enumeration (1, 2, 3; ..another..., third thing
> implemented) in a commit message, I have to ask whether each "another
> change..." should be a separate patch. It certainly makes it easier to
> review.
>
They are all part of the same implementation. Spliting them probably won't
make much sense. I think I will drop #2 and #3 in the commit message since
#1 is the main optimization.
> I ran this through the OVS kernel tests and it's working correctly
> from that point of view, but I didn't get a chance to dig in and
> ensure for example, runtime behaviour of several nested
> sample(actions(sample(actions(sample(actions(output)))))) handles
> reasonably when it runs out of stack and deferred actions space. At a
> high level though, this seems pretty straightforward.
>
> Several comments below, thanks.
>
>>
>> Signed-off-by: Andy Zhou <azhou@....org>
>> ---
>> net/openvswitch/actions.c | 106 ++++++++++++++++++------------------
>> net/openvswitch/datapath.h | 7 +++
>> net/openvswitch/flow_netlink.c | 118 ++++++++++++++++++++++++++++-------------
>> 3 files changed, 140 insertions(+), 91 deletions(-)
>>
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 259aea9..2e8c372 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>> }
>>
>> static int sample(struct datapath *dp, struct sk_buff *skb,
>> - struct sw_flow_key *key, const struct nlattr *attr,
>> - const struct nlattr *actions, int actions_len)
>> + struct sw_flow_key *key, const struct nlattr *attr)
>> {
>> - const struct nlattr *acts_list = NULL;
>> - const struct nlattr *a;
>> - int rem;
>> - u32 cutlen = 0;
>> -
>> - for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
>> - a = nla_next(a, &rem)) {
>> - u32 probability;
>> -
>> - switch (nla_type(a)) {
>> - case OVS_SAMPLE_ATTR_PROBABILITY:
>> - probability = nla_get_u32(a);
>> - if (!probability || prandom_u32() > probability)
>> - return 0;
>> - break;
>> -
>> - case OVS_SAMPLE_ATTR_ACTIONS:
>> - acts_list = a;
>> - break;
>> - }
>> - }
>> + struct nlattr *actions;
>> + struct nlattr *sample_arg;
>> + struct sw_flow_key *orig = key;
>> + int rem = nla_len(attr);
>> + int err = 0;
>> + const struct sample_arg *arg;
>>
>> - rem = nla_len(acts_list);
>> - a = nla_data(acts_list);
>> + /* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */
>
> Is it? This is the only reference to OVS_SAMPLE_ATTR_AUX I can see.
>
>> + sample_arg = nla_data(attr);
>
> We could do this in the parent call, like several other actions do.
What do you mean?
>
> <snip>
>
>> @@ -1246,9 +1227,24 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>> err = execute_masked_set_action(skb, key, nla_data(a));
>> break;
>>
>> - case OVS_ACTION_ATTR_SAMPLE:
>> - err = sample(dp, skb, key, a, attr, len);
>> + case OVS_ACTION_ATTR_SAMPLE: {
>> + bool last = nla_is_last(a, rem);
>> + struct sk_buff *clone_skb;
>> +
>> + clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC);
>> +
>> + if (!clone_skb)
>> + /* Out of memory, skip this sample action.
>> + */
>
> Should we ratelimited complain in this case?
The DP code has never logged against skb_clone() failures.
>
>> + break;
>> +
>> + err = sample(dp, clone_skb, key, a);
>> + if (last)
>> + return err;
>
> Given that sample() doesn't guarantee it will free 'clone_skb', I
> don't think this is safe.
Good point, I will move to clone into the sample() function, and only clone
after the probability test passes.
>
>> +
>> + pr_err("executing sample");
>
> Useful for debugging, but don't forget to drop for next version.
Sure
>
>> break;
>> + }
>
> This bracket is typically aligned with the character of the beginning
> of the 'case' that it applies to.
>
>>
>> case OVS_ACTION_ATTR_CT:
>> if (!is_flow_key_valid(key)) {
>> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
>> index 1c6e937..d7a6e803 100644
>> --- a/net/openvswitch/datapath.h
>> +++ b/net/openvswitch/datapath.h
>> @@ -220,4 +220,11 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>> if (logging_allowed && net_ratelimit()) \
>> pr_info("netlink: " fmt "\n", ##__VA_ARGS__); \
>> } while (0)
>> +
>> +#define OVS_SAMPLE_ATTR_ARG (OVS_ACTION_ATTR_MAX + 1)
>
> This should be defined in include/uapi/linux/openvswitch.h, protected
> by #ifdef __KERNEL__ like the other internal formats. While I don't
> think it actually conflicts with any other internal action format,
> these should be in one place so we don't end up accidentally using the
> same enum twice.
>
O.K.
>> +struct sample_arg {
>> + bool exec;
>> + u32 probability;
>> +};
>
> Perhaps we should move this to either net/openvswitch/flow_netlink.h
> or include/linux/openvswitch.h since it's only for consumption
> internally by the kernel module?
O.K. I will move it into openvswitch.h
>
>> +
>> #endif /* datapath.h */
>> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
>> index 6f5fa50..0e7cf13 100644
>> --- a/net/openvswitch/flow_netlink.c
>> +++ b/net/openvswitch/flow_netlink.c
>> @@ -1,5 +1,5 @@
>> /*
>> - * Copyright (c) 2007-2014 Nicira, Inc.
>> + * Copyright (c) 2007-2017 Nicira, Inc.
>> *
>> * This program is free software; you can redistribute it and/or
>> * modify it under the terms of version 2 of the GNU General Public
>> @@ -59,6 +59,39 @@ struct ovs_len_tbl {
>> #define OVS_ATTR_NESTED -1
>> #define OVS_ATTR_VARIABLE -2
>>
>> +static bool actions_may_change_flow(const struct nlattr *actions)
>> +{
>> + struct nlattr *nla;
>> + int rem;
>> +
>> + nla_for_each_nested(nla, actions, rem) {
>> + u16 action = nla_type(nla);
>> +
>> + switch (action) {
>> + case OVS_ACTION_ATTR_OUTPUT:
>> + case OVS_ACTION_ATTR_RECIRC:
>> + case OVS_ACTION_ATTR_USERSPACE:
>> + case OVS_ACTION_ATTR_SAMPLE:
>> + case OVS_ACTION_ATTR_TRUNC:
>
> Trunc doesn't change the flow, but if there's something like
> sample(output,trunc),output() then I wouldn't expect the final output
> to be truncated.
It should not, since skb is not seen by the 2nd output().
>
> <snip>
>
>> @@ -2621,39 +2670,36 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>> return err;
>> }
>>
>> -static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
>> +static int sample_action_to_attr(const struct nlattr *attr,
>> + struct sk_buff *skb)
>> {
>> - const struct nlattr *a;
>> - struct nlattr *start;
>> - int err = 0, rem;
>> + struct nlattr *start, *ac_start, *sample_arg;
>> + int err = 0, rem = nla_len(attr);
>> + const struct sample_arg *arg;
>> + struct nlattr *actions;
>>
>> start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE);
>> if (!start)
>> return -EMSGSIZE;
>>
>> - nla_for_each_nested(a, attr, rem) {
>> - int type = nla_type(a);
>> - struct nlattr *st_sample;
>> + sample_arg = nla_data(attr);
>> + arg = nla_data(sample_arg);
>> + actions = nla_next(sample_arg, &rem);
>>
>> - switch (type) {
>> - case OVS_SAMPLE_ATTR_PROBABILITY:
>> - if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY,
>> - sizeof(u32), nla_data(a)))
>> - return -EMSGSIZE;
>> - break;
>> - case OVS_SAMPLE_ATTR_ACTIONS:
>> - st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
>> - if (!st_sample)
>> - return -EMSGSIZE;
>> - err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb);
>> - if (err)
>> - return err;
>> - nla_nest_end(skb, st_sample);
>> - break;
>> - }
>> - }
>> + if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PROBABILITY, arg->probability))
>> + return -EMSGSIZE;
>>
>> + ac_start = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
>> + if (!ac_start)
>> + return -EMSGSIZE;
>> +
>> + err = ovs_nla_put_actions(actions, rem, skb);
>> + if (err)
>> + return err;
>
> Shouldn't we nla_nest_cancel() when something fails in the middle of
> nesting nla attributes? For example I believe that upcall will attempt
> to serialize actions, but if actions don't fit in the buffer then
> it'll just skip the actions and forward everything else.
Good catch. Will fix.
>
>> +
>> + nla_nest_end(skb, ac_start);
>> nla_nest_end(skb, start);
>> +
>> return err;
>> }
>>
>> --
>> 1.8.3.1
>>
Powered by blists - more mailing lists