[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cb55ede9-3e6e-8e68-3005-8bdb2105122e@mellanox.com>
Date: Wed, 21 Dec 2016 14:58:17 +0200
From: Shahar Klein <shahark@...lanox.com>
To: Daniel Borkmann <daniel@...earbox.net>,
Cong Wang <xiyou.wangcong@...il.com>
CC: <shahark@...lanox.com>, Or Gerlitz <gerlitz.or@...il.com>,
"Linux Netdev List" <netdev@...r.kernel.org>,
Roi Dayan <roid@...lanox.com>,
David Miller <davem@...emloft.net>,
Jiri Pirko <jiri@...lanox.com>,
John Fastabend <john.fastabend@...il.com>,
Hadar Hen Zion <hadarh@...lanox.com>
Subject: Re: Soft lockup in tc_classify
On 12/21/2016 12:15 PM, Daniel Borkmann wrote:
> On 12/21/2016 08:03 AM, Cong Wang wrote:
>> On Tue, Dec 20, 2016 at 10:44 PM, Shahar Klein <shahark@...lanox.com>
>> wrote:
> [...]
>> Looks like you added a debug printk inside tcf_destroy() too,
>> which seems racy with filter creation, it should not happen since
>> in both cases we take RTNL lock.
>>
>> Don't know if changing all RCU_INIT_POINTER in that file to
>> rcu_assign_pointer could help anything or not. Mind to try?
>
> I don't think at this point that it's RCU related at all.
>
> I have a theory on what is happening. Quoting the piece in question from
> Shahar's log:
>
> 1: thread-2845[cpu-1] setting tp_created to 1 tp=ffff94b5b0280780
> back=ffff94b9ea932060
> 2: thread-2856[cpu-1] setting tp_created to 1 tp=ffff94b9ea9322a0
> back=ffff94b9ea932060
> 3: thread-2843[cpu-1] setting tp_created to 1 tp=ffff94b5b402c960
> back=ffff94b9ea932060
> 4: destroy ffff94b5b669fea0 tcf_destroy:1905
> 5: thread-2853[cpu-1] setting tp_created to 1 tp=ffff94b5b02805a0
> back=ffff94b9ea932060
> 6: thread-2853[cpu-1] add/change filter by: fl_get [cls_flower]
> tp=ffff94b5b02805a0 tp->next=ffff94b9ea932060
> 7: destroy ffff94b5b0280780 tcf_destroy:1905
> 8: thread-2845[cpu-1] add/change filter by: fl_get [cls_flower]
> tp=ffff94b5b02805a0 tp->next=ffff94b5b02805a0
>
> The interesting thing is that all this happens on CPU1, so as you say
> we're under rtnl.
> In 1), thread-2845 creates tp=ffff94b5b0280780, which is destroyed in
> 7), presumably also
> by thread-2845, and the weird part is why suddenly in 8) thread-2845
> adds a created filter
> without actually creating it. Plus, thread-2845 got interrupted, which
> means it must have
> dropped rntl in the middle. We drop it in tc_ctl_tfilter() when we do
> tcf_proto_lookup_ops()
> and need to pull in a module, but here this doesn't make sense at all
> since i) at this
> point we haven't created the tp yet and 2) flower was already there.
> Thus the only explanation
> where this must have happened is where we called tp->ops->change(). So
> here the return
> code must have been -EAGAIN, which makes sense because in 7) we
> destroyed that specific
> tp instance. Which means we goto replay but *do not* clear tp_created. I
> think that is
> the bug in question. So, while we dropped rtnl in the meantime, some
> other tp instance
> was added (tp=ffff94b5b02805a0) that we had a match on in round 2, but
> we still think it
> was newly created which wasn't the actual case. So we'd need to deal
> with the fact that
> ->change() callback could return -EAGAIN as well. Now looking at flower,
> I think the call
> chain must have been fl_change() -> fl_set_parms() ->
> tcf_exts_validate() -> tcf_action_init()
> -> tcf_action_init_1(). And here one possibility I see is that
> tc_lookup_action_n()
> failed, therefore we shortly dropped rtnl for the request_module() where
> the module
> got loaded successfully and thus error code from there is -EAGAIN that
> got propagated
> all the way through ->change() from tc_ctl_tfilter(). So it looks like a
> generic issue
> not specifically tied to flower.
>
> Shahar, can you test the following? Thanks!
>
> net/sched/cls_api.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 3fbba79..1ecdf80 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -148,13 +148,15 @@ static int tc_ctl_tfilter(struct sk_buff *skb,
> struct nlmsghdr *n)
> unsigned long cl;
> unsigned long fh;
> int err;
> - int tp_created = 0;
> + int tp_created;
>
> if ((n->nlmsg_type != RTM_GETTFILTER) &&
> !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
> return -EPERM;
>
> replay:
> + tp_created = 0;
> +
> err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL);
> if (err < 0)
> return err;
Test run successfully!
I'll remove all other debug "fixes" and run again later
Thanks Daniel!
Powered by blists - more mailing lists