netdev - Re: Soft lockup in tc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <585A811D.1080202@iogearbox.net>
Date:   Wed, 21 Dec 2016 14:18:21 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Shahar Klein <shahark@...lanox.com>,
        Cong Wang <xiyou.wangcong@...il.com>
CC:     Or Gerlitz <gerlitz.or@...il.com>,
        Linux Netdev List <netdev@...r.kernel.org>,
        Roi Dayan <roid@...lanox.com>,
        David Miller <davem@...emloft.net>,
        Jiri Pirko <jiri@...lanox.com>,
        John Fastabend <john.fastabend@...il.com>,
        Hadar Hen Zion <hadarh@...lanox.com>
Subject: Re: Soft lockup in tc_classify

On 12/21/2016 01:58 PM, Shahar Klein wrote:
> On 12/21/2016 12:15 PM, Daniel Borkmann wrote:
>> On 12/21/2016 08:03 AM, Cong Wang wrote:
>>> On Tue, Dec 20, 2016 at 10:44 PM, Shahar Klein <shahark@...lanox.com>
>>> wrote:
>> [...]
>>> Looks like you added a debug printk inside tcf_destroy() too,
>>> which seems racy with filter creation, it should not happen since
>>> in both cases we take RTNL lock.
>>>
>>> Don't know if changing all RCU_INIT_POINTER in that file to
>>> rcu_assign_pointer could help anything or not. Mind to try?
>>
>> I don't think at this point that it's RCU related at all.
>>
>> I have a theory on what is happening. Quoting the piece in question from
>> Shahar's log:
>>
>>  1: thread-2845[cpu-1] setting tp_created to 1 tp=ffff94b5b0280780
>> back=ffff94b9ea932060
>>  2: thread-2856[cpu-1] setting tp_created to 1 tp=ffff94b9ea9322a0
>> back=ffff94b9ea932060
>>  3: thread-2843[cpu-1] setting tp_created to 1 tp=ffff94b5b402c960
>> back=ffff94b9ea932060
>>  4: destroy ffff94b5b669fea0 tcf_destroy:1905
>>  5: thread-2853[cpu-1] setting tp_created to 1 tp=ffff94b5b02805a0
>> back=ffff94b9ea932060
>>  6: thread-2853[cpu-1] add/change filter by: fl_get [cls_flower]
>> tp=ffff94b5b02805a0 tp->next=ffff94b9ea932060
>>  7: destroy ffff94b5b0280780 tcf_destroy:1905
>>  8: thread-2845[cpu-1] add/change filter by: fl_get [cls_flower]
>> tp=ffff94b5b02805a0 tp->next=ffff94b5b02805a0
>>
>> The interesting thing is that all this happens on CPU1, so as you say
>> we're under rtnl.
>> In 1), thread-2845 creates tp=ffff94b5b0280780, which is destroyed in
>> 7), presumably also
>> by thread-2845, and the weird part is why suddenly in 8) thread-2845
>> adds a created filter
>> without actually creating it. Plus, thread-2845 got interrupted, which
>> means it must have
>> dropped rntl in the middle. We drop it in tc_ctl_tfilter() when we do
>> tcf_proto_lookup_ops()
>> and need to pull in a module, but here this doesn't make sense at all
>> since i) at this
>> point we haven't created the tp yet and 2) flower was already there.
>> Thus the only explanation
>> where this must have happened is where we called tp->ops->change(). So
>> here the return
>> code must have been -EAGAIN, which makes sense because in 7) we
>> destroyed that specific
>> tp instance. Which means we goto replay but *do not* clear tp_created. I
>> think that is
>> the bug in question. So, while we dropped rtnl in the meantime, some
>> other tp instance
>> was added (tp=ffff94b5b02805a0) that we had a match on in round 2, but
>> we still think it
>> was newly created which wasn't the actual case. So we'd need to deal
>> with the fact that
>> ->change() callback could return -EAGAIN as well. Now looking at flower,
>> I think the call
>> chain must have been fl_change() -> fl_set_parms() ->
>> tcf_exts_validate() -> tcf_action_init()
>> -> tcf_action_init_1(). And here one possibility I see is that
>> tc_lookup_action_n()
>> failed, therefore we shortly dropped rtnl for the request_module() where
>> the module
>> got loaded successfully and thus error code from there is -EAGAIN that
>> got propagated
>> all the way through ->change() from tc_ctl_tfilter(). So it looks like a
>> generic issue
>> not specifically tied to flower.
>>
>> Shahar, can you test the following? Thanks!
>>
>>  net/sched/cls_api.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index 3fbba79..1ecdf80 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -148,13 +148,15 @@ static int tc_ctl_tfilter(struct sk_buff *skb,
>> struct nlmsghdr *n)
>>      unsigned long cl;
>>      unsigned long fh;
>>      int err;
>> -    int tp_created = 0;
>> +    int tp_created;
>>
>>      if ((n->nlmsg_type != RTM_GETTFILTER) &&
>>          !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
>>          return -EPERM;
>>
>>  replay:
>> +    tp_created = 0;
>> +
>>      err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL);
>>      if (err < 0)
>>          return err;
>
> Test run successfully!
> I'll remove all other debug "fixes" and run again later
>
> Thanks Daniel!

Great, thanks for confirming so far Shahar!
I'll cook an official patch in the meantime.