lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5a985705-11e5-1575-a049-723accb97608@mellanox.com>
Date:   Tue, 20 Dec 2016 08:22:41 +0200
From:   Shahar Klein <shahark@...lanox.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
CC:     <shahark@...lanox.com>, Or Gerlitz <gerlitz.or@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Linux Netdev List <netdev@...r.kernel.org>,
        Roi Dayan <roid@...lanox.com>,
        David Miller <davem@...emloft.net>,
        Jiri Pirko <jiri@...lanox.com>,
        John Fastabend <john.fastabend@...il.com>,
        "Hadar Hen Zion" <hadarh@...lanox.com>
Subject: Re: Soft lockup in tc_classify



On 12/19/2016 7:58 PM, Cong Wang wrote:
> Hello,
>
> On Mon, Dec 19, 2016 at 8:39 AM, Shahar Klein <shahark@...lanox.com> wrote:
>>
>>
>> On 12/13/2016 12:51 AM, Cong Wang wrote:
>>>
>>> On Mon, Dec 12, 2016 at 1:18 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>>>
>>>> On Mon, Dec 12, 2016 at 3:28 PM, Daniel Borkmann <daniel@...earbox.net>
>>>> wrote:
>>>>
>>>>> Note that there's still the RCU fix missing for the deletion race that
>>>>> Cong will still send out, but you say that the only thing you do is to
>>>>> add a single rule, but no other operation in involved during that test?
>>>>
>>>>
>>>> What's missing to have the deletion race fixed? making a patch or
>>>> testing to a patch which was sent?
>>>
>>>
>>> If you think it would help for this problem, here is my patch rebased
>>> on the latest net-next.
>>>
>>> Again, I don't see how it could help this case yet, especially I don't
>>> see how we could have a loop in this singly linked list.
>>>
>>
>> I've applied cong's patch and hit a different lockup(full log attached):
>
>
> Are you sure this is really different? For me, it is still inside the loop
> in tc_classify(), with only a slightly different offset.
>
>
>>
>> Daniel suggested I'll add a print:
>>                 case RTM_DELTFILTER:
>> -                   err = tp->ops->delete(tp, fh);
>> +                 printk(KERN_ERR "DEBUGG:SK %s:%d\n", __func__, __LINE__);
>> +                 err = tp->ops->delete(tp, fh, &last);
>>                         if (err == 0) {
>>
>> and I couldn't see this print in the output.....
>
> Hmm, that is odd, if this never prints, then my patch should not make any
> difference.
>
> There are still two other cases where we could change tp->next, so do you
> mind to add two more printk's for debugging?
>
> Attached is the delta patch.
>
> Thanks!
>

I've added a slightly different debug print:
@@ -368,11 +375,12 @@ static int tc_ctl_tfilter(struct sk_buff *skb, 
struct nlmsghdr *n)
                 if (tp_created) {
                         RCU_INIT_POINTER(tp->next, 
rtnl_dereference(*back));
                         rcu_assign_pointer(*back, tp);
+                 printk(KERN_ERR "DEBUGG:SK add/change filter by: %pf 
tp=%p tp->next=%p\n", tp->ops->get, tp, tp->next);
                 }
                 tfilter_notify(net, skb, n, tp, fh, RTM_NEWTFILTER, false);

full output attached:

[  283.290271] Mirror/redirect action on
[  283.305031] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=ffff9432d704df60 tp->next=          (null)
[  283.322563] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=ffff9436e718d240 tp->next=          (null)
[  283.359997] GACT probability on
[  283.365923] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=ffff9436e718d3c0 tp->next=ffff9436e718d240
[  283.378725] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  283.391310] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  283.403923] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  283.416542] DEBUGG:SK add/change filter by: fl_get [cls_flower] 
tp=ffff9436e718d3c0 tp->next=ffff9436e718d3c0
[  308.538571] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! 
[swapper/0:0]


Thanks
Shahar




View attachment "tp_p_debug.log" of type "text/plain" (18147 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ