netdev - Re: [Patch net-next] net_sched: move the empty tp check from ->destroy() to ->delete()


Message-ID: <583A6567.30003@mellanox.com>
Date:   Sun, 27 Nov 2016 06:47:35 +0200
From:   Roi Dayan <roid@...lanox.com>
To:     Daniel Borkmann <daniel@...earbox.net>,
        Cong Wang <xiyou.wangcong@...il.com>
CC:     <roid@...lanox.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Jiri Pirko <jiri@...lanox.com>,
        John Fastabend <john.fastabend@...il.com>
Subject: Re: [Patch net-next] net_sched: move the empty tp check from
 ->destroy() to ->delete()



On 27/11/2016 02:33, Daniel Borkmann wrote:
> On 11/26/2016 12:09 PM, Daniel Borkmann wrote:
>> On 11/26/2016 07:46 AM, Cong Wang wrote:
>>> On Thu, Nov 24, 2016 at 7:20 AM, Daniel Borkmann 
>>> <daniel@...earbox.net> wrote:
> [...]
>>>> Ok, strange, qdisc_destroy() calls into ops->destroy(), where ingress
>>>> drops its entire chain via tcf_destroy_chain(), so that will be NULL
>>>> eventually. The tps are freed by call_rcu() as well as qdisc itself
>>>> later on via qdisc_rcu_free(), where it frees per-cpu bstats as well.
>>>> Outstanding readers should either bail out due to if (!cl) or can still
>>>> process the chain until read section ends, but during that time, cl->q
>>>> resp. bstats should be good. Do you happen to know what's at address
>>>> ffff880a68b04028? I was wondering wrt call_rcu() vs call_rcu_bh(), but
>>>> at least on ingress (netif_receive_skb_internal()) we hold
>>>> rcu_read_lock() here. The KASAN report is reliably happening at this
>>>> location, right?
>>>
>>> I am confused as well, I don't see how it could be related to my patch
>>> yet. I will take a deep look in the weekend.



Hi Cong,

When I reported the new trace, I didn't mean that it's related to your
patch; I just wanted to point out that it exposed something. I should have
been clearer about that.


>>
>> Ok, I'm currently on the run. Got too late yesterday night, but I'll
>> write what I found in the evening today, not related to ingress though.
>
> Just pushed out my analysis to netdev under "[PATCH net] net, sched:
> respect rcu grace period on cls destruction". My conclusion is that both
> issues are actually separate, and that one is small enough where we could
> route it via net actually. Perhaps this at the same time shrinks your
> "[PATCH net-next] net_sched: move the empty tp check from ->destroy() to
> ->delete()" to a reasonable size that it's suitable to net as well. Your
> ->delete()/->destroy() one is definitely needed, too. The tp->root one is
> independent of ->delete()/->destroy() as they are different races and
> tp->root could also happen when you just destroy the whole tp directly. I
> think that seems like a good path forward to me.
>
> Thanks,
> Daniel



Hi Daniel,

As for the tainted kernel: I was on an old (a week or two) net-next tree and
only cherry-picked the patches from latest net-next related to the Mellanox
HCA, cls_api, cls_flower, and devlink, so those are the tainted modules.
I have the issue reproducing in that tree, so I wanted to check it with
Cong's patch instead of with latest net-next.
I'll try reproducing the issue with your new patch and later try latest
net-next as well.

Thanks,
Roi
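
For reference, the ordering described in the quoted analysis above (readers
either see a NULL chain and bail out, or keep using the tp until their read
section ends, while the actual free is deferred via call_rcu()) can be
sketched as a toy model. This is a single-threaded Python illustration, not
kernel code: ToyRcu, ToyFilter and demo() are hypothetical names, and the
"grace period" is simplified to "no reader is currently active", which is
more conservative than real RCU.

```python
class ToyRcu:
    """Single-threaded toy model of RCU deferred freeing (not kernel code)."""

    def __init__(self):
        self.readers = 0      # read-side sections currently open
        self.pending = []     # callbacks waiting for the grace period

    def read_lock(self):
        self.readers += 1

    def read_unlock(self):
        self.readers -= 1
        self._maybe_run_callbacks()

    def call_rcu(self, callback):
        # Defer the callback instead of running it right away.
        self.pending.append(callback)
        self._maybe_run_callbacks()

    def _maybe_run_callbacks(self):
        # Simplification: the "grace period" ends once no reader is active.
        if self.readers == 0:
            callbacks, self.pending = self.pending, []
            for cb in callbacks:
                cb()


class ToyFilter:
    """Hypothetical stand-in for a tp; 'freed' marks memory as released."""

    def __init__(self):
        self.freed = False

    def free(self):
        self.freed = True


def demo():
    rcu = ToyRcu()
    tp = ToyFilter()
    chain = {"head": tp}

    rcu.read_lock()
    seen = chain["head"]           # reader picked up the tp before teardown

    chain["head"] = None           # updater unlinks the chain ...
    rcu.call_rcu(tp.free)          # ... and defers the actual free

    valid_while_reading = seen is not None and not seen.freed
    rcu.read_unlock()              # last reader leaves; callback runs now
    freed_afterwards = tp.freed

    rcu.read_lock()
    late_reader_bails = chain["head"] is None   # analogous to if (!cl)
    rcu.read_unlock()

    return valid_while_reading, freed_afterwards, late_reader_bails
```

In this model the early reader still sees a valid object, the object is
released only after that reader leaves, and a later reader finds an empty
chain. A use-after-free like the one in the KASAN report would correspond to
free() running while a reader still holds a reference.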