netdev - Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Tue, 22 Nov 2016 21:24:51 -0800
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Jiri Pirko <jiri@...nulli.us>, Roi Dayan <roid@...lanox.com>,
        "David S. Miller" <davem@...emloft.net>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Jiri Pirko <jiri@...lanox.com>,
        Or Gerlitz <ogerlitz@...lanox.com>,
        Cong Wang <cwang@...pensource.com>
Subject: Re: [PATCH net-next] net/sched: cls_flower: verify root pointer
 before dereferncing it

On Tue, Nov 22, 2016 at 3:36 PM, John Fastabend
<john.fastabend@...il.com> wrote:
> On 16-11-22 12:41 PM, Daniel Borkmann wrote:
>> On 11/22/2016 08:28 PM, Cong Wang wrote:
>>> On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko <jiri@...nulli.us> wrote:
>>>> Tue, Nov 22, 2016 at 05:04:11PM CET, daniel@...earbox.net wrote:
>>>>> Hmm, I don't think we want to have such an additional test in fast
>>>>> path for each and every classifier. Can we think of ways to avoid that?
>>>>>
>>>>> My question is, since we unlink individual instances from such
>>>>> tp-internal
>>>>> lists through RCU and release the instance through call_rcu() as
>>>>> well as
>>>>> the head (tp->root) via kfree_rcu() eventually, against what are we
>>>>> protecting
>>>>> setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback?
>>>>> Something
>>>>> not respecting grace period?
>>>>
>>>> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
>>>> to null.
>>
>> But that's not really an answer to my question. ;)
>>
>>> We do need to respect the grace period if we touch the globally visible
>>> data structure tp in tcf_destroy(). Therefore Roi's patch is not
>>> fixing the
>>> right place.
>>
>> I think there may be multiple issues actually.
>>
>> At the time we go into tc_classify(), from ingress as well as egress side,
>> we're under RCU, but BH variant. In cls delete()/destroy() callbacks, we
>> everywhere use call_rcu() and kfree_rcu(), same as for tcf_destroy() where
>> we use kfree_rcu() on tp, although we iterate tps (and implicitly inner
>> filters)
>> via rcu_dereference_bh() from reader side. Is there a reason why we don't
>> use call_rcu_bh() variant on destruction for all this instead?
>
> I can't think of any if its all under _bh we can convert the call_rcu to
> call_rcu_bh it just needs an audit.
>
>>
>> Just looking at cls_bpf and others, what protects
>> RCU_INIT_POINTER(tp->root,
>> NULL) against? The tp is unlinked in tc_ctl_tfilter() from the tp chain in
>> tcf_destroy() cases. Still active readers under RCU BH can race against
>> this
>> (tp->root being NULL), as the commit identified. Only the get() callback
>> checks
>> for head against NULL, but both are serialized under rtnl, and the only
>> place
>> we call this is tc_ctl_tfilter(). Even if we create a new tp, head
>> should not
>> be NULL there, if it was assigned during the init() cb, but contains an
>> empty
>> list. (It's different for things like cls_cgroup, though.) So, I'm
>> wondering
>> if the RCU_INIT_POINTER(tp->root, NULL) can just be removed instead
>> (unless I'm
>> missing something obvious)?
>
>
> Just took a look at this I think there are a couple possible solutions.
> The easiest is likely to fix all the call sites so that 'tp' is unlinked
> before calling the destroy() handlers AND not doing the NULL set. I only
> see one such call site where destroy is called before unlinking at the
> moment. This should enforce that after a grace period there is no path
> to reach the classifiers because 'tp' is unlinked. Calling destroy
> before unlinking 'tp' however could cause a small race between grace
> period of 'tp' and grace period of the filter.
>
> Another would be to only call the destroy path from the call_rcu path
> of the 'tp' object so that destroy is only ever called after the object
> is guaranteed to be unlinked from the tc_filter path.
>
> I think both solutions would be fine.
>
> Cong were you working on one of these? Or do you have another idea?

Yeah, this is basic what I think as well, however, both are hard.
On one hand, we can't detach the tp from the global singly-linked list
before tcf_destroy() since we rely on its return value to make this decision.
On the other hand, it is a singly-linked list, we have to pass in the address
of its previous pointer to rcu callback to remove it, it seems racy as well
since we modify a previous pointer which is still visible globally...

Hmm, perhaps we really have to switch to a doubly-linked list, that is
list_head. I need to double check. And also the semantic of ->destroy()
needs to revise too.

So yeah, my commit should be blamed. :-/