lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 18 Apr 2017 19:01:29 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Cong Wang <xiyou.wangcong@...il.com>, netdev@...r.kernel.org
CC:     John Fastabend <john.fastabend@...il.com>,
        Jamal Hadi Salim <jhs@...atatu.com>, lucasb@...atatu.com
Subject: Re: [Patch net-next v3] net_sched: move the empty tp check from ->destroy()
 to ->delete()

Hi Cong,

sorry for the late reply. Generally the patch looks good to me, just
a few comments inline:

On 04/17/2017 08:30 PM, Cong Wang wrote:
> Roi reported we could have a race condition where in ->classify() path
> we dereference tp->root and meanwhile a parallel ->destroy() makes it
> a NULL.

Correct.

> This is possible because ->destroy() could be called when deleting
> a filter to check if we are the last one in tp, this tp is still
> linked and visible at that time.
>
> Daniel fixed this in commit d936377414fa
> ("net, sched: respect rcu grace period on cls destruction"), but
> the root cause of this problem is the semantic of ->destroy(), it
> does two things (for non-force case):
>
> 1) check if tp is empty
> 2) if tp is empty we could really destroy it
>
> and its caller, if cares, needs to check its return value to see if
> it is really destroyed. Therefore we can't unlink tp unless we know
> it is empty.
>
> As suggested by Daniel, we could actually move the test logic to ->delete()
> so that we can safely unlink tp after ->delete() tells us the last one is
> just deleted and before ->destroy().
>
> What's more, even we unlink it before ->destroy(), it could still have
> readers since we don't wait for a grace period here, we should not modify
> tp->root in ->destroy() either.

Here seems to be a bit of a mixup in this analysis, imo. The issue
that Roi reported back then was exactly the one that d936377414fa ("net,
sched: respect rcu grace period on cls destruction") fixed, which
affected flower and other classifiers:

     Roi reported a crash in flower where tp->root was NULL in ->classify()
     callbacks. Reason is that in ->destroy() tp->root is set to NULL via
     RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
     this doesn't respect RCU grace period for them, and as a result, still
     outstanding readers from tc_classify() will try to blindly dereference
     a NULL tp->root.

The ->delete() callback was never used by Roi back then, he said that
he just removed the ingress qdisc in his test, which implicitly purges
all cls attached to it via tcf_destroy_chain(). So the above description
with regards to the "root cause" of Roi's reported issue is not correct.

The issue that this patch fixes is an _independent_ race that we found
while auditing the code when looking into Roi's report back then. It
fixes commit 1e052be69d04 ("net_sched: destroy proto tp when all filters
are gone"), which added the RCU_INIT_POINTER() after the tcf_destroy() in
RTM_DELTFILTER case. That part of the description looks good, where you
describe that "[...] we need to move the test logic to ->delete(), so
that we can safely unlink tp after ->delete() tells us the last one is
just deleted and before ->destroy()."

Please also add Fixes tag, so it can be better tracked for backports.

Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")

> Reported-by: Roi Dayan <roid@...lanox.com>
> Cc: Daniel Borkmann <daniel@...earbox.net>
> Cc: John Fastabend <john.fastabend@...il.com>
> Cc: Jamal Hadi Salim <jhs@...atatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
[...]

> diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
> index 9962090..d388536 100644
> --- a/net/sched/cls_fw.c
> +++ b/net/sched/cls_fw.c
[...]
> @@ -150,17 +144,17 @@ static bool fw_destroy(struct tcf_proto *tp, bool force)
>   			call_rcu(&f->rcu, fw_delete_filter);
>   		}
>   	}
> -	RCU_INIT_POINTER(tp->root, NULL);
>   	kfree_rcu(head, rcu);
> -	return true;
>   }

> diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
> index a371075..8e2baf8 100644
> --- a/net/sched/cls_route.c
> +++ b/net/sched/cls_route.c
> @@ -312,12 +305,10 @@ static bool route4_destroy(struct tcf_proto *tp, bool force)
>   			kfree_rcu(b, rcu);
>   		}
>   	}
> -	RCU_INIT_POINTER(tp->root, NULL);
>   	kfree_rcu(head, rcu);
> -	return true;
>   }

> diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
> index d7f2923..9e3748b 100644
> --- a/net/sched/cls_rsvp.h
> +++ b/net/sched/cls_rsvp.h
> @@ -302,22 +302,13 @@ static void rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
[...]
> -	}
> -
> -	RCU_INIT_POINTER(tp->root, NULL);
> +		return;
>
>   	for (h1 = 0; h1 < 256; h1++) {
>   		struct rsvp_session *s;
> @@ -337,10 +328,9 @@ static bool rsvp_destroy(struct tcf_proto *tp, bool force)
[...]

The above three RCU_INIT_POINTER(tp->root, NULL) are independent
of the fix and actually do no harm right now. I described that in
d936377414fa ("net, sched: respect rcu grace period on cls destruction")
as well, meaning that they each handle tp->root being NULL in ->classify()
path (for historic reasons), so this is handled gracefully, readers use
rcu_dereference_bh(tp->root) and test for this being NULL.

But I agree that this could be cleaned up along with the check in the
->classify() callbacks for these three (not sure if really worth it,
though). However, such cleanup should be a separate patch and not
included in this fix.

Other than the mentioned things, this looks good to me.

Thanks,
Daniel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ