Message-ID: <55AE1939.105@mojatatu.com>
Date: Tue, 21 Jul 2015 06:04:41 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Alex Gartrell <agartrell@...com>, xiyou.wangcong@...il.com,
davem@...emloft.net
CC: netdev@...r.kernel.org, eric.dumazet@...il.com, kernel-team@...com,
stable@...r.kernel.org
Subject: Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen
On 07/20/15 15:40, Alex Gartrell wrote:
> We have an application that invokes tc to delete the root every time the
> config changes. As a result we stress the cleanup code and were seeing the
> following panic:
>
> crash> bt
> PID: 630839 TASK: ffff8823c990d280 CPU: 14 COMMAND: "tc"
> [... snip ...]
> #8 [ffff8820ceec17a0] page_fault at ffffffff8160a8c2
> [exception RIP: htb_qlen_notify+24]
> RIP: ffffffffa0841718 RSP: ffff8820ceec1858 RFLAGS: 00010282
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88241747b400
> RDX: ffff88241747b408 RSI: 0000000000000000 RDI: ffff8811fb27d000
> RBP: ffff8820ceec1868 R8: ffff88120cdeff24 R9: ffff88120cdeff30
> R10: 0000000000000bd4 R11: ffffffffa0840919 R12: ffffffffa0843340
> R13: 0000000000000000 R14: 0000000000000001 R15: ffff8808dae5c2e8
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #9 [...] qdisc_tree_decrease_qlen at ffffffff81565375
> #10 [...] fq_codel_dequeue at ffffffffa084e0a0 [sch_fq_codel]
> #11 [...] fq_codel_reset at ffffffffa084e2f8 [sch_fq_codel]
> #12 [...] qdisc_destroy at ffffffff81560d2d
> #13 [...] htb_destroy_class at ffffffffa08408f8 [sch_htb]
> #14 [...] htb_put at ffffffffa084095c [sch_htb]
> #15 [...] tc_ctl_tclass at ffffffff815645a3
> #16 [...] rtnetlink_rcv_msg at ffffffff81552cb0
> [... snip ...]
>
> To my understanding, the following situation is taking place.
>
> tc_ctl_tclass
>   -> htb_delete
>     -> class is deleted from clhash
>   -> htb_put
>     -> qdisc_destroy
>       -> fq_codel_reset
=========> This part looks suspicious. Why is reset invoking
a dequeue? Shouldn't a destroy just purge the queue?
>         -> fq_codel_dequeue
>           -> qdisc_tree_decrease_qlen
>             -> cl = htb_get # returns NULL, removed in htb_delete
>             -> htb_qlen_notify(sch, NULL) # BOOM
>
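The chain above can be modeled in userspace C to show why the lookup
result has to be checked before the notify call. This is only a sketch:
model_class, model_qdisc, model_get, model_qlen_notify and
model_tree_decrease_qlen are hypothetical names standing in for the HTB
class, the qdisc, htb_get, htb_qlen_notify and qdisc_tree_decrease_qlen.
It is not the kernel code and not the patch itself.

```c
#include <stddef.h>

/* Userspace model only: hypothetical stand-ins, not kernel structures. */
struct model_class {
    unsigned int id;
    int qlen;    /* packets queued under this class */
    int active;  /* 1 while the class is considered active */
};

#define MODEL_NCLASSES 4

struct model_qdisc {
    /* A NULL slot means the class was already removed from the hash. */
    struct model_class *clhash[MODEL_NCLASSES];
};

/* Plays the role of htb_get: returns NULL once the class is gone. */
static struct model_class *model_get(struct model_qdisc *sch, unsigned int id)
{
    if (id >= MODEL_NCLASSES)
        return NULL;
    return sch->clhash[id];
}

/* Plays the role of htb_qlen_notify: dereferences cl unconditionally. */
static void model_qlen_notify(struct model_class *cl)
{
    if (cl->qlen == 0)
        cl->active = 0;
}

/*
 * Plays the role of qdisc_tree_decrease_qlen for a single parent.
 * Returns 1 if the class was found and notified, 0 if the lookup failed.
 */
static int model_tree_decrease_qlen(struct model_qdisc *sch,
                                    unsigned int id, int n)
{
    struct model_class *cl = model_get(sch, id);

    if (cl == NULL)  /* the validation named in the patch subject */
        return 0;
    cl->qlen -= n;
    model_qlen_notify(cl);
    return 1;
}
```

With a class present in slot 1, model_tree_decrease_qlen(&sch, 1, 1)
returns 1. Once that slot is set to NULL, the same call returns 0 instead
of handing a NULL pointer to the notify function.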
It is worrisome to fix the core code for this. The root cause seems to
be codel. I don't have time to dig in, but in general, reset would be
something like:

    struct fq_codel_sched_data *q = qdisc_priv(sch);
    qdisc_reset(q);

or something along those lines...
But dequeue semantics certainly don't seem right there.
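
To make the difference concrete, here is a small userspace model that
contrasts a reset built on the dequeue path with a reset that purges the
queue directly. Everything in it is an assumption for illustration:
model_pkt, model_queue and the model_* functions are hypothetical, and the
notifications counter merely stands in for a call back into the parent
qdisc. In this model every dequeue notifies, which is a simplification.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not kernel structures. */
struct model_pkt {
    struct model_pkt *next;
};

struct model_queue {
    struct model_pkt *head;
    struct model_pkt *tail;
    unsigned int qlen;
    unsigned int notifications; /* calls that would reach the parent */
};

/* Append one empty packet; returns 0 on success, -1 on allocation failure. */
static int model_enqueue(struct model_queue *q)
{
    struct model_pkt *p = calloc(1, sizeof(*p));

    if (p == NULL)
        return -1;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->qlen++;
    return 0;
}

/* Dequeue path: removes the head and "notifies the parent" as a side effect. */
static struct model_pkt *model_dequeue(struct model_queue *q)
{
    struct model_pkt *p = q->head;

    if (p == NULL)
        return NULL;
    q->head = p->next;
    if (q->head == NULL)
        q->tail = NULL;
    p->next = NULL;
    q->qlen--;
    q->notifications++;
    return p;
}

/* Reset via dequeue: every packet drags the parent notification along. */
static void model_reset_via_dequeue(struct model_queue *q)
{
    struct model_pkt *p;

    while ((p = model_dequeue(q)) != NULL)
        free(p);
}

/* Reset by purge: frees the packets and clears state, parent untouched. */
static void model_reset_by_purge(struct model_queue *q)
{
    struct model_pkt *p = q->head;

    while (p != NULL) {
        struct model_pkt *next = p->next;

        free(p);
        p = next;
    }
    q->head = NULL;
    q->tail = NULL;
    q->qlen = 0;
}
```

Both variants leave the queue empty, but only the purge variant does so
without touching the parent, which is the property that matters when the
parent class has already been removed.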
cheers,
jamal