Message-ID: <58370558.9070004@iogearbox.net>
Date: Thu, 24 Nov 2016 16:20:56 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Roi Dayan <roid@...lanox.com>,
Cong Wang <xiyou.wangcong@...il.com>, netdev@...r.kernel.org
CC: jiri@...lanox.com, John Fastabend <john.fastabend@...il.com>
Subject: Re: [Patch net-next] net_sched: move the empty tp check from ->destroy()
to ->delete()
On 11/24/2016 12:01 PM, Roi Dayan wrote:
> On 24/11/2016 12:14, Daniel Borkmann wrote:
>> On 11/24/2016 09:29 AM, Roi Dayan wrote:
>>> Hi,
>>>
>>> I'm testing this patch with KASAN enabled and got into a new kernel crash I didn't hit before.
>>>
>>> [ 1860.725065] ==================================================================
>>> [ 1860.733893] BUG: KASAN: use-after-free in __netif_receive_skb_core+0x1ebe/0x29a0 at addr ffff880a68b04028
>>> [ 1860.745415] Read of size 8 by task CPU 0/KVM/5334
>>> [ 1860.751368] CPU: 8 PID: 5334 Comm: CPU 0/KVM Tainted: G O 4.9.0-rc3+ #18
(Btw, your kernel is tainted with an out-of-tree module? Anything relevant?)
>>> [ 1860.760547] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 07/01/2015
>>> [ 1860.768036] Call Trace:
>>> [ 1860.771307] [<ffffffffa9b6dc42>] dump_stack+0x63/0x81
>>> [ 1860.777167] [<ffffffffa95fb751>] kasan_object_err+0x21/0x70
>>> [ 1860.783826] [<ffffffffa95fb9dd>] kasan_report_error+0x1ed/0x4e0
>>> [ 1860.790640] [<ffffffffa9b9b841>] ? csum_partial+0x11/0x20
>>> [ 1860.796871] [<ffffffffaa44a6b9>] ? csum_partial_ext+0x9/0x10
>>> [ 1860.803571] [<ffffffffaa453155>] ? __skb_checksum+0x115/0x8d0
>>> [ 1860.810370] [<ffffffffa95fbe81>] __asan_report_load8_noabort+0x61/0x70
>>> [ 1860.818263] [<ffffffffaa49c3fe>] ? __netif_receive_skb_core+0x1ebe/0x29a0
>>> [ 1860.826215] [<ffffffffaa49c3fe>] __netif_receive_skb_core+0x1ebe/0x29a0
>>> [ 1860.833991] [<ffffffffaa49a540>] ? netdev_info+0x100/0x100
>>> [ 1860.840529] [<ffffffffaa671792>] ? udp4_gro_receive+0x802/0x1090
>>> [ 1860.847783] [<ffffffffa9bb9a08>] ? find_next_bit+0x18/0x20
>>> [ 1860.854126] [<ffffffffaa49cf04>] __netif_receive_skb+0x24/0x150
>>> [ 1860.861695] [<ffffffffaa49d0d1>] netif_receive_skb_internal+0xa1/0x1d0
>>> [ 1860.869366] [<ffffffffaa49d030>] ? __netif_receive_skb+0x150/0x150
>>> [ 1860.876464] [<ffffffffaa49f7e9>] ? dev_gro_receive+0x969/0x1660
>>> [ 1860.883924] [<ffffffffaa4a0e1f>] napi_gro_receive+0x1df/0x300
>>> [ 1860.890744] [<ffffffffc02e885d>] mlx5e_handle_rx_cqe_rep+0x83d/0xd30 [mlx5_core]
>>>
>>> checking with gdb
>>>
>>> (gdb) l *(__netif_receive_skb_core+0x1ebe)
>>> 0xffffffff8249c3fe is in __netif_receive_skb_core (net/core/dev.c:3937).
>>> 3932 *pt_prev = NULL;
>>> 3933 }
>>> 3934
>>> 3935 qdisc_skb_cb(skb)->pkt_len = skb->len;
>>> 3936 skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
>>> 3937 qdisc_bstats_cpu_update(cl->q, skb);
>>> 3938
>>> 3939 switch (tc_classify(skb, cl, &cl_res, false)) {
>>> 3940 case TC_ACT_OK:
>>> 3941 case TC_ACT_RECLASSIFY:
>>
>> Can you elaborate some more on your test-case? Adding/dropping ingress qdisc with
>> some classifier on it in a loop while traffic goes through?
>
> I first delete the ingress qdisc from the relevant interface.
> Then I start traffic on it, add the ingress qdisc back, and start
> adding tc flower rules to match the traffic.
Ok, strange. qdisc_destroy() calls into ops->destroy(), where ingress
drops its entire chain via tcf_destroy_chain(), so the chain pointer
will eventually be NULL. The tcf_proto instances (tps) are freed via
call_rcu(), as is the qdisc itself later on via qdisc_rcu_free(), which
also frees the per-cpu bstats. Outstanding readers should either bail
out due to the if (!cl) check or can still process the chain until
their read section ends, but during that time cl->q resp. its bstats
should still be valid. Do you happen to know what's at address
ffff880a68b04028? I was wondering about call_rcu() vs. call_rcu_bh(),
but at least on ingress (netif_receive_skb_internal()) we hold
rcu_read_lock() here. The KASAN report is reliably happening at this
location, right?