Message-ID: <19b18a7c-ed1c-1f9d-84d4-7046bffe46b9@gmail.com>
Date: Wed, 25 Jan 2023 11:07:32 +0900
From: Toshiaki Makita <toshiaki.makita1@...il.com>
To: Toke Høiland-Jørgensen <toke@...nel.org>,
Shawn Bohrer <sbohrer@...udflare.com>
Cc: makita.toshiaki@....ntt.co.jp, netdev@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: KASAN veth use after free in XDP_REDIRECT
On 2023/01/25 10:54, Toke Høiland-Jørgensen wrote:
> Shawn Bohrer <sbohrer@...udflare.com> writes:
>
>> Hello,
>>
>> We've seen the following KASAN report on our systems when using
>> AF_XDP on a veth.
>>
>> KASAN report:
>>
>> BUG: KASAN: use-after-free in __xsk_rcv+0x18d/0x2c0
>> Read of size 78 at addr ffff888976250154 by task napi/iconduit-g/148640
>>
>> CPU: 5 PID: 148640 Comm: napi/iconduit-g Kdump: loaded Tainted: G O 6.1.4-cloudflare-kasan-2023.1.2 #1
>> Hardware name: Quanta Computer Inc. QuantaPlex T41S-2U/S2S-MB, BIOS S2S_3B10.03 06/21/2018
>> Call Trace:
>> <TASK>
>> dump_stack_lvl+0x34/0x48
>> print_report+0x170/0x473
>> ? __xsk_rcv+0x18d/0x2c0
>> kasan_report+0xad/0x130
>> ? __xsk_rcv+0x18d/0x2c0
>> kasan_check_range+0x149/0x1a0
>> memcpy+0x20/0x60
>> __xsk_rcv+0x18d/0x2c0
>> __xsk_map_redirect+0x1f3/0x490
>> ? veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>> xdp_do_redirect+0x5ca/0xd60
>> veth_xdp_rcv_skb+0x935/0x1ba0 [veth]
>> ? __netif_receive_skb_list_core+0x671/0x920
>> ? veth_xdp+0x670/0x670 [veth]
>> veth_xdp_rcv+0x304/0xa20 [veth]
>> ? do_xdp_generic+0x150/0x150
>> ? veth_xdp_rcv_one+0xde0/0xde0 [veth]
>> ? _raw_spin_lock_bh+0xe0/0xe0
>> ? newidle_balance+0x887/0xe30
>> ? __perf_event_task_sched_in+0xdb/0x800
>> veth_poll+0x139/0x571 [veth]
>> ? veth_xdp_rcv+0xa20/0xa20 [veth]
>> ? _raw_spin_unlock+0x39/0x70
>> ? finish_task_switch.isra.0+0x17e/0x7d0
>> ? __switch_to+0x5cf/0x1070
>> ? __schedule+0x95b/0x2640
>> ? io_schedule_timeout+0x160/0x160
>> __napi_poll+0xa1/0x440
>> napi_threaded_poll+0x3d1/0x460
>> ? __napi_poll+0x440/0x440
>> ? __kthread_parkme+0xc6/0x1f0
>> ? __napi_poll+0x440/0x440
>> kthread+0x2a2/0x340
>> ? kthread_complete_and_exit+0x20/0x20
>> ret_from_fork+0x22/0x30
>> </TASK>
>>
>> Freed by task 148640:
>> kasan_save_stack+0x23/0x50
>> kasan_set_track+0x21/0x30
>> kasan_save_free_info+0x2a/0x40
>> ____kasan_slab_free+0x169/0x1d0
>> slab_free_freelist_hook+0xd2/0x190
>> __kmem_cache_free+0x1a1/0x2f0
>> skb_release_data+0x449/0x600
>> consume_skb+0x9f/0x1c0
>> veth_xdp_rcv_skb+0x89c/0x1ba0 [veth]
>> veth_xdp_rcv+0x304/0xa20 [veth]
>> veth_poll+0x139/0x571 [veth]
>> __napi_poll+0xa1/0x440
>> napi_threaded_poll+0x3d1/0x460
>> kthread+0x2a2/0x340
>> ret_from_fork+0x22/0x30
>>
>>
>> The buggy address belongs to the object at ffff888976250000
>> which belongs to the cache kmalloc-2k of size 2048
>> The buggy address is located 340 bytes inside of
>> 2048-byte region [ffff888976250000, ffff888976250800)
>>
>> The buggy address belongs to the physical page:
>> page:00000000ae18262a refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x976250
>> head:00000000ae18262a order:3 compound_mapcount:0 compound_pincount:0
>> flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
>> raw: 002ffff800010200 0000000000000000 dead000000000122 ffff88810004cf00
>> raw: 0000000000000000 0000000080080008 00000002ffffffff 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>> ffff888976250000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff888976250080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>> ffff888976250100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ^
>> ffff888976250180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff888976250200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>
>>
>> If I understand the code correctly, it looks like an xdp_buff is
>> constructed pointing to the memory backed by an skb, but consume_skb()
>> is called while the xdp_buff is still in use.
>>
>> ```
>> 	case XDP_REDIRECT:
>> 		veth_xdp_get(&xdp);
>> 		consume_skb(skb);
>> 		xdp.rxq->mem = rq->xdp_mem;
>> 		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
>> 			stats->rx_drops++;
>> 			goto err_xdp;
>> 		}
>> 		stats->xdp_redirect++;
>> 		rcu_read_unlock();
>> 		goto xdp_xmit;
>> ```
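
For context, from my reading of veth_convert_skb_to_xdp_buff() in the
same driver: the xdp_buff above is initialized from the skb, so its data
pointers alias the skb's head buffer. A minimal sketch of that aliasing
(frame_sz here stands in for the driver's real frame-size computation,
so this is illustrative, not the exact code):

```
	/* After this, every xdp_buff pointer refers to skb memory:
	 * xdp.data == skb->data, xdp.data_hard_start == skb->head. */
	xdp_init_buff(&xdp, frame_sz, &rq->xdp_rxq);
	xdp_prepare_buff(&xdp, skb->head, skb_headroom(skb),
			 skb_headlen(skb), true);
	/* Freeing the skb's head therefore frees xdp.data too, unless
	 * an extra page reference pins that memory. */
```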
>>
>> It is worth noting that I think XDP_TX has the exact same problem.
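
For reference, the XDP_TX arm of the same switch statement shows the
same veth_xdp_get()-then-consume_skb() ordering (paraphrased from my
reading of the 6.1-era veth.c; treat details as approximate):

```
	case XDP_TX:
		veth_xdp_get(&xdp);
		consume_skb(skb);		/* same early free */
		xdp.rxq->mem = rq->xdp_mem;
		if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
			trace_xdp_exception(rq->dev, xdp_prog, act);
			stats->rx_drops++;
			goto err_xdp;
		}
		stats->xdp_tx++;
		rcu_read_unlock();
		goto xdp_xmit;
```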
>>
>> Again, assuming I understand the problem, one naive solution might be
>> to move the consume_skb() call after xdp_do_redirect(). I think this
>> might work for BPF_MAP_TYPE_XSKMAP, BPF_MAP_TYPE_DEVMAP, and
>> BPF_MAP_TYPE_DEVMAP_HASH, since those all seem to copy the xdp_buff to
>> new memory. The copy happens for XSKMAP in __xsk_rcv() and for the
>> DEVMAP cases in dev_map_enqueue_clone().
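
A minimal sketch of that reordering (untested; it keeps the error path
equivalent by consuming the skb before the err_xdp jump, and relies on
the XSKMAP/DEVMAP copy-out reasoning above):

```
	case XDP_REDIRECT:
		veth_xdp_get(&xdp);
		xdp.rxq->mem = rq->xdp_mem;
		if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
			consume_skb(skb);	/* redirect failed, drop */
			stats->rx_drops++;
			goto err_xdp;
		}
		/* For XSKMAP/DEVMAP the data has been copied out by now,
		 * so freeing the skb (and its kmalloc'ed head) is safe. */
		consume_skb(skb);
		stats->xdp_redirect++;
		rcu_read_unlock();
		goto xdp_xmit;
```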
>>
>> However, it would appear that for BPF_MAP_TYPE_CPUMAP the memory can
>> live much longer, possibly even past xdp_do_flush(). If I'm correct,
>> I'm not really sure where it would be safe to call consume_skb().
>
> So the idea is that veth_xdp_get() does a
> get_page(virt_to_page(xdp->data)), where xdp->data in this case points
> to skb->head. This should keep the data page alive even if the skb
> surrounding it is freed by the call to consume_skb().
>
> However, because the skb->head in this case was allocated from a slab
> allocator, taking a page refcount is not enough to prevent it from being
> freed.
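
For reference, veth_xdp_get() takes those references roughly like this
(paraphrased from the 6.1-era driver; treat details as approximate):

```
static void veth_xdp_get(struct xdp_buff *xdp)
{
	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
	int i;

	/* Pin the page backing the linear area (skb->head here). */
	get_page(virt_to_page(xdp->data));
	if (likely(!xdp_buff_has_frags(xdp)))
		return;

	/* Pin each fragment page as well. */
	for (i = 0; i < sinfo->nr_frags; i++)
		__skb_frag_ref(&sinfo->frags[i]);
}
```
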
I'm not sure why skb->head is kmalloc'ed here.
The skb_head_is_locked() check in veth_convert_skb_to_xdp_buff() should
ensure that the skb head is a page fragment.
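
For those following along, skb_head_is_locked() is defined in
include/linux/skbuff.h roughly as:

```
static inline bool skb_head_is_locked(const struct sk_buff *skb)
{
	/* The head is "locked" (cannot be treated as a page fragment)
	 * when it was not allocated as a page frag, e.g. a kmalloc'ed
	 * head, or when the skb is cloned. */
	return !skb->head_frag || skb_cloned(skb);
}
```
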
Toshiaki Makita