Message-ID: <MEYP282MB2312EE60BC5A38AEB4D77BA9C6372@MEYP282MB2312.AUSP282.PROD.OUTLOOK.COM>
Date: Wed, 4 Dec 2024 14:49:09 +0800
From: Levi Zim <rsworktech@...look.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: John Fastabend <john.fastabend@...il.com>,
Jakub Sitnicki <jakub@...udflare.com>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, David Ahern <dsahern@...nel.org>,
netdev@...r.kernel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net 0/2] Fix NPE discovered by running bpf kselftest
On 2024-12-04 09:01, Cong Wang wrote:
> On Sun, Dec 01, 2024 at 09:42:08AM +0800, Levi Zim wrote:
>> On 2024-11-30 21:38, Levi Zim via B4 Relay wrote:
>>> I found that bpf kselftest sockhash::test_txmsg_cork_hangs in
>>> test_sockmap.c triggers a kernel NULL pointer dereference:
> Interesting, I also ran this test recently and I didn't see such a
> crash.
I am also curious why other people and the CI didn't hit this crash.
I just did a search and found only one mention of this bug:
https://lore.kernel.org/bpf/20241020110345.1468595-1-zijianzhang@bytedance.com/
Personally, when I try to run test_sockmap on an Arch Linux 6.12.1 kernel,
I get an RCU stall instead of this NPE:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-11): P3378
rcu: (detected by 0, t=18002 jiffies, g=9525, q=28619 ncpus=12)
task:test_sockmap state:R running task stack:0 pid:3378
tgid:3378 ppid:1168 flags:0x00004006
Call Trace:
<TASK>
? __schedule+0x3b8/0x12b0
? get_page_from_freelist+0x366/0x1730
? sysvec_apic_timer_interrupt+0xe/0x90
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
? bpf_msg_pop_data+0x41e/0x690
? mem_cgroup_charge_skmem+0x40/0x60
? bpf_prog_1fca1a523ce93f38_bpf_prog4+0x23d/0x248
? sk_psock_msg_verdict+0x99/0x1e0
? tcp_bpf_sendmsg+0x42d/0x9f0
? sock_sendmsg+0x109/0x130
? splice_to_socket+0x359/0x4f0
? shmem_file_splice_read+0x2cd/0x300
? direct_splice_actor+0x51/0x130
? splice_direct_to_actor+0xf0/0x260
? __pfx_direct_splice_actor+0x10/0x10
? do_splice_direct+0x77/0xc0
? __pfx_direct_file_splice_eof+0x10/0x10
? do_sendfile+0x382/0x440
? __x64_sys_sendfile64+0xb3/0xd0
? do_syscall_64+0x82/0x190
? find_next_iomem_res+0xbe/0x130
? __pfx_pagerange_is_ram_callback+0x10/0x10
? walk_system_ram_range+0xa6/0x100
? __pte_offset_map+0x1b/0x180
? __pte_offset_map_lock+0x9e/0x130
? set_ptes.isra.0+0x41/0x90
? insert_pfn+0xba/0x210
? vmf_insert_pfn_prot+0x85/0xd0
? __do_fault+0x30/0x170
? do_fault+0x303/0x4c0
? __handle_mm_fault+0x7c2/0xfa0
? shmem_file_write_iter+0x5b/0x90
? __count_memcg_events+0x53/0xf0
? count_memcg_events.constprop.0+0x1a/0x30
? handle_mm_fault+0x1bb/0x2c0
? do_user_addr_fault+0x17f/0x620
? clear_bhb_loop+0x25/0x80
? clear_bhb_loop+0x25/0x80
? clear_bhb_loop+0x25/0x80
? entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
>>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> ? __die_body+0x6e/0xb0
>>> ? __die+0x8b/0xa0
>>> ? page_fault_oops+0x358/0x3c0
>>> ? local_clock+0x19/0x30
>>> ? lock_release+0x11b/0x440
>>> ? kernelmode_fixup_or_oops+0x54/0x60
>>> ? __bad_area_nosemaphore+0x4f/0x210
>>> ? mmap_read_unlock+0x13/0x30
>>> ? bad_area_nosemaphore+0x16/0x20
>>> ? do_user_addr_fault+0x6fd/0x740
>>> ? prb_read_valid+0x1d/0x30
>>> ? exc_page_fault+0x55/0xd0
>>> ? asm_exc_page_fault+0x2b/0x30
>>> ? splice_to_socket+0x52e/0x630
>>> ? shmem_file_splice_read+0x2b1/0x310
>>> direct_splice_actor+0x47/0x70
>>> splice_direct_to_actor+0x133/0x300
>>> ? do_splice_direct+0x90/0x90
>>> do_splice_direct+0x64/0x90
>>> ? __ia32_sys_tee+0x30/0x30
>>> do_sendfile+0x214/0x300
>>> __se_sys_sendfile64+0x8e/0xb0
>>> __x64_sys_sendfile64+0x25/0x30
>>> x64_sys_call+0xb82/0x2840
>>> do_syscall_64+0x75/0x110
>>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>
>>> This is caused by tcp_bpf_sendmsg() returning a larger value (12289) than
>>> size (8192), which causes the while loop in splice_to_socket() to release
>>> an uninitialized pipe buf.
>>>
>>> The underlying cause is that this code assumes sk_msg_memcopy_from_iter()
>>> will copy all bytes upon success but it actually might only copy part of
>>> it.
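To illustrate the accounting problem described above, here is a minimal userspace sketch. It is not the kernel code: toy_copy(), toy_send(), and the 4096-byte limit are hypothetical names and values chosen for illustration. It only models a caller that adds the requested length to its running total even though the copy helper may have copied fewer bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical copy helper: copies at most `limit` bytes per call and
 * returns how many bytes it actually copied (possibly fewer than `want`). */
static size_t toy_copy(size_t want, size_t limit)
{
	return want < limit ? want : limit;
}

/* Hypothetical send loop over `size` bytes.
 * trust_request == true models the bug: the total grows by the requested
 * length, while the input only advances by the bytes actually copied.
 * trust_request == false counts only the bytes actually copied. */
static size_t toy_send(size_t size, size_t limit, bool trust_request)
{
	size_t remaining = size;
	size_t copied = 0;

	if (limit == 0)
		return 0; /* nothing can be copied; avoid looping forever */

	while (remaining > 0) {
		size_t want = remaining;
		size_t done = toy_copy(want, limit);

		remaining -= done;
		copied += trust_request ? want : done;
	}
	return copied;
}
```

With size 8192 and a per-call limit of 4096, the trusting variant reports 12288 bytes for an 8192-byte request, while the variant that counts actual bytes reports 8192. A caller that releases buffers based on the returned count would then walk past the data it really handed over, which is the kind of overrun seen in splice_to_socket().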
>> I am not sure what Fixes tag I should use. Git blame leads me to a refactor
>> commit, and I am not familiar with this part of the code base. Any suggestions?
> I think it is the following commit which introduced memcopy_from_iter()
> (which was renamed to sk_msg_memcopy_from_iter() later):
>
> commit 4f738adba30a7cfc006f605707e7aee847ffefa0
> Author: John Fastabend <john.fastabend@...il.com>
> Date: Sun Mar 18 12:57:10 2018 -0700
>
> bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
>
> Please double check.
>
> Thanks.
Thanks for your help. I will double check it.