netdev - Re: [PATCH net 0/2] Fix NPE discovered by running bpf kselftest

Message-ID:
 <MEYP282MB2312EE60BC5A38AEB4D77BA9C6372@MEYP282MB2312.AUSP282.PROD.OUTLOOK.COM>
Date: Wed, 4 Dec 2024 14:49:09 +0800
From: Levi Zim <rsworktech@...look.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: John Fastabend <john.fastabend@...il.com>,
 Jakub Sitnicki <jakub@...udflare.com>, "David S. Miller"
 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, David Ahern <dsahern@...nel.org>,
 netdev@...r.kernel.org, bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net 0/2] Fix NPE discovered by running bpf kselftest

On 2024-12-04 09:01, Cong Wang wrote:
> On Sun, Dec 01, 2024 at 09:42:08AM +0800, Levi Zim wrote:
>> On 2024-11-30 21:38, Levi Zim via B4 Relay wrote:
>>> I found that bpf kselftest sockhash::test_txmsg_cork_hangs in
>>> test_sockmap.c triggers a kernel NULL pointer dereference:
> Interesting, I also ran this test recently and I didn't see such a
> crash.

I am also curious why other people or the CI didn't hit such a crash.

I just did a search and found only one mention of this bug:
https://lore.kernel.org/bpf/20241020110345.1468595-1-zijianzhang@bytedance.com/

Personally, when I run test_sockmap on an Arch Linux 6.12.1 kernel, I get
an RCU stall instead of this NPE:

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:         Tasks blocked on level-0 rcu_node (CPUs 0-11): P3378
rcu:         (detected by 0, t=18002 jiffies, g=9525, q=28619 ncpus=12)
task:test_sockmap    state:R  running task     stack:0 pid:3378  tgid:3378  ppid:1168   flags:0x00004006
Call Trace:
  <TASK>
  ? __schedule+0x3b8/0x12b0
  ? get_page_from_freelist+0x366/0x1730
  ? sysvec_apic_timer_interrupt+0xe/0x90
  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
  ? bpf_msg_pop_data+0x41e/0x690
  ? mem_cgroup_charge_skmem+0x40/0x60
  ? bpf_prog_1fca1a523ce93f38_bpf_prog4+0x23d/0x248
  ? sk_psock_msg_verdict+0x99/0x1e0
  ? tcp_bpf_sendmsg+0x42d/0x9f0
  ? sock_sendmsg+0x109/0x130
  ? splice_to_socket+0x359/0x4f0
  ? shmem_file_splice_read+0x2cd/0x300
  ? direct_splice_actor+0x51/0x130
  ? splice_direct_to_actor+0xf0/0x260
  ? __pfx_direct_splice_actor+0x10/0x10
  ? do_splice_direct+0x77/0xc0
  ? __pfx_direct_file_splice_eof+0x10/0x10
  ? do_sendfile+0x382/0x440
  ? __x64_sys_sendfile64+0xb3/0xd0
  ? do_syscall_64+0x82/0x190
  ? find_next_iomem_res+0xbe/0x130
  ? __pfx_pagerange_is_ram_callback+0x10/0x10
  ? walk_system_ram_range+0xa6/0x100
  ? __pte_offset_map+0x1b/0x180
  ? __pte_offset_map_lock+0x9e/0x130
  ? set_ptes.isra.0+0x41/0x90
  ? insert_pfn+0xba/0x210
  ? vmf_insert_pfn_prot+0x85/0xd0
  ? __do_fault+0x30/0x170
  ? do_fault+0x303/0x4c0
  ? __handle_mm_fault+0x7c2/0xfa0
  ? shmem_file_write_iter+0x5b/0x90
  ? __count_memcg_events+0x53/0xf0
  ? count_memcg_events.constprop.0+0x1a/0x30
  ? handle_mm_fault+0x1bb/0x2c0
  ? do_user_addr_fault+0x17f/0x620
  ? clear_bhb_loop+0x25/0x80
  ? clear_bhb_loop+0x25/0x80
  ? clear_bhb_loop+0x25/0x80
  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
  </TASK>

>>> BUG: kernel NULL pointer dereference, address: 0000000000000008
>>>    ? __die_body+0x6e/0xb0
>>>    ? __die+0x8b/0xa0
>>>    ? page_fault_oops+0x358/0x3c0
>>>    ? local_clock+0x19/0x30
>>>    ? lock_release+0x11b/0x440
>>>    ? kernelmode_fixup_or_oops+0x54/0x60
>>>    ? __bad_area_nosemaphore+0x4f/0x210
>>>    ? mmap_read_unlock+0x13/0x30
>>>    ? bad_area_nosemaphore+0x16/0x20
>>>    ? do_user_addr_fault+0x6fd/0x740
>>>    ? prb_read_valid+0x1d/0x30
>>>    ? exc_page_fault+0x55/0xd0
>>>    ? asm_exc_page_fault+0x2b/0x30
>>>    ? splice_to_socket+0x52e/0x630
>>>    ? shmem_file_splice_read+0x2b1/0x310
>>>    direct_splice_actor+0x47/0x70
>>>    splice_direct_to_actor+0x133/0x300
>>>    ? do_splice_direct+0x90/0x90
>>>    do_splice_direct+0x64/0x90
>>>    ? __ia32_sys_tee+0x30/0x30
>>>    do_sendfile+0x214/0x300
>>>    __se_sys_sendfile64+0x8e/0xb0
>>>    __x64_sys_sendfile64+0x25/0x30
>>>    x64_sys_call+0xb82/0x2840
>>>    do_syscall_64+0x75/0x110
>>>    entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>
>>> This is caused by tcp_bpf_sendmsg() returning a larger value (12289) than
>>> size (8192), which causes the while loop in splice_to_socket() to release
>>> an uninitialized pipe buf.
>>>
>>> The underlying cause is that this code assumes sk_msg_memcopy_from_iter()
>>> will copy all bytes upon success but it actually might only copy part of
>>> it.
>> I am not sure what Fixes tag I should put. Git blame leads me to a refactor
>> commit, and I am not familiar with this part of the code base. Any suggestions?
> I think it is the following commit which introduced memcopy_from_iter()
> (which was renamed to sk_msg_memcopy_from_iter() later):
>
> commit 4f738adba30a7cfc006f605707e7aee847ffefa0
> Author: John Fastabend <john.fastabend@...il.com>
> Date:   Sun Mar 18 12:57:10 2018 -0700
>
>      bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
>
> Please double check.
>
> Thanks.

Thanks for your help. I will double-check it.
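
For readers less familiar with this path, here is a minimal user-space sketch of the failure mode described in the cover letter. It is a hypothetical, heavily simplified model and not kernel code. copy_partial(), sendmsg_buggy(), sendmsg_fixed(), struct fake_pipe_buf and consume() are all invented for illustration, and the 4096-byte copy limit and two 4096-byte pipe buffers are assumptions.

The sketch shows two things:
- Crediting the requested length instead of the bytes actually copied makes a sendmsg-style call report more than it was asked to send.
- A splice-style consumer that trusts that length walks past the initialized pipe buffers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; none of these names are
 * kernel APIs. */

/* Hypothetical copy helper: copies at most `room` bytes of a `want`-byte
 * request and returns how many bytes it actually copied. A short copy
 * still counts as success, which is the case the buggy loop ignores. */
static size_t copy_partial(size_t want, size_t room)
{
    return want < room ? want : room;
}

/* Models the faulty assumption: credits the full request `copy` even when
 * copy_partial() copied fewer bytes, so the remainder is counted twice
 * when the loop retries it. */
static size_t sendmsg_buggy(size_t size, size_t room)
{
    size_t copied = 0, remaining = size;

    while (remaining > 0) {
        size_t copy = remaining;
        size_t got = copy_partial(copy, room);

        if (got == 0)
            break;          /* no progress possible; avoid spinning */
        copied += copy;     /* BUG: should add `got` */
        remaining -= got;
    }
    return copied;
}

/* Same loop, but only credits the bytes that were really copied. */
static size_t sendmsg_fixed(size_t size, size_t room)
{
    size_t copied = 0, remaining = size;

    while (remaining > 0) {
        size_t got = copy_partial(remaining, room);

        if (got == 0)
            break;
        copied += got;
        remaining -= got;
    }
    return copied;
}

struct fake_pipe_buf {
    size_t len;       /* unconsumed bytes in this slot */
    bool initialized; /* false: slot was never filled by the splice read */
};

/* Splice-style consumer: trims slots by the byte count the sender
 * reported and "releases" each slot once its length reaches zero.
 * Returns how many never-initialized slots it would have released. */
static int consume(struct fake_pipe_buf *bufs, size_t nslots, size_t ret)
{
    int bad_releases = 0;
    size_t tail = 0;

    while (ret > 0 && tail < nslots) {
        struct fake_pipe_buf *buf = &bufs[tail];
        size_t seg = buf->len < ret ? buf->len : ret;

        buf->len -= seg;
        ret -= seg;
        if (buf->len == 0) {
            if (!buf->initialized)
                bad_releases++; /* real code would touch garbage here */
            tail++;
        }
    }
    return bad_releases;
}
```

With a 4096-byte copy limit, sendmsg_buggy(8192, 4096) reports 12288 bytes for an 8192-byte request. Feeding that to consume() over two filled slots and two empty slots releases both never-filled slots. sendmsg_fixed(8192, 4096) reports 8192, and consume() releases none. The numbers are illustrative only and are not meant to reproduce the 12289 seen in the real trace.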