lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8ab4d343-d287-4b42-94f7-511f46e131d3@huawei.com>
Date: Fri, 26 Sep 2025 16:42:35 +0800
From: Wang Liang <wangliang74@...wei.com>
To: Kuniyuki Iwashima <kuniyu@...gle.com>, Eric Dumazet <edumazet@...gle.com>
CC: <alibuda@...ux.alibaba.com>, <dust.li@...ux.alibaba.com>,
	<sidraya@...ux.ibm.com>, <wenjia@...ux.ibm.com>, <mjambigi@...ux.ibm.com>,
	<tonylu@...ux.alibaba.com>, <guwen@...ux.alibaba.com>, <davem@...emloft.net>,
	<kuba@...nel.org>, <pabeni@...hat.com>, <horms@...nel.org>,
	<yuehaibing@...wei.com>, <zhangchangzhong@...wei.com>,
	<linux-kernel@...r.kernel.org>, <netdev@...r.kernel.org>,
	<linux-rdma@...r.kernel.org>, <linux-s390@...r.kernel.org>
Subject: Re: [PATCH net] net/smc: fix general protection fault in
 __smc_diag_dump


在 2025/9/26 3:51, Kuniyuki Iwashima 写道:
> On Thu, Sep 25, 2025 at 12:37 PM Eric Dumazet <edumazet@...gle.com> wrote:
>> On Thu, Sep 25, 2025 at 12:25 PM Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
>>> On Thu, Sep 25, 2025 at 11:54 AM Eric Dumazet <edumazet@...gle.com> wrote:
>>>> On Thu, Sep 25, 2025 at 11:46 AM Kuniyuki Iwashima <kuniyu@...gle.com> wrote:
>>>>> Thanks Eric for CCing me.
>>>>>
>>>>> On Thu, Sep 25, 2025 at 7:32 AM Eric Dumazet <edumazet@...gle.com> wrote:
>>>>>> On Mon, Sep 22, 2025 at 4:57 AM Wang Liang <wangliang74@...wei.com> wrote:
>>>>>>> The syzbot report a crash:
>>>>>>>
>>>>>>>    Oops: general protection fault, probably for non-canonical address 0xfbd5a5d5a0000003: 0000 [#1] SMP KASAN NOPTI
>>>>>>>    KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f]
>>>>>>>    CPU: 1 UID: 0 PID: 6949 Comm: syz.0.335 Not tainted syzkaller #0 PREEMPT(full)
>>>>>>>    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
>>>>>>>    RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline]
>>>>>>>    RIP: 0010:__smc_diag_dump.constprop.0+0x3ca/0x2550 net/smc/smc_diag.c:89
>>>>>>>    Call Trace:
>>>>>>>     <TASK>
>>>>>>>     smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217
>>>>>>>     smc_diag_dump+0x27/0x90 net/smc/smc_diag.c:234
>>>>>>>     netlink_dump+0x539/0xd30 net/netlink/af_netlink.c:2327
>>>>>>>     __netlink_dump_start+0x6d6/0x990 net/netlink/af_netlink.c:2442
>>>>>>>     netlink_dump_start include/linux/netlink.h:341 [inline]
>>>>>>>     smc_diag_handler_dump+0x1f9/0x240 net/smc/smc_diag.c:251
>>>>>>>     __sock_diag_cmd net/core/sock_diag.c:249 [inline]
>>>>>>>     sock_diag_rcv_msg+0x438/0x790 net/core/sock_diag.c:285
>>>>>>>     netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552
>>>>>>>     netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline]
>>>>>>>     netlink_unicast+0x5a7/0x870 net/netlink/af_netlink.c:1346
>>>>>>>     netlink_sendmsg+0x8d1/0xdd0 net/netlink/af_netlink.c:1896
>>>>>>>     sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>     __sock_sendmsg net/socket.c:729 [inline]
>>>>>>>     ____sys_sendmsg+0xa95/0xc70 net/socket.c:2614
>>>>>>>     ___sys_sendmsg+0x134/0x1d0 net/socket.c:2668
>>>>>>>     __sys_sendmsg+0x16d/0x220 net/socket.c:2700
>>>>>>>     do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>>>>>>>     do_syscall_64+0xcd/0x4e0 arch/x86/entry/syscall_64.c:94
>>>>>>>     entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>>>>>>     </TASK>
>>>>>>>
>>>>>>> The process like this:
>>>>>>>
>>>>>>>                 (CPU1)              |             (CPU2)
>>>>>>>    ---------------------------------|-------------------------------
>>>>>>>    inet_create()                    |
>>>>>>>      // init clcsock to NULL        |
>>>>>>>      sk = sk_alloc()                |
>>>>>>>                                     |
>>>>>>>      // unexpectedly change clcsock |
>>>>>>>      inet_init_csk_locks()          |
>>>>>>>                                     |
>>>>>>>      // add sk to hash table        |
>>>>>>>      smc_inet_init_sock()           |
>>>>>>>        smc_sk_init()                |
>>>>>>>          smc_hash_sk()              |
>>>>>>>                                     | // traverse the hash table
>>>>>>>                                     | smc_diag_dump_proto
>>>>>>>                                     |   __smc_diag_dump()
>>>>>>>                                     |     // visit wrong clcsock
>>>>>>>                                     |     smc_diag_msg_common_fill()
>>>>>>>      // alloc clcsock               |
>>>>>>>      smc_create_clcsk               |
>>>>>>>        sock_create_kern             |
>>>>>>>
>>>>>>> With CONFIG_DEBUG_LOCK_ALLOC=y, the smc->clcsock is unexpectedly changed
>>>>>>> in inet_init_csk_locks(), because the struct smc_sock does not have struct
>>>>>>> inet_connection_sock as the first member.
>>>>>>>
>>>>>>> Previous commit 60ada4fe644e ("smc: Fix various oops due to inet_sock type
>>>>>>> confusion.") add inet_sock as the first member of smc_sock. For protocol
>>>>>>> with INET_PROTOSW_ICSK, use inet_connection_sock instead of inet_sock is
>>>>>>> more appropriate.
>>>>> Why is INET_PROTOSW_ICSK necessary in the first place ?
>>>>>
>>>>> I don't see a clear reason because smc_clcsock_accept() allocates
>>>>> a new sock by smc_sock_alloc() and does not use inet_accept().
>>>>>
>>>>> Or is there any other path where smc_sock is cast to
>>>>> inet_connection_sock ?
>>>> What I saw in this code was a missing protection.
>>>>
>>>> smc_diag_msg_common_fill() runs without socket lock being held.
>>>>
>>>> I was thinking of this fix, but apparently syzbot still got crashes.
>>> Looking at the test result,
>>>
>>> https://syzkaller.appspot.com/x/report.txt?x=15944c7c580000
>>> KASAN: maybe wild-memory-access in range [0xdead4ead00000018-0xdead4ead0000001f]
>>>
>>> the top half of the address is SPINLOCK_MAGIC (0xdead4ead),
>>> so the type confusion mentioned in the commit message makes
>>> sense to me.
>>>
>>> $ pahole -C inet_connection_sock vmlinux
>>> struct inet_connection_sock {
>>> ...
>>>      struct request_sock_queue  icsk_accept_queue;    /*   992    80 */
>>>
>>> $ pahole -C smc_sock vmlinux
>>> struct smc_sock {
>>> ...
>>>      struct socket *            clcsock;              /*   992     8 */
>>>
>>> The option is 1) let inet_init_csk_locks() init inet_connection_sock
>>> or 2) avoid inet_init_csk_locks(), and I guess 2) could be better to
>>> avoid potential issues in IS_ICSK branches.
>>>
>> I definitely vote to remove INET_PROTOSW_ICSK from smc.
>>
>> We want to reserve inet_connection_sock to TCP only, so that we can
>> move fields to better
>> cache friendly locations in tcp_sock hopefully for linux-6.19
> Fully agreed.
>
> Wang: please squash the revert of 6fd27ea183c2 for
> INET_PROTOSW_ICSK removal.  This is for one of
> IS_ICSK branches.


Thanks for your suggestions, they are helpful!

I will remove INET_PROTOSW_ICSK from smc_inet_protosw and smc_inet6_protosw,

and revert 6fd27ea183c2 ("net/smc: fix lacks of icsk_syn_mss with 
IPPROTO_SMC")

in one patchset later.

------
Best regards
Wang Liang




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ