netdev - Re: [syzbot] [can?] KCSAN: data-race in bcm_can_tx / bcm_tx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b4306f50-08e4-d41d-1e59-5be1f9735dd6@hartkopp.net>
Date:   Tue, 25 Apr 2023 23:18:05 +0200
From:   Oliver Hartkopp <socketcan@...tkopp.net>
To:     Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+e1786f049e71693263bf@...kaller.appspotmail.com>
Cc:     davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
        linux-can@...r.kernel.org, linux-kernel@...r.kernel.org,
        mkl@...gutronix.de, netdev@...r.kernel.org, pabeni@...hat.com,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [can?] KCSAN: data-race in bcm_can_tx / bcm_tx_setup (3)

Hello Dmitry,

On 25.04.23 10:36, Dmitry Vyukov wrote:
> On Tue, 25 Apr 2023 at 10:05, syzbot
> <syzbot+e1786f049e71693263bf@...kaller.appspotmail.com> wrote:
>>
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    1a0beef98b58 Merge tag 'tpmdd-v6.4-rc1' of git://git.kerne..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1485f1dbc80000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=501f7c86f7a05a13
>> dashboard link: https://syzkaller.appspot.com/bug?extid=e1786f049e71693263bf
>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/f06c11683242/disk-1a0beef9.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/5c0a1cd5a059/vmlinux-1a0beef9.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/e4c318183ce3/bzImage-1a0beef9.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+e1786f049e71693263bf@...kaller.appspotmail.com
> 
> op->currframe and probably other op fields are concurrently
> read/modified by both bcm_tx_setup() and bcm_can_tx().
> If I am reading the code correctly, it can lead to a wide range of
> misbehavior, e.g. sending wrong/uninit data, reading/writing data
> out-of-bounds, etc.
> I think these functions need to be somehow serialized (stopping timers
> before doing any modifications to op?).

KCSAN has detected a very special case here:

The content of the CAN frames (in a running tx-job) has been altered and 
the number of CAN frames has been reduced. (Increasing if the number of 
CAN frames is not possible with an active tx-job/running hrtimer).

Or (alternatively) the TX_RESET_MULTI_IDX flag has been set.

In both cases op->currframe is set to zero to start the sequence of the 
CAN frames in op->frames in the next(!) hrtimer execution.

So setting values in op->currframe to zero (as pointed out by KCSAN) is 
always a good move.

When there would be a race between the op->currframe++ in bcm_can_tx() 
and the test for
if (op->nframes != msg_head->nframes) in bcm_tx_setup() this would be 
fixed with
if (op->currframe >= op->nframes) in bcm_can_tx().

But looking at the code again I'm not sure if we might /potentially/ 
lose the TX_RESET_MULTI_IDX feature when the unlocked op->currframe++ is 
performed concurrently in bcm_can_tx().

So a short local locking around the op->currframe r/w operations in 
bcm_can_tx() and bcm_tx_setup() would make sense IMO.

The code is intended to update CAN frame content (with a fixed 
non-increasing length) lock-less on the fly and there should be no other 
"wide range of misbehavior" cases here.

I will take a look and send a patch for the op->currframe locking.

Many thanks for looking into this and best regards,
Oliver

> 
>> ==================================================================
>> BUG: KCSAN: data-race in bcm_can_tx / bcm_tx_setup
>>
>> write to 0xffff888137fcff10 of 4 bytes by task 10792 on cpu 0:
>>   bcm_tx_setup+0x698/0xd30 net/can/bcm.c:995
>>   bcm_sendmsg+0x38b/0x470 net/can/bcm.c:1355
>>   sock_sendmsg_nosec net/socket.c:724 [inline]
>>   sock_sendmsg net/socket.c:747 [inline]
>>   ____sys_sendmsg+0x375/0x4c0 net/socket.c:2501
>>   ___sys_sendmsg net/socket.c:2555 [inline]
>>   __sys_sendmsg+0x1e3/0x270 net/socket.c:2584
>>   __do_sys_sendmsg net/socket.c:2593 [inline]
>>   __se_sys_sendmsg net/socket.c:2591 [inline]
>>   __x64_sys_sendmsg+0x46/0x50 net/socket.c:2591
>>   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>   do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> write to 0xffff888137fcff10 of 4 bytes by interrupt on cpu 1:
>>   bcm_can_tx+0x38a/0x410
>>   bcm_tx_timeout_handler+0xdb/0x260
>>   __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
>>   __hrtimer_run_queues+0x217/0x700 kernel/time/hrtimer.c:1749
>>   hrtimer_run_softirq+0xd6/0x120 kernel/time/hrtimer.c:1766
>>   __do_softirq+0xc1/0x265 kernel/softirq.c:571
>>   invoke_softirq kernel/softirq.c:445 [inline]
>>   __irq_exit_rcu+0x57/0xa0 kernel/softirq.c:650
>>   sysvec_apic_timer_interrupt+0x6d/0x80 arch/x86/kernel/apic/apic.c:1107
>>   asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
>>   kcsan_setup_watchpoint+0x3fe/0x410 kernel/kcsan/core.c:696
>>   string_nocheck lib/vsprintf.c:648 [inline]
>>   string+0x16c/0x200 lib/vsprintf.c:726
>>   vsnprintf+0xa09/0xe20 lib/vsprintf.c:2796
>>   add_uevent_var+0xf0/0x1c0 lib/kobject_uevent.c:665
>>   kobject_uevent_env+0x225/0x5b0 lib/kobject_uevent.c:539
>>   kobject_uevent+0x1c/0x20 lib/kobject_uevent.c:642
>>   __loop_clr_fd+0x1e0/0x3b0 drivers/block/loop.c:1167
>>   lo_release+0xe4/0xf0 drivers/block/loop.c:1745
>>   blkdev_put+0x3fb/0x470
>>   kill_block_super+0x83/0xa0 fs/super.c:1410
>>   deactivate_locked_super+0x6b/0xd0 fs/super.c:331
>>   deactivate_super+0x9b/0xb0 fs/super.c:362
>>   cleanup_mnt+0x272/0x2e0 fs/namespace.c:1177
>>   __cleanup_mnt+0x19/0x20 fs/namespace.c:1184
>>   task_work_run+0x123/0x160 kernel/task_work.c:179
>>   resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
>>   exit_to_user_mode_loop+0xd1/0xe0 kernel/entry/common.c:171
>>   exit_to_user_mode_prepare+0x6c/0xb0 kernel/entry/common.c:204
>>   __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
>>   syscall_exit_to_user_mode+0x26/0x140 kernel/entry/common.c:297
>>   do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
>>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> value changed: 0x00000059 -> 0x00000000
>>
>> Reported by Kernel Concurrency Sanitizer on:
>> CPU: 1 PID: 3096 Comm: syz-executor.5 Not tainted 6.3.0-syzkaller-00113-g1a0beef98b58 #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
>> ==================================================================
>>
>>
>> ---
>> This report is generated by a bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for more information about syzbot.
>> syzbot engineers can be reached at syzkaller@...glegroups.com.
>>
>> syzbot will keep track of this issue. See:
>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>>
>> --
>> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@...glegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/00000000000059e1b705fa2494e4%40google.com.