linux-kernel - Re: [syzbot] [ext4?] general protection fault in locks_remove

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6b12ba28a4bd276ecd6ffbd97af76de46c72788a.camel@kernel.org>
Date:   Fri, 27 Oct 2023 06:16:17 -0400
From:   Jeff Layton <jlayton@...nel.org>
To:     syzbot <syzbot+ba2c35eb32f5a85137f8@...kaller.appspotmail.com>,
        adilger.kernel@...ger.ca, brauner@...nel.org,
        chuck.lever@...cle.com, linux-ext4@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        syzkaller-bugs@...glegroups.com, tytso@....edu,
        viro@...iv.linux.org.uk
Subject: Re: [syzbot] [ext4?] general protection fault in locks_remove_posix

On Thu, 2023-10-26 at 22:05 -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    2030579113a1 Add linux-next specific files for 20231020
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=14e75739680000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=37404d76b3c8840e
> dashboard link: https://syzkaller.appspot.com/bug?extid=ba2c35eb32f5a85137f8
> compiler:       gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=125607f5680000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12a22e93680000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/a99a981e5d78/disk-20305791.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/073a5ba6a2a6/vmlinux-20305791.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/c7c1a7107f7b/bzImage-20305791.xz
> mounted in repro: https://storage.googleapis.com/syzbot-assets/81394ce5859f/mount_0.gz
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+ba2c35eb32f5a85137f8@...kaller.appspotmail.com
> 
> general protection fault, probably for non-canonical address 0xdffffc001ffff11a: 0000 [#1] PREEMPT SMP KASAN
> KASAN: probably user-memory-access in range [0x00000000ffff88d0-0x00000000ffff88d7]
> CPU: 1 PID: 5052 Comm: udevd Not tainted 6.6.0-rc6-next-20231020-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023
> RIP: 0010:list_empty include/linux/list.h:373 [inline]
> RIP: 0010:locks_remove_posix+0x100/0x510 fs/locks.c:2555
> Code: 4d 8b ae 20 02 00 00 4d 85 ed 0f 84 0c 02 00 00 e8 15 60 7d ff 49 8d 55 50 48 b9 00 00 00 00 00 fc ff df 48 89 d6 48 c1 ee 03 <80> 3c 0e 00 0f 85 ae 03 00 00 49 8b 45 50 48 39 c2 0f 84 db 01 00
> RSP: 0018:ffffc90003d6f948 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffff8880271cca00 RCX: dffffc0000000000
> RDX: 00000000ffff88d0 RSI: 000000001ffff11a RDI: ffff8880796982e0
> RBP: 1ffff920007adf2b R08: 0000000000000003 R09: 0000000000004000
> R10: 0000000000000000 R11: dffffc0000000000 R12: ffffc90003d6f988
> R13: 00000000ffff8880 R14: ffff8880796980c0 R15: ffff8880271ccb90
> FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fc227c3e000 CR3: 000000002000e000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  filp_flush+0x11b/0x1a0 fs/open.c:1554
>  filp_close+0x1c/0x30 fs/open.c:1563
>  close_files fs/file.c:432 [inline]
>  put_files_struct fs/file.c:447 [inline]
>  put_files_struct+0x1df/0x360 fs/file.c:444
>  exit_files+0x82/0xb0 fs/file.c:464
>  do_exit+0xa51/0x2ac0 kernel/exit.c:866
>  do_group_exit+0xd3/0x2a0 kernel/exit.c:1021
>  get_signal+0x2391/0x2760 kernel/signal.c:2904
>  arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:309
>  exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
>  exit_to_user_mode_prepare+0x11c/0x240 kernel/entry/common.c:204
>  __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
>  syscall_exit_to_user_mode+0x1d/0x60 kernel/entry/common.c:296
>  do_syscall_64+0x4b/0x110 arch/x86/entry/common.c:88
>  entry_SYSCALL_64_after_hwframe+0x63/0x6b
> RIP: 0033:0x7fc2276be3cd
> Code: Unable to access opcode bytes at 0x7fc2276be3a3.
> RSP: 002b:00007ffd929ccc20 EFLAGS: 00000246 ORIG_RAX: 00000000000000ea
> RAX: 0000000000000000 RBX: 00007fc227b0bc80 RCX: 00007fc2276be3cd
> RDX: 0000000000000006 RSI: 00000000000013bc RDI: 00000000000013bc
> RBP: 00000000000013bc R08: 0000000000000000 R09: 0000000000000002
> R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000006
> R13: 00007ffd929cce30 R14: 0000000000001000 R15: 0000000000000000
>  </TASK>
> Modules linked in:
> ----------------
> Code disassembly (best guess):
>    0:	4d 8b ae 20 02 00 00 	mov    0x220(%r14),%r13
>    7:	4d 85 ed             	test   %r13,%r13
>    a:	0f 84 0c 02 00 00    	je     0x21c
>   10:	e8 15 60 7d ff       	call   0xff7d602a
>   15:	49 8d 55 50          	lea    0x50(%r13),%rdx
>   19:	48 b9 00 00 00 00 00 	movabs $0xdffffc0000000000,%rcx
>   20:	fc ff df
>   23:	48 89 d6             	mov    %rdx,%rsi
>   26:	48 c1 ee 03          	shr    $0x3,%rsi
> * 2a:	80 3c 0e 00          	cmpb   $0x0,(%rsi,%rcx,1) <-- trapping instruction
>   2e:	0f 85 ae 03 00 00    	jne    0x3e2
>   34:	49 8b 45 50          	mov    0x50(%r13),%rax
>   38:	48 39 c2             	cmp    %rax,%rdx
>   3b:	0f                   	.byte 0xf
>   3c:	84 db                	test   %bl,%bl
>   3e:	01 00                	add    %eax,(%rax)
> 

Hrm, this is a curious one. The relevant code is here:
                                                                                     
        ctx = locks_inode_context(inode);                              
        if (!ctx || list_empty(&ctx->flc_posix))                       
                return;                                                

So in this case, ctx was non-NULL, but apparently the i_flctx pointer
was bogus (or maybe the list in it was corrupt? Not certain here). That
pointer is initialized to NULL in inode_init_always, and it's only ever
set via cmpxchg in locks_get_lock_context.

The assembly looks really weird, but I found this mail from Linus that
explains some of what we're seeing (but in the context of a percpu var
problem):

    https://lkml.org/lkml/2023/10/8/295

I'm stumped. I don't see how this could happen right offhand, so I'm
left to wonder if maybe we have some sort of generic memory corruption
here? Could this be a KASAN bug of some sort?
-- 
Jeff Layton <jlayton@...nel.org>