Message-ID: <62358653-f20e-4686-a14a-7c717d6488c3@lucifer.local>
Date: Thu, 2 Jan 2025 11:34:11 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: syzbot <syzbot+46423ed8fa1f1148c6e4@...kaller.appspotmail.com>
Cc: Liam.Howlett@...cle.com, akpm@...ux-foundation.org, jannh@...gle.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        syzkaller-bugs@...glegroups.com, vbabka@...e.cz
Subject: Re: [syzbot] [mm?] WARNING in vma_merge_existing_range

TL;DR:

There's not enough to go on in this report as far as I can tell. I will do
what I can by writing some general tests to explore edge cases, but this is
really, really odd, and the fact it doesn't repro is odder still. This report
just doesn't have enough data (I will work on seeing if we can provide more...).

On Thu, Jan 02, 2025 at 10:25:49AM +0000, Lorenzo Stoakes wrote:
> Happy new year!
>
> On Tue, Dec 31, 2024 at 08:50:23PM -0800, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    8379578b11d5 Merge tag 'for-v6.13-rc' of git://git.kernel...
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16113018580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=d269ef41b9262400
> > dashboard link: https://syzkaller.appspot.com/bug?extid=46423ed8fa1f1148c6e4
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > userspace arch: i386
>
> Hmmmm 32-bit? But kernel reports give 64-bit registers? So I guess 32-bit
> userland, 64-bit kernel?
>
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
>
> Hmm. Racy thing?
>
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/86d2e3352aff/disk-8379578b.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/345570cd3573/vmlinux-8379578b.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/01da37a51505/bzImage-8379578b.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+46423ed8fa1f1148c6e4@...kaller.appspotmail.com
> >
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> >  </TASK>
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 20504 at mm/vma.c:734 vma_merge_existing_range+0x1145/0x16f0 mm/vma.c:734
>
> It'd be nice if syzbot could actually print the code that generates the
> warning :) a nice-to-have perhaps.
>
> This is:
>
> 	VM_WARN_ON(start >= end);
>
> I suspect start == end, because start > end would be some drastic and
> god-awful bug.
>
> > Modules linked in:
> > CPU: 1 UID: 0 PID: 20504 Comm: syz.6.5485 Not tainted 6.13.0-rc4-syzkaller-00069-g8379578b11d5 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> > RIP: 0010:vma_merge_existing_range+0x1145/0x16f0 mm/vma.c:734
> > Code: e8 20 24 0f 00 4d 2b 7d 00 4d 89 ec 48 8b 7c 24 38 e9 7f 01 00 00 e8 3a bc a8 ff 90 0f 0b 90 e9 a8 f1 ff ff e8 2c bc a8 ff 90 <0f> 0b 90 e9 0e f2 ff ff e8 1e bc a8 ff 90 0f 0b 90 4d 85 ed 0f 85
>
> Be useful to get the kernel disassembly too :)
>
> Best guess wrangling a python script and objdump:
>
>    0:	e8 20 24 0f 00       	call   0xf2425
>    5:	4d 2b 7d 00          	sub    0x0(%r13),%r15
>    9:	4d 89 ec             	mov    %r13,%r12
>    c:	48 8b 7c 24 38       	mov    0x38(%rsp),%rdi
>   11:	e9 7f 01 00 00       	jmp    0x195
>   16:	e8 3a bc a8 ff       	call   0xffffffffffa8bc55
>   1b:	90                   	nop
>   1c:	0f 0b                	ud2
>   1e:	90                   	nop
>   1f:	e9 a8 f1 ff ff       	jmp    0xfffffffffffff1cc
>   24:	e8 2c bc a8 ff       	call   0xffffffffffa8bc55
>   29:	90                   	nop
>   2a:	<0f> 0b                	ud2   <-- presumably here? This is an undefined instruction...
>   2c:	90                   	nop
>   2d:	e9 0e f2 ff ff       	jmp    0xfffffffffffff240
>   32:	e8 1e bc a8 ff       	call   0xffffffffffa8bc55
>   37:	90                   	nop
>   38:	0f 0b                	ud2
>   3a:	90                   	nop
>   3b:	4d 85 ed             	test   %r13,%r13
>   3e:	0f                   	.byte 0xf
>   3f:	85                   	.byte 0x85
>
> Yeah this might be a mix of data and code somehow or just garbage? Not sure
> there's anything discernible there unfortunately.
>
> > RSP: 0018:ffffc9000ba274a0 EFLAGS: 00010293
> > RAX: ffffffff81f6b804 RBX: 0000000020c25000 RCX: ffff888060ad1e00
> > RDX: 0000000000000000 RSI: 0000000020c25000 RDI: 0000000020c25000
> > RBP: ffffc9000ba275f8 R08: ffffffff81f6aa0d R09: 00000000280000fa
> > R10: ffffc9000ba27810 R11: fffff52001744f07 R12: 0000000020c25000
> > R13: ffff888069b666c8 R14: ffffc9000ba276a0 R15: ffff888068d0b1f0
> > FS:  0000000000000000(0000) GS:ffff8880b8700000(0063) knlGS:00000000f5116b40
> > CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> > CR2: 00007fa9de2c0018 CR3: 000000006b562000 CR4: 00000000003526f0
>
> > Call Trace:
> >  <TASK>
> >  vma_modify+0x41/0x330 mm/vma.c:1514
>
> Just passes through start, end (in vmg).
>
> >  vma_modify_flags_name+0x3a6/0x430 mm/vma.c:1563
>
> Just passes through start, end.
>
> >  madvise_update_vma+0x2fe/0xc10 mm/madvise.c:159
>
> Just passes through start, end.
>
> This means it was one of MADV_NORMAL, MADV_RANDOM, MADV_DONTFORK,
> MADV_DOFORK, MADV_WIPEONFORK, MADV_KEEPONFORK, MADV_DONTDUMP, MADV_DODUMP,
> MADV_MERGEABLE, MADV_UNMERGEABLE, MADV_HUGEPAGE, MADV_NOHUGEPAGE.

Actually could also be called via... incredibly... prctl_set_vma() which invokes
madvise_set_anon_name()...
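
For reference, the userspace side of that path looks roughly like the sketch
below. The helper name name_anon_range() is made up for illustration, and the
fallback #defines are only there for older headers. On a kernel built without
CONFIG_ANON_VMA_NAME I'd expect the call to simply fail (EINVAL, as far as I
know), so treat this as a sketch rather than something to rely on.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; values as in the Linux UAPI <linux/prctl.h>. */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/*
 * Hypothetical helper (not from the kernel tree): ask the kernel to name an
 * anonymous range. Returns 0 on success or a negative errno on failure.
 */
static int name_anon_range(void *addr, size_t len, const char *name)
{
	if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
		  (unsigned long)addr, (unsigned long)len,
		  (unsigned long)name))
		return -errno;
	return 0;
}
```

Calling name_anon_range(p, len, "some-name") on the middle of an anonymous
mapping should exercise the same split/merge machinery as the madvise() flag
changes listed above.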

>
> Yeah we need better error handling here, because this report is just giving
> us very little to go on especially for a non-repro. Will add to TODO.
>
> >  madvise_vma_behavior mm/madvise.c:1325 [inline]
>
> Just passes through start, end.
>
> >  madvise_walk_vmas mm/madvise.c:1497 [inline]
>
> OK here we find VMAs and walk them.
>
> We explicitly check for start >= end if start < vma->vm_start.
>
> I wonder if the visit() call is splitting the VMA which confuses the logic
> here.
>
>       s  e
>       |  |
>       v  v
> |-------------|
> |             |
> |-------------|
>
> Split:
>
>       s  e
>       |  |
>       v  v
> |--------|----|
> |        |    |
> |--------|----|
>
> prev = this VMA.
>
> 	if (prev && start < prev->vm_end)
> 		start = prev->vm_end;
>
> So we end up with:
>
>
>          s,e
>          |
>          v
> |--------|----|
> |        |    |
> |--------|----|
>
> 	tmp = vma->vm_end;
> 	if (end < tmp)
> 		tmp = end;
>
> That tmp assignment will reinstate the broken end
>
> And... boom.
>
> Let me check this out and see if I can trigger it.
>
> I may be missing some safeguard that prevents this...

OK so this case wouldn't happen as we check start >= end at this point.
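
To convince myself, here is a toy userspace model of the walk. It is a
simplified sketch, not the kernel code: every toy_* name is invented, VMAs are
just ranges in an array, the visit callback always splits so that one VMA
covers exactly the visited range, and hole handling is reduced to the bare
minimum. The only thing it is meant to show is that the start >= end check
after the prev->vm_end adjustment stops the loop before visit() can be handed
an empty range.

```c
#include <assert.h>

/* Toy model only: a VMA is a half-open range [vm_start, vm_end). */
struct toy_vma { unsigned long vm_start, vm_end; };

#define TOY_MAX 16
static struct toy_vma toy_vmas[TOY_MAX];	/* sorted, non-overlapping */
static int toy_nr;
static int toy_visits;		/* number of visit calls */
static int toy_bad_ranges;	/* visit calls that saw start >= end */

static void toy_reset(unsigned long start, unsigned long end)
{
	toy_vmas[0].vm_start = start;
	toy_vmas[0].vm_end = end;
	toy_nr = 1;
	toy_visits = toy_bad_ranges = 0;
}

/* Index of the first VMA with vm_end > addr, or -1 if there is none. */
static int toy_find(unsigned long addr)
{
	for (int i = 0; i < toy_nr; i++)
		if (toy_vmas[i].vm_end > addr)
			return i;
	return -1;
}

/* Split VMA i at addr: i keeps the lower part, i + 1 gets the upper part. */
static void toy_split(int i, unsigned long addr)
{
	assert(toy_nr < TOY_MAX);
	for (int j = toy_nr; j > i + 1; j--)
		toy_vmas[j] = toy_vmas[j - 1];
	toy_vmas[i + 1].vm_start = addr;
	toy_vmas[i + 1].vm_end = toy_vmas[i].vm_end;
	toy_vmas[i].vm_end = addr;
	toy_nr++;
}

/* Visit [start, end) inside VMA i; returns the index to use as prev. */
static int toy_visit(int i, unsigned long start, unsigned long end)
{
	toy_visits++;
	if (start >= end) {	/* what VM_WARN_ON(start >= end) would flag */
		toy_bad_ranges++;
		return i;
	}
	if (start > toy_vmas[i].vm_start) {
		toy_split(i, start);
		i++;
	}
	if (end < toy_vmas[i].vm_end)
		toy_split(i, end);
	return i;
}

/* Simplified walk loop; returns 0, or -1 if it runs out of VMAs. */
static int toy_walk(unsigned long start, unsigned long end)
{
	int vma = toy_find(start);
	int prev;
	unsigned long tmp;

	for (;;) {
		if (vma < 0)
			return -1;
		if (start < toy_vmas[vma].vm_start) {
			start = toy_vmas[vma].vm_start;
			if (start >= end)
				break;
		}
		tmp = toy_vmas[vma].vm_end;
		if (end < tmp)
			tmp = end;
		prev = toy_visit(vma, start, tmp);
		start = tmp;
		if (start < toy_vmas[prev].vm_end)
			start = toy_vmas[prev].vm_end;
		if (start >= end)	/* the check that saves us */
			break;
		vma = toy_find(toy_vmas[prev].vm_end);
	}
	return 0;
}
```

With a single VMA set up via toy_reset(0x1000, 0x8000), a call to
toy_walk(0x4000, 0x6000) splits it into three pieces, visits exactly once and
never hands visit an empty range, which matches the diagram above.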

I will look at adding some test cases around this to see if I can figure out
broken scenarios.

But actually, if this was some structural thing like this, a repro would be
trivial.

There are cases where the mmap lock can be dropped, but none should be invoking
madvise_update_vma().

OK this is really really odd.

The fact there's not a repro suggests something is racing but we hold the mmap
lock so I really can't see how that's possible.

This report is just insufficient to go on really.

I will work on:

a. tests that explore odd scenarios in madvise_walk_vmas().
b. getting better debug data on these asserts.
c. refactoring some of this HIDEOUS madvise() code.

But for now, unless we can get a repro, I'm not sure there's much we can do.
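
As a starting point for (a), something along these lines is what I have in
mind. It is a minimal single-threaded sketch, assuming a plain anonymous
mapping; toy_split_then_merge() is a made-up name, and I would not expect it
to trigger the warning on its own. It applies one advice value to the middle
of a mapping (forcing a three-way split) and then the opposite advice across
the whole thing (so the walk crosses the split pieces and they become
mergeable again).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical test helper: apply `set` to the middle of a fresh anonymous
 * mapping, then `clear` across the whole mapping.
 * Returns 0 on success or a negative errno on failure.
 */
static int toy_split_then_merge(int set, int clear)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = 16 * (size_t)page;
	int ret = 0;
	char *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -errno;

	/* Middle quarter only: the VMA has to be split on both sides. */
	if (madvise(p + 4 * page, 4 * (size_t)page, set))
		ret = -errno;
	/* Whole range: the walk now spans the pieces created above. */
	else if (madvise(p, len, clear))
		ret = -errno;

	munmap(p, len);
	return ret;
}
```

For example toy_split_then_merge(MADV_DONTFORK, MADV_DOFORK) goes through the
flag-update path discussed above; the same shape with other pairs from that
list (e.g. MADV_HUGEPAGE/MADV_NOHUGEPAGE, where THP is available) would be the
obvious next cases to try.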

>
>
> >  do_madvise+0x1e64/0x4d10 mm/madvise.c:1684
>
> Here we explicitly check for start >= end:
>
> 	end = start + len;
> 	if (end < start)
> 		return -EINVAL;
>
> 	if (end == start)
> 		return 0;
>
> So overflow is accounted for also. But since this is a 64-bit kernel not
> really a concern.
>
> >  __do_sys_madvise mm/madvise.c:1700 [inline]
> >  __se_sys_madvise mm/madvise.c:1698 [inline]
> >  __ia32_sys_madvise+0xa6/0xc0 mm/madvise.c:1698
> >  do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
> >  __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386
> >  do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411
> >  entry_SYSENTER_compat_after_hwframe+0x84/0x8e
> > RIP: 0023:0xf7fc2579
> > Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> > RSP: 002b:00000000f511655c EFLAGS: 00000206 ORIG_RAX: 00000000000000db
> > RAX: ffffffffffffffda RBX: 0000000020c00000 RCX: 0000000000400000
> > RDX: 000000000000000e RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> >  </TASK>
> > ----------------
> > Code disassembly (best guess), 2 bytes skipped:
> >    0:	10 06                	adc    %al,(%rsi)
> >    2:	03 74 b4 01          	add    0x1(%rsp,%rsi,4),%esi
> >    6:	10 07                	adc    %al,(%rdi)
> >    8:	03 74 b0 01          	add    0x1(%rax,%rsi,4),%esi
> >    c:	10 08                	adc    %cl,(%rax)
> >    e:	03 74 d8 01          	add    0x1(%rax,%rbx,8),%esi
> >   1e:	00 51 52             	add    %dl,0x52(%rcx)
> >   21:	55                   	push   %rbp
> >   22:	89 e5                	mov    %esp,%ebp
> >   24:	0f 34                	sysenter
> >   26:	cd 80                	int    $0x80
> > * 28:	5d                   	pop    %rbp <-- trapping instruction
> >   29:	5a                   	pop    %rdx
> >   2a:	59                   	pop    %rcx
> >   2b:	c3                   	ret
> >   2c:	90                   	nop
> >   2d:	90                   	nop
> >   2e:	90                   	nop
> >   2f:	90                   	nop
> >   30:	90                   	nop
> >   31:	90                   	nop
> >   32:	90                   	nop
> >   33:	90                   	nop
> >   34:	90                   	nop
> >   35:	90                   	nop
> >   36:	90                   	nop
> >   37:	90                   	nop
> >   38:	90                   	nop
> >   39:	90                   	nop
> >   3a:	90                   	nop
> >   3b:	90                   	nop
> >   3c:	90                   	nop
> >   3d:	90                   	nop
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@...glegroups.com.
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> > If the report is already addressed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> >
> > If you want to overwrite report's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> >
> > If the report is a duplicate of another one, reply with:
> > #syz dup: exact-subject-of-another-report
> >
> > If you want to undo deduplication, reply with:
> > #syz undup
>
