Message-ID: <7feaada9-7cc0-cb37-83ba-4e23d8ba3ade@redhat.com>
Date: Wed, 20 Oct 2021 18:43:35 +0200
From: David Hildenbrand <david@...hat.com>
To: Oliver Sang <oliver.sang@...el.com>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
Christian König <christian.koenig@....com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
lkp@...el.com
Subject: Re: [mm] 6128b3af2a: UBSAN:shift-out-of-bounds_in(null)
On 20.10.21 16:13, Oliver Sang wrote:
> Hi, David, Hi, Eric,
>
> On Wed, Oct 20, 2021 at 09:22:52AM +0200, David Hildenbrand wrote:
>> On 19.10.21 17:49, Eric W. Biederman wrote:
>>> kernel test robot <oliver.sang@...el.com> writes:
>>>
>>>> Greeting,
>>>>
>>>> FYI, we noticed the following commit (built with clang-14):
>>>>
>>>> commit: 6128b3af2a5e42386aa7faf37609b57f39fb7d00 ("mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()")
>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> I believe this failure is misattributed. Perhaps your reproducer
>>> only intermittently reproduces the problem?
>
> yes, we only reproduce the problem intermittently, those 9 instances are
> out of 115 runs.
> 8d0920bde5eb8ec7 6128b3af2a5e42386aa7faf3760
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :115           8%           9:115        dmesg.UBSAN:shift-out-of-bounds_in(null)   <--
>
>
>>>
>>> The change in question only contains
>>>
>>> flags &= ~MAP_DENYWRITE
>>>
>>> After all of the other users of MAP_DENYWRITE had been removed from the
>>> kernel. So I don't see how it could possibly be responsible for the
>>> reported shift out of bounds problem.
>>>
>>> Eric
>>
>> Thanks for looking into this Eric while I spent the last couple of days
>> in bed feeling miserable. :)
>>
>>
>> So we get 9 new instances of "UBSAN:shift-out-of-bounds_in(null)" (NULL
>> pointer dereference) on 6128b3af2a compared to 6128b3af2a^ (8d0920bde5),
>> apparently inside ksys_mmap_pgoff() on 32bit.
>>
>> As we're dealing with a fuzzer, is there any reproducer as sometimes
>> provided by syzkaller? The report itself is not very helpful when
>> judging if that patch is actually responsible for what we're seeing.
>>
>> I agree with Eric that it's rather unlikely that when we stop masking
>> off a bit that's ignored throughout the kernel, that we suddenly trigger
>> a NULL pointer de-reference. But I learned that everything is possible ;)
>
>
> we have now run the parent 200 more times, and "UBSAN:shift-out-of-bounds_in(null)" (1)
> still cannot be reproduced on the parent:
> 8d0920bde5eb8ec7 6128b3af2a5e42386aa7faf3760
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>          45:315         -11%           9:115        dmesg.BUG:kernel_NULL_pointer_dereference,address
>            :315           3%           8:115        dmesg.BUG:unable_to_handle_page_fault_for_address
>          45:315          -9%          17:115        dmesg.EIP:__ubsan_handle_shift_out_of_bounds   <--(2)
>          45:315          -9%          17:115        dmesg.Kernel_panic-not_syncing:Fatal_exception
>          45:315          -9%          17:115        dmesg.Oops:#[##]
>            :315           3%           9:115        dmesg.UBSAN:shift-out-of-bounds_in(null)   <--(1)
>          45:315          -9%          17:115        dmesg.boot_failures
>
>
> however, from (2) above, we found that the parent dmesg (attached) has a similar
> Call Trace; it just doesn't have the "UBSAN:shift-out-of-bounds_in(null)"
> line:
> [ 272.487295][ T7295] BUG: kernel NULL pointer dereference, address: 0000000c
> [ 272.488078][ T7295] #PF: supervisor read access in kernel mode
> [ 272.488673][ T7295] #PF: error_code(0x0000) - not-present page
> [ 272.489266][ T7295] *pde = 00000000
> [ 272.489751][ T7295] Oops: 0000 [#1] SMP
> [ 272.490165][ T7295] CPU: 1 PID: 7295 Comm: trinity-c2 Not tainted 5.14.0-00005-g8d0920bde5eb #1
> [ 272.491122][ T7295] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [ 272.492067][ T7295] EIP: __ubsan_handle_shift_out_of_bounds+0xe/0x350
> [ 272.492760][ T7295] Code: 05 90 a6 c2 00 68 2a 54 00 68 2a 54 bd 4e 00 8d bd 4e 00 8d 00 00 66 90 00 00 66 90 57 56 83 ec 57 56 83 ec 89 c7 8b 48 89 c7 <8b> 48 8d b4 26 00 8d b4 26 00 75 b4 64 8b 75 b4 64 8b ca 83 bb 1c
> [ 272.494890][ T7295] EAX: 00000000 EBX: c5d6cf38 ECX: 00000031 EDX: 00000000
> [ 272.495686][ T7295] ESI: f138eb71 EDI: 00000000 EBP: f5a23f3c ESP: f5a23ec8
> [ 272.496532][ T7295] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010292
> [ 272.497383][ T7295] CR0: 80050033 CR2: 0000000c CR3: 3528d000 CR4: 000406d0
> [ 272.498152][ T7295] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 272.498897][ T7295] DR6: fffe0ff0 DR7: 00000400
> [ 272.499411][ T7295] Call Trace:
> [ 272.499827][ T7295] ? __lock_acquire+0x955/0xb80
> [ 272.500361][ T7295] ? rcu_lock_acquire+0x30/0x30
> [ 272.500875][ T7295] ? rcu_read_lock_sched_held+0x31/0x70
> [ 272.501500][ T7295] ksys_mmap_pgoff+0x1fd/0x290
> [ 272.501990][ T7295] __ia32_sys_mmap_pgoff+0x1c/0x30
> [ 272.502512][ T7295] do_int80_syscall_32+0x39/0x80
> [ 272.503101][ T7295] entry_INT80_32+0x10d/0x10d
> [ 272.503624][ T7295] EIP: 0xb7f71a02
> [ 272.504029][ T7295] Code: 95 01 00 05 25 36 02 00 83 ec 14 8d 80 e8 99 ff ff 50 6a 02 e8 1f ff 00 00 c7 04 24 7f 00 00 00 e8 7e 87 01 00 66 90 90 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
> [ 272.506044][ T7295] EAX: ffffffda EBX: 00000000 ECX: 00000000 EDX: f138eb71
> [ 272.506825][ T7295] ESI: c5d6cf38 EDI: ffffffff EBP: 00000000 ESP: bfca54d8
> [ 272.507592][ T7295] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
> [ 272.508417][ T7295] Modules linked in: aesni_intel crypto_simd qemu_fw_cfg autofs4
> [ 272.509201][ T7295] CR2: 000000000000000c
> [ 272.509704][ T7295] ---[ end trace 97b48cc676da14f9 ]---
> [ 272.510293][ T7295] EIP: __ubsan_handle_shift_out_of_bounds+0xe/0x350
> [ 272.511023][ T7295] Code: 05 90 a6 c2 00 68 2a 54 00 68 2a 54 bd 4e 00 8d bd 4e 00 8d 00 00 66 90 00 00 66 90 57 56 83 ec 57 56 83 ec 89 c7 8b 48 89 c7 <8b> 48 8d b4 26 00 8d b4 26 00 75 b4 64 8b 75 b4 64 8b ca 83 bb 1c
> [ 272.513169][ T7295] EAX: 00000000 EBX: c5d6cf38 ECX: 00000031 EDX: 00000000
> [ 272.513979][ T7295] ESI: f138eb71 EDI: 00000000 EBP: f5a23f3c ESP: f5a23ec8
> [ 272.514800][ T7295] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010292
> [ 272.515974][ T7295] CR0: 80050033 CR2: 0000000c CR3: 3528d000 CR4: 000406d0
> [ 272.516787][ T7295] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 272.517619][ T7295] DR6: fffe0ff0 DR7: 00000400
> [ 272.518105][ T7295] Kernel panic - not syncing: Fatal exception
> [ 272.519566][ T7295] Kernel Offset: disabled
>
>
> in contrast, in fbc:
> [ 126.758570][ T3293] ================================================================================
> [ 126.758949][ T3293] UBSAN: shift-out-of-bounds in (null):0:0
> [ 126.759174][ T3293] BUG: kernel NULL pointer dereference, address: 00000000
> [ 126.759447][ T3293] #PF: supervisor read access in kernel mode
> [ 126.759676][ T3293] #PF: error_code(0x0000) - not-present page
> [ 126.759905][ T3293] *pde = 00000000
> [ 126.760047][ T3293] Oops: 0000 [#1] SMP
> [ 126.760205][ T3293] CPU: 1 PID: 3293 Comm: trinity-c4 Not tainted 5.14.0-00006-g6128b3af2a5e #1
> [ 126.760541][ T3293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [ 126.760890][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
> [ 126.761147][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66 83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
> [ 126.761889][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
> [ 126.762159][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
> [ 126.762428][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
> [ 126.762718][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
> [ 126.762989][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 126.763259][ T3293] DR6: fffe0ff0 DR7: 00000400
> [ 126.763436][ T3293] Call Trace:
> [ 126.763562][ T3293] ? rcu_lock_acquire+0x30/0x30
> [ 126.763749][ T3293] ? rcu_read_lock_sched_held+0x31/0x70
> [ 126.763960][ T3293] ksys_mmap_pgoff+0x1fc/0x290
> [ 126.764146][ T3293] __ia32_sys_mmap_pgoff+0x1c/0x30
> [ 126.764343][ T3293] do_int80_syscall_32+0x39/0x80
> [ 126.764532][ T3293] entry_INT80_32+0x10d/0x10d
> [ 126.764709][ T3293] EIP: 0xb7fbda02
> [ 126.764848][ T3293] Code: 95 01 00 05 25 36 02 00 83 ec 14 8d 80 e8 99 ff ff 50 6a 02 e8 1f ff 00 00 c7 04 24 7f 00 00 00 e8 7e 87 01 00 66 90 90 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
> [ 126.765591][ T3293] EAX: ffffffda EBX: 00000000 ECX: 00001000 EDX: 55dd7eb6
> [ 126.765859][ T3293] ESI: f0bd6374 EDI: ffffffff EBP: 00000000 ESP: bf9964d8
> [ 126.766129][ T3293] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
> [ 126.766419][ T3293] Modules linked in: aesni_intel crypto_simd qemu_fw_cfg autofs4
> [ 126.766715][ T3293] CR2: 0000000000000000
> [ 126.766894][ T3293] ---[ end trace e6000e119f0dc7f3 ]---
> [ 126.767105][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
> [ 126.767361][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66 83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
> [ 126.768112][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
> [ 126.768384][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
> [ 126.768657][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
> [ 126.768947][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
> [ 126.769223][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 126.769496][ T3293] DR6: fffe0ff0 DR7: 00000400
> [ 126.769680][ T3293] Kernel panic - not syncing: Fatal exception
> [ 126.769946][ T3293] Kernel Offset: disabled
>
>
> basically, we reported this based only on the diff above, so we need your advice
> on whether this "UBSAN:shift-out-of-bounds_in(null)" diff really matters in this case.
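
As a rough sanity check on the numbers above (0 of 315 parent runs vs. 9 of 115 runs on the commit showing the UBSAN line), here is a minimal sketch of how likely "zero hits" would be if the parent printed that line at the same per-run rate. It assumes independent runs and treats 9/115 as the true rate, which is only a noisy point estimate; prob_zero_hits() is a made-up helper, not part of any LKP tooling.

```c
#include <assert.h>

/*
 * Probability of observing zero hits in `runs` independent runs when each
 * run hits with probability `rate`, i.e. (1 - rate)^runs.
 * Hypothetical helper for illustration only.
 */
static double prob_zero_hits(double rate, unsigned int runs)
{
	double p = 1.0;

	assert(rate >= 0.0 && rate <= 1.0);
	for (unsigned int i = 0; i < runs; i++)
		p *= 1.0 - rate;
	return p;
}
```

Calling prob_zero_hits(9.0 / 115.0, 315) gives a value far below 1e-9, so under these assumptions the missing UBSAN line on the parent is unlikely to be plain run-to-run noise. That says nothing about whether the underlying crash differs between the two kernels.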

The faulting call sites are "ksys_mmap_pgoff+0x1fd/0x290" (parent) vs.
"ksys_mmap_pgoff+0x1fc/0x290" (6128b3af2a), so my gut feeling is that
we're dealing with the same issue.

But I don't have a fully satisfactory explanation for why that commit
more often faults on "address: 00000000", whereas the parent faults on
"address: 0000000c". Maybe it's simply the changed code layout: the
"garbage" we're using as an address is now "different garbage" :)

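
One possible reading of the two fault addresses, offered as a sketch rather than a diagnosis: small addresses like these usually mean a member was read through a NULL (or zeroed) base pointer, so the address is just the member offset. The code below mirrors what I assume the UBSAN descriptor layout in lib/ubsan.h looks like, with pointers and unsigned long modeled as 32-bit values to match the i386 test kernel. The *32 type names and fault_address() are invented for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: i386, so pointers and unsigned long are both 32 bits wide. */
typedef uint32_t ptr32_t;

/* Assumed mirror of struct source_location from lib/ubsan.h. */
struct source_location32 {
	ptr32_t file_name;
	union {
		uint32_t reported;	/* unsigned long on i386 */
		struct {
			uint32_t line;
			uint32_t column;
		};
	};
};

/* Assumed mirror of struct shift_out_of_bounds_data from lib/ubsan.h. */
struct shift_out_of_bounds_data32 {
	struct source_location32 location;
	ptr32_t lhs_type;
	ptr32_t rhs_type;
};

/* Under these assumptions, lhs_type sits 0x0c bytes into the descriptor. */
static_assert(offsetof(struct shift_out_of_bounds_data32, lhs_type) == 0x0c,
	      "lhs_type expected at offset 0x0c with 32-bit pointers");

/* Address touched when reading a member at member_offset through base. */
static uint32_t fault_address(uint32_t base, size_t member_offset)
{
	return base + (uint32_t)member_offset;
}
```

If the layout assumption holds, a NULL descriptor pointer would fault at 0000000c when lhs_type is read, while reading the first member of any structure through a NULL pointer would fault at 00000000. Both would be consistent with the handler being fed garbage, which fits the "different garbage" guess above.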
--
Thanks,
David / dhildenb