linux-kernel - Re: [mm] 6128b3af2a: UBSAN:shift-out-of-bounds

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7feaada9-7cc0-cb37-83ba-4e23d8ba3ade@redhat.com>
Date:   Wed, 20 Oct 2021 18:43:35 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Oliver Sang <oliver.sang@...el.com>
Cc:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        Christian König <christian.koenig@....com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com
Subject: Re: [mm] 6128b3af2a: UBSAN:shift-out-of-bounds_in(null)

On 20.10.21 16:13, Oliver Sang wrote:
> Hi, David, Hi, Eric,
> 
> On Wed, Oct 20, 2021 at 09:22:52AM +0200, David Hildenbrand wrote:
>> On 19.10.21 17:49, Eric W. Biederman wrote:
>>> kernel test robot <oliver.sang@...el.com> writes:
>>>
>>>> Greeting,
>>>>
>>>> FYI, we noticed the following commit (built with clang-14):
>>>>
>>>> commit: 6128b3af2a5e42386aa7faf37609b57f39fb7d00 ("mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()")
>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> I believe this failure is misattributed.  Perhaps your reproducer
>>> only intermittently reproduces the problem?
> 
> yes, we only reproduce the problem intermittently, those 9 instances are
> out of 115 runs.
> 8d0920bde5eb8ec7 6128b3af2a5e42386aa7faf3760
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :115          8%           9:115   dmesg.UBSAN:shift-out-of-bounds_in(null)    <--
> 
> 
>>>
>>> The change in question only contains
>>>
>>> 	flags &= ~MAP_DENYWRITE
>>>
>>> After all of the other users of MAP_DENYWRITE had been removed from the
>>> kernel.  So I don't see how it could possibly be responsible for the
>>> reported shift out of bounds problem.
>>>
>>> Eric
>>
>> Thanks for looking into this Eric while I spent the last couple of days
>> in bed feeling miserable. :)
>>
>>
>> So we get 9 new instances of "UBSAN:shift-out-of-bounds_in(null)" (NULL
>> pointer dereference) on 6128b3af2a compared to 6128b3af2a^ (8d0920bde5),
>> apparently inside ksys_mmap_pgoff() on 32bit.
>>
>> As we're dealing with a fuzzer, is there any reproducer as sometimes
>> provided by syzkaller? The report itself is not very helpful when
>> judging if that patch is actually responsible for what we're seeing.
>>
>> I agree with Eric that it's rather unlikely that when we stop masking
>> off a bit that's ignored throughout the kernel, that we suddenly trigger
>> a NULL pointer de-reference. But I learned that everything is possible ;)
> 
> 
> now we run parent 200 more times, the "UBSAN:shift-out-of-bounds_in(null)" (1)
> still cannot be reproduced on parent:
> 8d0920bde5eb8ec7 6128b3af2a5e42386aa7faf3760
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>             |             |             |
>          45:315        -11%           9:115   dmesg.BUG:kernel_NULL_pointer_dereference,address
>            :315          3%           8:115   dmesg.BUG:unable_to_handle_page_fault_for_address
>          45:315         -9%          17:115   dmesg.EIP:__ubsan_handle_shift_out_of_bounds <--(2)
>          45:315         -9%          17:115   dmesg.Kernel_panic-not_syncing:Fatal_exception
>          45:315         -9%          17:115   dmesg.Oops:#[##]
>            :315          3%           9:115   dmesg.UBSAN:shift-out-of-bounds_in(null)  <--(1)
>          45:315         -9%          17:115   dmesg.boot_failures
> 
> 
> however, from above (2), we found parent dmesg (attached) has similar
> Call Trace, which just does't have "UBSAN:shift-out-of-bounds_in(null)"
> things:
> [  272.487295][ T7295] BUG: kernel NULL pointer dereference, address: 0000000c
> [  272.488078][ T7295] #PF: supervisor read access in kernel mode
> [  272.488673][ T7295] #PF: error_code(0x0000) - not-present page
> [  272.489266][ T7295] *pde = 00000000
> [  272.489751][ T7295] Oops: 0000 [#1] SMP
> [  272.490165][ T7295] CPU: 1 PID: 7295 Comm: trinity-c2 Not tainted 5.14.0-00005-g8d0920bde5eb #1
> [  272.491122][ T7295] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [  272.492067][ T7295] EIP: __ubsan_handle_shift_out_of_bounds+0xe/0x350
> [  272.492760][ T7295] Code: 05 90 a6 c2 00 68 2a 54 00 68 2a 54 bd 4e 00 8d bd 4e 00 8d 00 00 66 90 00 00 66 90 57 56 83 ec 57 56 83 ec 89 c7 8b 48 89 c7 <8b> 48 8d\
>  b4 26 00 8d b4 26 00 75 b4 64 8b 75 b4 64 8b ca 83 bb 1c
> [  272.494890][ T7295] EAX: 00000000 EBX: c5d6cf38 ECX: 00000031 EDX: 00000000
> [  272.495686][ T7295] ESI: f138eb71 EDI: 00000000 EBP: f5a23f3c ESP: f5a23ec8
> [  272.496532][ T7295] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010292
> [  272.497383][ T7295] CR0: 80050033 CR2: 0000000c CR3: 3528d000 CR4: 000406d0
> [  272.498152][ T7295] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [  272.498897][ T7295] DR6: fffe0ff0 DR7: 00000400
> [  272.499411][ T7295] Call Trace:
> [  272.499827][ T7295]  ? __lock_acquire+0x955/0xb80
> [  272.500361][ T7295]  ? rcu_lock_acquire+0x30/0x30
> [  272.500875][ T7295]  ? rcu_read_lock_sched_held+0x31/0x70
> [  272.501500][ T7295]  ksys_mmap_pgoff+0x1fd/0x290
> [  272.501990][ T7295]  __ia32_sys_mmap_pgoff+0x1c/0x30
> [  272.502512][ T7295]  do_int80_syscall_32+0x39/0x80
> [  272.503101][ T7295]  entry_INT80_32+0x10d/0x10d
> [  272.503624][ T7295] EIP: 0xb7f71a02
> [  272.504029][ T7295] Code: 95 01 00 05 25 36 02 00 83 ec 14 8d 80 e8 99 ff ff 50 6a 02 e8 1f ff 00 00 c7 04 24 7f 00 00 00 e8 7e 87 01 00 66 90 90 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
> [  272.506044][ T7295] EAX: ffffffda EBX: 00000000 ECX: 00000000 EDX: f138eb71
> [  272.506825][ T7295] ESI: c5d6cf38 EDI: ffffffff EBP: 00000000 ESP: bfca54d8
> [  272.507592][ T7295] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
> [  272.508417][ T7295] Modules linked in: aesni_intel crypto_simd qemu_fw_cfg autofs4
> [  272.509201][ T7295] CR2: 000000000000000c
> [  272.509704][ T7295] ---[ end trace 97b48cc676da14f9 ]---
> [  272.510293][ T7295] EIP: __ubsan_handle_shift_out_of_bounds+0xe/0x350
> [  272.511023][ T7295] Code: 05 90 a6 c2 00 68 2a 54 00 68 2a 54 bd 4e 00 8d bd 4e 00 8d 00 00 66 90 00 00 66 90 57 56 83 ec 57 56 83 ec 89 c7 8b 48 89 c7 <8b> 48 8d b4 26 00 8d b4 26 00 75 b4 64 8b 75 b4 64 8b ca 83 bb 1c
> [  272.513169][ T7295] EAX: 00000000 EBX: c5d6cf38 ECX: 00000031 EDX: 00000000
> [  272.513979][ T7295] ESI: f138eb71 EDI: 00000000 EBP: f5a23f3c ESP: f5a23ec8
> [  272.514800][ T7295] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010292
> [  272.515974][ T7295] CR0: 80050033 CR2: 0000000c CR3: 3528d000 CR4: 000406d0
> [  272.516787][ T7295] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [  272.517619][ T7295] DR6: fffe0ff0 DR7: 00000400
> [  272.518105][ T7295] Kernel panic - not syncing: Fatal exception
> [  272.519566][ T7295] Kernel Offset: disabled
> 
> 
> as contrast, in fbc:
> [  126.758570][ T3293] ================================================================================
> [  126.758949][ T3293] UBSAN: shift-out-of-bounds in (null):0:0
> [  126.759174][ T3293] BUG: kernel NULL pointer dereference, address: 00000000
> [  126.759447][ T3293] #PF: supervisor read access in kernel mode
> [  126.759676][ T3293] #PF: error_code(0x0000) - not-present page
> [  126.759905][ T3293] *pde = 00000000
> [  126.760047][ T3293] Oops: 0000 [#1] SMP
> [  126.760205][ T3293] CPU: 1 PID: 3293 Comm: trinity-c4 Not tainted 5.14.0-00006-g6128b3af2a5e #1
> [  126.760541][ T3293] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> [  126.760890][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
> [  126.761147][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66
> 83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
> [  126.761889][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
> [  126.762159][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
> [  126.762428][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
> [  126.762718][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
> [  126.762989][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [  126.763259][ T3293] DR6: fffe0ff0 DR7: 00000400
> [  126.763436][ T3293] Call Trace:
> [  126.763562][ T3293]  ? rcu_lock_acquire+0x30/0x30
> [  126.763749][ T3293]  ? rcu_read_lock_sched_held+0x31/0x70
> [  126.763960][ T3293]  ksys_mmap_pgoff+0x1fc/0x290
> [  126.764146][ T3293]  __ia32_sys_mmap_pgoff+0x1c/0x30
> [  126.764343][ T3293]  do_int80_syscall_32+0x39/0x80
> [  126.764532][ T3293]  entry_INT80_32+0x10d/0x10d
> [  126.764709][ T3293] EIP: 0xb7fbda02
> [  126.764848][ T3293] Code: 95 01 00 05 25 36 02 00 83 ec 14 8d 80 e8 99 ff ff 50 6a 02 e8 1f ff 00 00 c7 04 24 7f 00 00 00 e8 7e 87 01 00 66 90 90 cd 80 <c3> 8d b6
> 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
> [  126.765591][ T3293] EAX: ffffffda EBX: 00000000 ECX: 00001000 EDX: 55dd7eb6
> [  126.765859][ T3293] ESI: f0bd6374 EDI: ffffffff EBP: 00000000 ESP: bf9964d8
> [  126.766129][ T3293] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
> [  126.766419][ T3293] Modules linked in: aesni_intel crypto_simd qemu_fw_cfg autofs4
> [  126.766715][ T3293] CR2: 0000000000000000
> [  126.766894][ T3293] ---[ end trace e6000e119f0dc7f3 ]---
> [  126.767105][ T3293] EIP: __ubsan_handle_shift_out_of_bounds+0x88/0x350
> [  126.767361][ T3293] Code: 00 83 c4 04 7f 23 47 04 7f 23 47 04 ff 37 68 ef ff 37 68 ef e3 77 d0 d7 e3 77 d0 d7 00 8b 45 f0 00 8b 45 f0 c4 14 66 83 c4 14 <66> 83 66
> +83 3f 00 66 83 3f 00 00 00 66 83 00 00 66 83 b9 01 00 00
> [  126.768112][ T3293] EAX: 00000000 EBX: f345b500 ECX: 00000027 EDX: eba9ce40
> [  126.768384][ T3293] ESI: 00000046 EDI: 00000000 EBP: f3575f40 ESP: f3575ecc
> [  126.768657][ T3293] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
> [  126.768947][ T3293] CR0: 80050033 CR2: 00000000 CR3: 33464000 CR4: 000406d0
> [  126.769223][ T3293] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [  126.769496][ T3293] DR6: fffe0ff0 DR7: 00000400
> [  126.769680][ T3293] Kernel panic - not syncing: Fatal exception
> [  126.769946][ T3293] Kernel Offset: disabled
> 
> 
> basically, we just based on the diff to report out, but maybe need your education
> if this "UBSAN:shift-out-of-bounds_in(null)" diff really matter in this case.

The triggering code sequences are "ksys_mmap_pgoff+0x1fd/0x290" vs.
"ksys_mmap_pgoff+0x1fc/0x290", so my gut feeling is that we're dealing
with the same issue.

But I don't have an fully satisfactory explanation why we're getting
more often "address: 00000000" with that commit instead of via the
parent "address: 0000000c". Maybe simply because of changed code layout
"garbage" we're using to address is now "different garbage" :)


-- 
Thanks,

David / dhildenb