lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 12 Feb 2021 13:05:48 -0500
From:   George Kennedy <george.kennedy@...cle.com>
To:     David Hildenbrand <david@...hat.com>,
        Dmitry Vyukov <dvyukov@...gle.com>
Cc:     Konrad Rzeszutek Wilk <konrad@...nok.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        kasan-dev <kasan-dev@...glegroups.com>,
        Alexander Potapenko <glider@...gle.com>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>, pjones@...hat.com,
        konrad@...nel.org, LKML <linux-kernel@...r.kernel.org>,
        Dan Carpenter <dan.carpenter@...cle.com>,
        Dhaval Giani <dhaval.giani@...cle.com>
Subject: Re: [PATCH 1/1] iscsi_ibft: KASAN false positive failure occurs in
 ibft_init()



On 2/12/2021 10:36 AM, David Hildenbrand wrote:
> On 12.02.21 14:51, Dmitry Vyukov wrote:
>> On Fri, Feb 12, 2021 at 2:31 PM George Kennedy
>> <george.kennedy@...cle.com> wrote:
>>> On 2/10/2021 4:51 PM, George Kennedy wrote:
>>>> On 2/3/2021 2:35 PM, Dmitry Vyukov wrote:
>>>>> On Wed, Feb 3, 2021 at 8:29 PM Konrad Rzeszutek Wilk
>>>>> <konrad@...nok.org> wrote:
>>>>>> Hey Dmitry, Rafael, George, please see below..
>>>>>>
>>>>>> On Wed, Jan 27, 2021 at 10:10:07PM +0100, Dmitry Vyukov wrote:
>>>>>>> On Wed, Jan 27, 2021 at 9:01 PM George Kennedy
>>>>>>> <george.kennedy@...cle.com> wrote:
>>>>>>>> Hi Dmitry,
>>>>>>>>
>>>>>>>> On 1/27/2021 1:48 PM, Dmitry Vyukov wrote:
>>>>>>>>
>>>>>>>> On Wed, Jan 27, 2021 at 7:44 PM Konrad Rzeszutek Wilk
>>>>>>>> <konrad.wilk@...cle.com> wrote:
>>>>>>>>
>>>>>>>> On Tue, Jan 26, 2021 at 01:03:21PM -0500, George Kennedy wrote:
>>>>>>>>
>>>>>>>> During boot of kernel with CONFIG_KASAN the following KASAN false
>>>>>>>> positive failure will occur when ibft_init() reads the
>>>>>>>> ACPI iBFT table: BUG: KASAN: use-after-free in ibft_init
>>>>>>>>
>>>>>>>> The ACPI iBFT table is not allocated, and the iscsi driver uses
>>>>>>>> a pointer to it to calculate checksum, etc. KASAN complains
>>>>>>>> about this pointer with use-after-free, which this is not.
>>>>>>>>
>>>>>>>> Andrey, Alexander, Dmitry,
>>>>>>>>
>>>>>>>> I think this is the right way for this, but was wondering if 
>>>>>>>> you have
>>>>>>>> other suggestions?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Hi George, Konrad,
>>>>>>>>
>>>>>>>> Please provide a sample KASAN report and kernel version to match
>>>>>>>> line numbers.
>>>>>>>>
>>>>>>>> 5.4.17-2102.200.0.0.20210106_0000
>>>>>>>>
>>>>>>>> [   24.413536] iBFT detected.
>>>>>>>> [   24.414074]
>>>>>>>> ==================================================================
>>>>>>>> [   24.407342] BUG: KASAN: use-after-free in ibft_init+0x134/0xb8b
>>>>>>>> [   24.407342] Read of size 4 at addr ffff8880be452004 by task
>>>>>>>> swapper/0/1
>>>>>>>> [   24.407342]
>>>>>>>> [   24.407342] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
>>>>>>>> 5.4.17-2102.200.0.0.20210106_0000.syzk #1
>>>>>>>> [   24.407342] Hardware name: QEMU Standard PC (i440FX + PIIX,
>>>>>>>> 1996), BIOS 0.0.0 02/06/2015
>>>>>>>> [   24.407342] Call Trace:
>>>>>>>> [   24.407342]  dump_stack+0xd4/0x119
>>>>>>>> [   24.407342]  ? ibft_init+0x134/0xb8b
>>>>>>>> [   24.407342] print_address_description.constprop.6+0x20/0x220
>>>>>>>> [   24.407342]  ? ibft_init+0x134/0xb8b
>>>>>>>> [   24.407342]  ? ibft_init+0x134/0xb8b
>>>>>>>> [   24.407342]  __kasan_report.cold.9+0x37/0x77
>>>>>>>> [   24.407342]  ? ibft_init+0x134/0xb8b
>>>>>>>> [   24.407342]  kasan_report+0x14/0x1b
>>>>>>>> [   24.407342] __asan_report_load_n_noabort+0xf/0x11
>>>>>>>> [   24.407342]  ibft_init+0x134/0xb8b
>>>>>>>> [   24.407342]  ? dmi_sysfs_init+0x1a5/0x1a5
>>>>>>>> [   24.407342]  ? dmi_walk+0x72/0x89
>>>>>>>> [   24.407342]  ? ibft_check_initiator_for+0x159/0x159
>>>>>>>> [   24.407342]  ? rvt_init_port+0x110/0x101
>>>>>>>> [   24.407342]  ? ibft_check_initiator_for+0x159/0x159
>>>>>>>> [   24.407342]  do_one_initcall+0xc3/0x44d
>>>>>>>> [   24.407342]  ? perf_trace_initcall_level+0x410/0x405
>>>>>>>> [   24.407342]  kernel_init_freeable+0x551/0x673
>>>>>>>> [   24.407342]  ? start_kernel+0x94b/0x94b
>>>>>>>> [   24.407342]  ? __sanitizer_cov_trace_const_cmp1+0x1a/0x1c
>>>>>>>> [   24.407342]  ? __kasan_check_write+0x14/0x16
>>>>>>>> [   24.407342]  ? rest_init+0xe6/0xe6
>>>>>>>> [   24.407342]  kernel_init+0x16/0x1bd
>>>>>>>> [   24.407342]  ? rest_init+0xe6/0xe6
>>>>>>>> [   24.407342]  ret_from_fork+0x2b/0x36
>>>>>>>> [   24.407342]
>>>>>>>> [   24.407342] The buggy address belongs to the page:
>>>>>>>> [   24.407342] page:ffffea0002f91480 refcount:0 mapcount:0
>>>>>>>> mapping:0000000000000000 index:0x1
>>>>>>>> [   24.407342] flags: 0xfffffc0000000()
>>>>>>>> [   24.407342] raw: 000fffffc0000000 ffffea0002fca588
>>>>>>>> ffffea0002fb1a88 0000000000000000
>>>>>>>> [   24.407342] raw: 0000000000000001 0000000000000000
>>>>>>>> 00000000ffffffff 0000000000000000
>>>>>>>> [   24.407342] page dumped because: kasan: bad access detected
>>>>>>>> [   24.407342]
>>>>>>>> [   24.407342] Memory state around the buggy address:
>>>>>>>> [   24.407342]  ffff8880be451f00: ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>> ff ff ff ff ff
>>>>>>>> [   24.407342]  ffff8880be451f80: ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>> ff ff ff ff ff
>>>>>>>> [   24.407342] >ffff8880be452000: ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>> ff ff ff ff ff
>>>>>>>> [   24.407342]                    ^
>>>>>>>> [   24.407342]  ffff8880be452080: ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>> ff ff ff ff ff
>>>>>>>> [   24.407342]  ffff8880be452100: ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>> ff ff ff ff ff
>>>>>>>> [   24.407342]
>>>>>>>> ==================================================================
>>>>>>>> [   24.407342] Disabling lock debugging due to kernel taint
>>>>>>>> [   24.451021] Kernel panic - not syncing: panic_on_warn set ...
>>>>>>>> [   24.452002] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G B
>>>>>>>> 5.4.17-2102.200.0.0.20210106_0000.syzk #1
>>>>>>>> [   24.452002] Hardware name: QEMU Standard PC (i440FX + PIIX,
>>>>>>>> 1996), BIOS 0.0.0 02/06/2015
>>>>>>>> [   24.452002] Call Trace:
>>>>>>>> [   24.452002]  dump_stack+0xd4/0x119
>>>>>>>> [   24.452002]  ? ibft_init+0x102/0xb8b
>>>>>>>> [   24.452002]  panic+0x28f/0x6e0
>>>>>>>> [   24.452002]  ? __warn_printk+0xe0/0xe0
>>>>>>>> [   24.452002]  ? ibft_init+0x134/0xb8b
>>>>>>>> [   24.452002]  ? add_taint+0x68/0xb3
>>>>>>>> [   24.452002]  ? add_taint+0x68/0xb3
>>>>>>>> [   24.452002]  ? ibft_init+0x134/0xb8b
>>>>>>>> [   24.452002]  ? ibft_init+0x134/0xb8b
>>>>>>>> [   24.452002]  end_report+0x4c/0x54
>>>>>>>> [   24.452002]  __kasan_report.cold.9+0x55/0x77
>>>>>>>> [   24.452002]  ? ibft_init+0x134/0xb8b
>>>>>>>> [   24.452002]  kasan_report+0x14/0x1b
>>>>>>>> [   24.452002] __asan_report_load_n_noabort+0xf/0x11
>>>>>>>> [   24.452002]  ibft_init+0x134/0xb8b
>>>>>>>> [   24.452002]  ? dmi_sysfs_init+0x1a5/0x1a5
>>>>>>>> [   24.452002]  ? dmi_walk+0x72/0x89
>>>>>>>> [   24.452002]  ? ibft_check_initiator_for+0x159/0x159
>>>>>>>> [   24.452002]  ? rvt_init_port+0x110/0x101
>>>>>>>> [   24.452002]  ? ibft_check_initiator_for+0x159/0x159
>>>>>>>> [   24.452002]  do_one_initcall+0xc3/0x44d
>>>>>>>> [   24.452002]  ? perf_trace_initcall_level+0x410/0x405
>>>>>>>> [   24.452002]  kernel_init_freeable+0x551/0x673
>>>>>>>> [   24.452002]  ? start_kernel+0x94b/0x94b
>>>>>>>> [   24.452002]  ? __sanitizer_cov_trace_const_cmp1+0x1a/0x1c
>>>>>>>> [   24.452002]  ? __kasan_check_write+0x14/0x16
>>>>>>>> [   24.452002]  ? rest_init+0xe6/0xe6
>>>>>>>> [   24.452002]  kernel_init+0x16/0x1bd
>>>>>>>> [   24.452002]  ? rest_init+0xe6/0xe6
>>>>>>>> [   24.452002]  ret_from_fork+0x2b/0x36
>>>>>>>> [   24.452002] Dumping ftrace buffer:
>>>>>>>> [   24.452002] ---------------------------------
>>>>>>>> [   24.452002] swapper/-1         1.... 24564337us : rdmaip_init:
>>>>>>>> 2924: rdmaip_init: Active Bonding is DISABLED
>>>>>>>> [   24.452002] ---------------------------------
>>>>>>>> [   24.452002] Kernel Offset: disabled
>>>>>>>> [   24.452002] Rebooting in 1 seconds..
>>>>>>>>
>>>>>>>> Why does KASAN think the address is freed? For that to happen that
>>>>>>>> memory should have been freed. I don't remember any similar false
>>>>>>>> positives from KASAN, so this looks a bit suspicious.
>>>>>>>>
>>>>>>>> I'm not sure why KASAN thinks the address is freed. There are
>>>>>>>> other modules where KASAN/KCOV is disabled on boot.
>>>>>>>> Could this be for a similar reason?
>>>>>>> Most of these files are disabled because they cause recursion in
>>>>>>> instrumentation, or execute too early in bootstrap process (before
>>>>>>> kasan_init).
>>>>>>>
>>>>>>> Somehow the table pointer in ibft_init points to a freed page. I
>>>>>>> tracked it down to here:
>>>>>>> https://elixir.bootlin.com/linux/v5.4.17/source/drivers/acpi/acpica/tbutils.c#L399 
>>>>>>>
>>>>>>>
>>>>>>> but I can't find where this table_desc->pointer comes from. 
>>>>>>> Perhaps it
>>>>>> It is what the BIOS generated. It usually points to some memory
>>>>>> location in right under 4GB and the BIOS stashes the DSDT, iBFT, and
>>>>>> other tables in there.
>>>>>>
>>>>>>> uses some allocation method that's not supported by KASAN? However,
>>>>>>> it's the only such case that I've seen, so it's a bit weird. 
>>>>>>> Could it
>>>>>>> use something like memblock_alloc? Or maybe that page was in fact
>>>>>>> freed?... Too bad KASAN does not print free stack for pages, maybe
>>>>>>> it's not too hard to do if CONFIG_PAGE_OWNER is enabled...
>>>>>> Hm, there is a comment in the acpi_get_table speaking about the
>>>>>> requirement of having a acpi_put_table and:
>>>>>>
>>>>>>
>>>>>>    * DESCRIPTION: Finds and verifies an ACPI table. Table must be 
>>>>>> in the
>>>>>>    *              RSDT/XSDT.
>>>>>>    *              Note that an early stage acpi_get_table() call 
>>>>>> must
>>>>>> be paired
>>>>>>    *              with an early stage acpi_put_table() call.
>>>>>> otherwise the table
>>>>>>    *              pointer mapped by the early stage mapping
>>>>>> implementation may be
>>>>>>    *              erroneously unmapped by the late stage unmapping
>>>>>> implementation
>>>>>>    *              in an acpi_put_table() invoked during the late 
>>>>>> stage.
>>>>>>    *
>>>>>>
>>>>>> Which would imply that I should use acpi_put_table in the error path
>>>>>> (see below a patch), but also copy the structure instead of 
>>>>>> depending
>>>>>> on ACPI keeping it mapped for me. I think.
>>>>> Hi Konrad,
>>>>>
>>>>> Thanks for looking into this.
>>>>> If ACPI unmaps this page, that would perfectly explain the KASAN 
>>>>> report.
>>>>>
>>>>> George, does this patch eliminate the KASAN report for you?
>>>>
>>>> Hi Dmitry,
>>>>
>>>> No luck with the patch. Tried high level bisect instead. Here are the
>>>> results:
>>>>
>>>> BUG: KASAN: use-after-free in ibft_init+0x134/0xc49
>>>>
>>>> Bisect status:
>>>> v5.11-rc6 Sun Jan 31 13:50:09 2021 -0800     Failed
>>>> v5.11-rc1 Sun Dec 27 15:30:22 2020 -0800    Failed
>>>> v5.10 Sun Dec 13 14:41:30 2020 -0800           Failed
>>>> v5.10-rc6 Sun Nov 29 15:50:50 2020 -0800    Failed
>>>> v5.10-rc5 Sun Nov 22 15:36:08 2020 -0800    Failed
>>>> v5.10-rc4 Sun Nov 15 16:44:31 2020 -0800    Failed
>>>> v5.10-rc3 Sun Nov 8 16:10:16 2020 -0800      Failed
>>>> v5.10-rc2 Sun Nov 1 14:43:52 2020 -0800      Failed
>>>> v5.10-rc1 Sun Oct 25 15:14:11 2020 -0700     Failed
>>>> v5.9 Sun Oct 11 14:15:50 2020 -0700              OK - 10 reboots so
>>>> far w/o kasan failure
>>>>
>>>> So, will look at what changed between v5.9 and v5.10-rc1
>>>
>>> git bisect has identified the following as the offending commit:
>>>
>>> 2020-10-16 torvalds@...ux-foundation.org - 7fef431 2020-10-15 David
>>> Hildenbrand mm/page_alloc: place pages to tail in __free_pages_core()
>>
>> I think this suggests that this is a real use-after-free and the
>> page_alloc change just exposed a latent bug.
>
> That commit is ridiculously good at finding latent BUGs. Memory gets 
> allocated and used in the reverse order. So could be that this is 
> exposing yet another latent bug.
>
> For example: 
> https://lkml.kernel.org/r/20201213225517.3838501-1-linus.walleij@linaro.org
>
> Or a longer discussion at: 
> https://lkml.kernel.org/r/87361onphy.fsf_-_@codeaurora.org
>
>
> Anyhow, let me know in case it looks like this time it's actually my 
> commit that broke something (although somewhat unlikely I have to say).

Thank you Dmitry & David,

Good read and agree - great smoke test for latent bugs.

Continuing to search for the real bug.

George

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ