lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d852c65a-f9fa-6be5-9cc2-683e0029fb03@gmail.com>
Date:   Mon, 10 Jun 2019 15:03:37 -0700
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Andrew Morton <akpm@...ux-foundation.org>,
        Florian Fainelli <f.fainelli@...il.com>
Cc:     linux-kernel@...r.kernel.org,
        bcm-kernel-feedback-list@...adcom.com,
        Alexey Skidanov <alexey.skidanov@...el.com>,
        Olof Johansson <olof@...om.net>,
        Huang Shijie <sjhuang@...vatar.ai>
Subject: Re: [PATCH] lib/genalloc.c: Avoid de-referencing NULL pool

On 6/10/19 2:51 PM, Andrew Morton wrote:
> On Fri,  7 Jun 2019 16:43:31 -0700 Florian Fainelli <f.fainelli@...il.com> wrote:
> 
>> With architectures allowing the kernel to be placed almost arbitrarily
>> in memory (e.g.: ARM64), it is possible to have the kernel resides at
>> physical addresses above 4GB, resulting in neither the default CMA area,
>> nor the atomic pool from successfully allocating. This does not prevent
>> specific peripherals from working though, one example is XHCI, which
>> still operates correctly.
>>
>> Trouble comes when the XHCI driver gets suspended and resumed, since we
>> can now trigger the following NPD:
>>
>> ...
>>
>> [   13.327884] f8c0: 0000000000000030 ffffffffffffffff
>> [   13.332835] [<ffffff80083c0df8>] addr_in_gen_pool+0x4/0x48
>> [   13.338398] [<ffffff80086004d0>] xhci_mem_cleanup+0xc8/0x51c
>> [   13.344137] [<ffffff80085f9250>] xhci_resume+0x308/0x65c
>> [   13.349524] [<ffffff80085e3de8>] xhci_brcm_resume+0x84/0x8c
>> [   13.355174] [<ffffff80084ad040>] platform_pm_resume+0x3c/0x64
>> [   13.360997] [<ffffff80084b91b4>] dpm_run_callback+0x5c/0x15c
>> [   13.366732] [<ffffff80084b96bc>] device_resume+0xc0/0x190
>> [   13.372205] [<ffffff80084baa70>] dpm_resume+0x144/0x2cc
>> [   13.377504] [<ffffff80084bafbc>] dpm_resume_end+0x20/0x34
>> [   13.382980] [<ffffff80080e0d88>] suspend_devices_and_enter+0x104/0x704
>> [   13.389585] [<ffffff80080e16a8>] pm_suspend+0x320/0x53c
>> [   13.394881] [<ffffff80080dfd08>] state_store+0xbc/0xe0
>> [   13.400094] [<ffffff80083a89d4>] kobj_attr_store+0x14/0x24
>> [   13.405655] [<ffffff800822a614>] sysfs_kf_write+0x60/0x70
>> [   13.411128] [<ffffff80082295d4>] kernfs_fop_write+0x130/0x194
>> [   13.416954] [<ffffff80081b5d10>] __vfs_write+0x60/0x150
>> [   13.422254] [<ffffff80081b6b20>] vfs_write+0xc8/0x164
>> [   13.427376] [<ffffff80081b7dd8>] SyS_write+0x70/0xc8
>> [   13.432412] [<ffffff8008083180>] el0_svc_naked+0x34/0x38
>> [   13.437800] Code: 92800173 97f6fb9e 17fffff5 d1000442 (f8408c03)
>> [   13.444033] ---[ end trace 2effe12f909ce205 ]---
>>
>> The call path leading to this problem is xhci_mem_cleanup() ->
>> dma_free_coherent() -> dma_free_from_pool() -> addr_in_gen_pool. If the
>> atomic_pool is NULL, we can't possibly have the address in the atomic
>> pool anyway, so guard against that.
>>
> 
> Arguably the caller shouldn't be pasing in a NULL pointer.  Perhaps we
> couild do this as a convenience thing if addr_in_gen_pool(NULL) makes
> some sort of semantic sense, but I'm having trouble convincing myself
> that it does.

That is absolutely true, part of the problem is that there is a context
imbalance here going on, which is why this condition can be triggered.
The first time the XHCI descriptor memory is allocated, we are in
sleepable context, but when we resume from system sleep, we are not. The
allocation is checked properly against a NULL pool, but not the freeing.

The reason why I went with a change in lib/genalloc.c instead of at the
caller level under kernel/dma/remap.c is because lib/genalloc.c already
has a number of those checks against a NULL pool argument, and this one
was missing because arguably the condition is hard to hit, but can be
hit with the right conditions.

> 
> So I'm somewhat inclined to think that going oops was the appropriate
> response to this input...
> 
>> --- a/lib/genalloc.c
>> +++ b/lib/genalloc.c
>> @@ -439,6 +439,9 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
>>  	unsigned long end = start + size - 1;
>>  	struct gen_pool_chunk *chunk;
>>  
>> +	if (unlikely(!pool))
>> +		return found;
> 
> I think it would be clearer to use "return false" here, so the reader
> doesn't have to go find and out what value `found' has.

Sounds fair, the diff did not make it clear because it literally just
the line that is not included in the diff output by default :)

v2 coming, thanks for taking a look.
--
Florian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ