Date:   Tue, 12 Jan 2021 11:10:28 +0000
From:   Guillaume Tucker <guillaume.tucker@...labora.com>
To:     Mike Rapoport <rppt@...ux.ibm.com>,
        Andrea Arcangeli <aarcange@...hat.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        kernelci-results-staging@...ups.io,
        "kernelci-results@...ups.io" <kernelci-results@...ups.io>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Mike Rapoport <rppt@...nel.org>, Baoquan He <bhe@...hat.com>
Subject: Re: kernelci/staging-next bisection: sleep.login on
 rk3288-rock2-square #2286-staging

On 12/01/2021 10:53, Guillaume Tucker wrote:
> On 05/01/2021 09:13, Mike Rapoport wrote:
>> On Sun, Jan 03, 2021 at 03:09:14PM -0500, Andrea Arcangeli wrote:
>>> Hello Mike,
>>>
>>> On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
>>>> Thanks for the logs, it seems that implicitly adding reserved regions to
>>>> memblock.memory wasn't that bright an idea :)
>>>
>>> Would it be possible to somehow clean up the hack then?
>>>
>>> The only difference between the clean solution and the hack is that
>>> the hack was intended to achieve the exact same result, but without
>>> adding the reserved regions to memblock.memory.
>>
>> I didn't consider adding reserved regions to memblock.memory as a clean
>> solution, this was still a hack, but I didn't think that things are that
>> fragile.
>>
>> I still think we cannot rely on memblock.reserved to detect
>> memory/zone/node sizes and the boot failure reported here confirms this.
>>  
>>> The comment on that problematic area says the reserved area cannot be
>>> used for DMA because of some unexplained hw issue, and that doing so
>>> prevents booting, but since the area got reserved, even with the clean
>>> solution, it should never have been used for DMA?
>>>
>>> So I can only imagine that the physical memory region is way more
>>> problematic than just for DMA. It sounds like anything that
>>> touches it, including the CPU, will hang the system, not just DMA. It
>>> sounds somewhat similar to the other e820 direct mapping issue on x86?
>>
>> My understanding is that the boot failed because when I implicitly added
>> the reserved region to memblock.memory, the memory size seen by
>> free_area_init() jumped from 2G to 4G because the reserved area was close
>> to 4G. The very first allocation would get a chunk from slightly below
>> 4G and, as there is no real memory there, the kernel would crash.
>>  
>>> If you want to test the hack on the arm board to check if it boots you
>>> can use the below commit:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc
>>
>> My take is your solution would boot with this memory configuration, but I
>> still don't think that using memblock.reserved for zone/node sizing is
>> correct.
> 
> The rk3288 platform has now been failing to boot for nearly a
> month on linux-next:
> 
>   https://kernelci.org/test/case/id/5ffbed0a31ad81239bc94cdb/
> 
> Until a fix or a new version of this patch is made, would it be
> possible to drop it or revert it so the platform becomes usable
> again?
> 
> Or if you want, I can make a cleaned-up version of my hack to
> ignore the problematic region if you still need your patch to be
> on linux-next, but that would probably be less than ideal.

By the way, another bisection found that this commit is also
breaking tegra124-nyan-big, but only with both CONFIG_EFI=y and
CONFIG_ARM_LPAE=y enabled:

  https://kernelci.org/test/case/id/5ff6b1e26cf19f3b10c94cc5/

The plain multi_v7_defconfig is booting fine:

  https://kernelci.org/test/plan/id/5ff6b0a1db91b8a2b9c94cba/

I haven't looked into this one or tried to make it boot like
rk3288, but please let me know if there's anything there that can
be done to help.

Thanks,
Guillaume
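
As a rough illustration of Mike's explanation quoted above (the memory
size seen by free_area_init() jumping from 2G to 4G), here is a small C
sketch of the arithmetic. It is not the kernel's memblock API: struct
region, highest_end_mib() and the addresses below are hypothetical,
chosen only so that RAM covers the first 2G and a small reserved region
ends just below 4G.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a memory region (base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

/* Assumed layout: 2G of real RAM starting at physical address 0. */
static const struct region ram_only[] = {
	{ 0x00000000ULL, 0x80000000ULL },
};

/* Same RAM plus an assumed 16M reserved region ending just below 4G. */
static const struct region with_reserved[] = {
	{ 0x00000000ULL, 0x80000000ULL },
	{ 0xfe000000ULL, 0x01000000ULL },
};

/*
 * Return the highest end address over all regions, in MiB. Sizing
 * memory from this value treats everything below it as usable, even
 * the hole between the end of RAM and the reserved region.
 */
static uint64_t highest_end_mib(const struct region *r, size_t n)
{
	uint64_t end = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t e = r[i].base + r[i].size;

		if (e > end)
			end = e;
	}
	return end >> 20;
}
```

With these assumed values, highest_end_mib(ram_only, 1) gives 2048 and
highest_end_mib(with_reserved, 2) gives 4080, so an allocator working
down from the top would hand out addresses where there is no real
memory.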
