Message-ID: <MW3PR12MB45371072D7C3FDA6986C6318F36B9@MW3PR12MB4537.namprd12.prod.outlook.com>
Date:   Tue, 16 Mar 2021 00:36:29 +0000
From:   "Liang, Liang (Leo)" <Liang.Liang@....com>
To:     David Hildenbrand <david@...hat.com>,
        Mike Rapoport <rppt@...ux.ibm.com>
CC:     "Deucher, Alexander" <Alexander.Deucher@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        amd-gfx list <amd-gfx@...ts.freedesktop.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Huang, Ray" <Ray.Huang@....com>,
        "Koenig, Christian" <Christian.Koenig@....com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        George Kennedy <george.kennedy@...cle.com>
Subject: RE: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail
 in __free_pages_core()")

[AMD Public Use]

Hi David,

Sorry for the late reply. The dmesg with 7fef431be9c9 reverted (i.e. without 7fef431be9c9) is attached, and the abnormal ranges look like this:
[  +0.027833] [0x0000000078000000 - 0x00000000783fffff] 20925 MB/s / 25405 MB/s
[  +1.363596] [0x0000000100000000 - 0x00000001003fffff] 222 MB/s / 222 MB/s
[  +1.562192] [0x0000000100400000 - 0x00000001007fffff] 222 MB/s / 222 MB/s
[  +1.881332] [0x0000000100800000 - 0x0000000100bfffff] 195 MB/s / 159 MB/s
[  +1.383388] [0x0000000100c00000 - 0x0000000100ffffff] 219 MB/s / 221 MB/s
[  +0.029342] [0x0000000101000000 - 0x00000001013fffff] 19807 MB/s / 24125 MB/s

What is the problem here? Do you want to check the ACPI tables?

BRs,
Leo
-----Original Message-----
From: David Hildenbrand <david@...hat.com> 
Sent: Monday, March 15, 2021 9:04 PM
To: Mike Rapoport <rppt@...ux.ibm.com>
Cc: Liang, Liang (Leo) <Liang.Liang@....com>; Deucher, Alexander <Alexander.Deucher@....com>; linux-kernel@...r.kernel.org; amd-gfx list <amd-gfx@...ts.freedesktop.org>; Andrew Morton <akpm@...ux-foundation.org>; Huang, Ray <Ray.Huang@....com>; Koenig, Christian <Christian.Koenig@....com>; Rafael J. Wysocki <rafael@...nel.org>; George Kennedy <george.kennedy@...cle.com>
Subject: Re: slow boot with 7fef431be9c9 ("mm/page_alloc: place pages to tail in __free_pages_core()")

On 13.03.21 14:48, Mike Rapoport wrote:
> Hi,
> 
> On Sat, Mar 13, 2021 at 10:05:23AM +0100, David Hildenbrand wrote:
>>> Am 13.03.2021 um 05:04 schrieb Liang, Liang (Leo) <Liang.Liang@....com>:
>>>
>>> Hi David,
>>>
>>> Which benchmark tool do you prefer? Memtest86+ or something else?
>>
>> Hi Leo,
>>
>> I think you want something that runs under Linux natively.
>>
>> I'm planning on coding up a kernel module to walk all 4MB pages in 
>> the freelists and perform a stream benchmark individually. Then we 
>> might be able to identify the problematic range - if there is a 
>> problematic range :)
> 
> My wild guess would be that the pages that are now at the head of the free 
> lists have the wrong caching mode enabled. It might be worth checking in your 
> test module.

I hacked something up real quick:

https://github.com/davidhildenbrand/kstream

Only briefly tested inside a VM. The output looks something like

[...]
[ 8396.432225] [0x0000000045800000 - 0x0000000045bfffff] 25322 MB/s / 38948 MB/s
[ 8396.448749] [0x0000000045c00000 - 0x0000000045ffffff] 24481 MB/s / 38946 MB/s
[ 8396.465197] [0x0000000046000000 - 0x00000000463fffff] 24892 MB/s / 39170 MB/s
[ 8396.481552] [0x0000000046400000 - 0x00000000467fffff] 25222 MB/s / 39156 MB/s
[ 8396.498012] [0x0000000046800000 - 0x0000000046bfffff] 24416 MB/s / 39159 MB/s
[ 8396.514397] [0x0000000046c00000 - 0x0000000046ffffff] 25469 MB/s / 38940 MB/s
[ 8396.530849] [0x0000000047000000 - 0x00000000473fffff] 24885 MB/s / 38734 MB/s
[ 8396.547195] [0x0000000047400000 - 0x00000000477fffff] 25458 MB/s / 38941 MB/s
[...]

The benchmark allocates one 4 MiB chunk at a time and runs a simplified STREAM benchmark on it twice: (a) without flushing caches and (b) flushing caches before every memory access.

It would be great if you could run that with the *old behavior* kernel (IOW, without 7fef431be9c9), so that we might still be lucky enough to catch the problematic area in the freelist.

Let's see if that will indicate anything.

--
Thanks,

David / dhildenb

Download attachment "kstream_revert_7fef431be9c9.log" of type "application/octet-stream" (277831 bytes)
