Message-ID: <8dca5a34-2c5c-bc49-b2ad-f3e5e0fdbba3@redhat.com>
Date:   Fri, 3 Sep 2021 11:31:01 +0200
From:   David Hildenbrand <david@...hat.com>
To:     "George G. Davis" <george_davis@...tor.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        "open list:MEMORY MANAGEMENT" <linux-mm@...ck.org>,
        open list <linux-kernel@...r.kernel.org>,
        Eugeniu Rosca <erosca@...adit-jv.com>,
        "George G. Davis" <davis.george@...mens.com>
Subject: Re: [RFC][PATCH] mm/page_isolation: tracing: trace all
 test_pages_isolated failures

On 03.09.21 00:21, George G. Davis wrote:
> On Tue, Aug 31, 2021 at 04:53:31PM +0200, David Hildenbrand wrote:
>> On 23.08.21 22:28, George G. Davis wrote:
>>> From: "George G. Davis" <davis.george@...mens.com>
>>>
>>> Some test_pages_isolated failure conditions don't include trace points.
>>> For debugging issues caused by "pinned" pages, make sure to trace all
>>> calls whether they succeed or fail. In this case, a failure case did not
>>> result in a trace point. So add the missing failure case in
>>> test_pages_isolated traces.
>>
>> In which setups did you actually run into these cases?
> 
> Good question!
> 
> Although I'm not 100% certain that this specific failure condition has
> occurred in my recent testing, I'm able to reproduce cma_alloc -EBUSY
> failure conditions when testing latest/recent master on arm64-based
> Renesas R-Car Starter Kit [1] using defconfig with
> CONFIG_CMA_SIZE_MBYTES=384 while running the following test case:

Okay, I think you are not hitting the path you touched in this patch, 
because I assume it will never ever really trigger ...

> 
> trace-cmd record -N 192.168.1.87:12345 -b 4096 -e cma -e page_isolation -e compaction -e migrate &
> sleep 10
> while true; do a=$(( ( RANDOM % 10000 ) + 1 )); echo $a > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $a; echo $a > /sys/kernel/debug/cma/cma-reserved/free); done &
> while true; do b=$(( ( RANDOM % 10000 ) + 1 )); echo $b > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $b; echo $b > /sys/kernel/debug/cma/cma-reserved/free); done &
> while true; do c=$(( ( RANDOM % 10000 ) + 1 )); echo $c > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $c; echo $c > /sys/kernel/debug/cma/cma-reserved/free); done &
> while true; do d=$(( ( RANDOM % 10000 ) + 1 )); echo $d > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $d; echo $d > /sys/kernel/debug/cma/cma-reserved/free); done &
> while true; do e=$(( ( RANDOM % 10000 ) + 1 )); echo $e > /sys/kernel/debug/cma/cma-reserved/alloc && (usleep $e; echo $e > /sys/kernel/debug/cma/cma-reserved/free); done &
> /selftests/vm/transhuge-stress &
> 
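
For anyone re-running this: the five loops above are identical apart from
the variable name, so they can be condensed as below. This is only a
sketch, not what was actually run. CMA_DIR and the function names are
hypothetical; the debugfs path and the usleep call are taken from the
quoted test case, and usleep is assumed to be installed.

```shell
#!/bin/bash
# Sketch only: condenses the five identical loops from the quoted test case.
# CMA_DIR and all function names are hypothetical helpers for illustration.
CMA_DIR=${CMA_DIR:-/sys/kernel/debug/cma/cma-reserved}

# Pick a random page count in 1..10000, as in the quoted loops.
random_count() {
    echo $(( (RANDOM % 10000) + 1 ))
}

# One alloc/free cycle: allocate n pages, hold them for n microseconds,
# then free them. Returns non-zero if the write to "alloc" fails
# (for example when the allocation hits -EBUSY).
cma_cycle() {
    local n=$1
    echo "$n" > "$CMA_DIR/alloc" || return 1
    usleep "$n"
    echo "$n" > "$CMA_DIR/free"
}

# Start the given number of endless alloc/free loops in the background.
cma_stress() {
    local workers=${1:-5}
    local i=0
    while [ "$i" -lt "$workers" ]; do
        while true; do cma_cycle "$(random_count)"; done &
        i=$((i + 1))
    done
}
```

Calling "cma_stress 5" would start the same five background workers as
the quoted test case; trace-cmd and transhuge-stress still have to be
started separately, and writing to debugfs typically requires root.
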
> The cma_alloc -EBUSY failures are caused by THP compound pages allocated
> from the CMA region where migration does not seem to work for compound
> THP pages. The workaround is to disable CONFIG_TRANSPARENT_HUGEPAGE
> since it seems incompatible with the intended use of the CMA region.


Oh, that sounds broken, THP should not block CMA allocation or page 
migration for other purposes.

a) Are these temporary or permanent allocation errors? If they are 
permanent, they will also break memory unplug.

b) Did you reproduce on other architectures as well?

c) Did it use to work but is now broken? IOW, did you try bisecting?

-- 
Thanks,

David / dhildenb
