Message-ID: <871q4m25du.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 24 Jun 2024 14:59:09 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Barry Song <21cnbao@...il.com>
Cc: Ryan Roberts <ryan.roberts@....com>,  David Hildenbrand
 <david@...hat.com>,  akpm@...ux-foundation.org,  shuah@...nel.org,
  linux-mm@...ck.org,  chrisl@...nel.org,  hughd@...gle.com,
  kaleshsingh@...gle.com,  kasong@...cent.com,
  linux-kernel@...r.kernel.org,  linux-kselftest@...r.kernel.org,  Barry
 Song <v-songbaohua@...o.com>
Subject: Re: [PATCH] selftests/mm: Introduce a test program to assess swap
 entry allocation for thp_swapout

Barry Song <21cnbao@...il.com> writes:

> On Mon, Jun 24, 2024 at 3:44 PM Huang, Ying <ying.huang@...el.com> wrote:
>>
>> Barry Song <21cnbao@...il.com> writes:
>>
>> > On Fri, Jun 21, 2024 at 9:24 PM Huang, Ying <ying.huang@...el.com> wrote:
>> >>
>> >> Barry Song <21cnbao@...il.com> writes:
>> >>
>> >> > On Fri, Jun 21, 2024 at 7:25 PM Ryan Roberts <ryan.roberts@....com> wrote:
>> >> >>
>> >> >> On 20/06/2024 12:34, David Hildenbrand wrote:
>> >> >> > On 20.06.24 11:04, Ryan Roberts wrote:
>> >> >> >> On 20/06/2024 01:26, Barry Song wrote:
>> >> >> >>> From: Barry Song <v-songbaohua@...o.com>
>> >> >> >>>
>> >> >> >>> Both Ryan and Chris have been utilizing the small test program to aid
>> >> >> >>> in debugging and identifying issues with swap entry allocation. While
>> >> >> >>> a real or intricate workload might be more suitable for assessing the
>> >> >> >>> correctness and effectiveness of the swap allocation policy, a small
>> >> >> >>> test program presents a simpler means of understanding the problem and
>> >> >> >>> initially verifying the improvements being made.
>> >> >> >>>
>> >> >> >>> Let's endeavor to integrate it into the self-test suite. Although it
>> >> >> >>> presently only accommodates 64KB and 4KB, I'm optimistic that we can
>> >> >> >>> expand its capabilities to support multiple sizes and simulate more
>> >> >> >>> complex systems in the future as required.
>> >> >> >>
>> >> >> >> I'll try to summarize the thread with Huang Ying by suggesting this test program
>> >> >> >> is "necessary but not sufficient" to exhaustively test the mTHP swap-out path.
>> >> >> >> I've certainly found it useful and think it would be a valuable addition to the
>> >> >> >> tree.
>> >> >> >>
>> >> >> >> That said, I'm not convinced it is a selftest; IMO a selftest should provide a
>> >> >> >> clear pass/fail result against some criteria and must be able to be run
>> >> >> >> automatically by (e.g.) a CI system.
>> >> >> >
>> >> >> > Likely we should then consider moving other such performance-related thingies
>> >> >> > out of the selftests?
>> >> >>
>> >> >> Yes, that would get my vote. But of the 4 tests you mentioned that use
>> >> >> clock_gettime(), it looks like transhuge-stress is the only one that doesn't
>> >> >> have a pass/fail result, so is probably the only candidate for moving.
>> >> >>
>> >> >> The others either use the time as a timeout and determine failure if the
>> >> >> action didn't occur within the timeout (e.g. ksm_tests.c) or use it to add some
>> >> >> supplemental performance information to an otherwise functionality-oriented test.
>> >> >
>> >> > Thank you very much, Ryan. I think you've found a better home for this
>> >> > tool. I will send v2, relocating it to tools/mm and adding a function to
>> >> > swap in either the whole mTHPs or a portion of mTHPs by "-a" (aligned
>> >> > swapin).
>> >> >
>> >> > So basically, we will have
>> >> >
>> >> > 1. Use MADV_PAGEOUT for rapid swap-out, exercising the swap allocation code
>> >> > heavily in a short time.
>> >> >
>> >> > 2. Use MADV_DONTNEED to simulate the behavior of libc and Java heap in freeing
>> >> > memory, as well as for munmap, app exits, or OOM killer scenarios. This ensures
>> >> > new mTHP is always generated, released or swapped out, similar to the behavior
>> >> > on a PC or Android phone where many applications are frequently started and
>> >> > terminated.
>> >>
>> >> MADV_DONTNEED on 64KB of memory followed by memset() simulates exactly
>> >> the large folio swap-in, which hasn't been merged upstream.  I don't
>> >> think it's a good idea to rely on this kind of trick.
>> >
>> > I disagree. This is how userspace heaps can manage memory
>> > deallocation.
>>
>> Sorry, I don't understand how.  Can you show some examples?  Such as
>> strace log with 64KB aligned MADV_DONTNEED?
>
> In Java heap and memory allocators such as jemalloc and Scudo, memory is freed
> using the MADV_DONTNEED flag when either free() is called or garbage collection
> occurs. In Android, the Java heap is freed in chunks aligned to 64KB
> or larger.

Originally, I had heard that jemalloc uses MADV_FREE.  Now I know that
it uses MADV_DONTNEED too.  Thanks!

Although I still doubt that libc/Java allocators free pages in exactly
64KB units (IIUC, they should free pages in much larger chunks), I agree
that MADV_DONTNEED is a way to create fragmentation in swap devices.
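
For reference, the pattern being discussed (drop one 64KB-aligned chunk
with MADV_DONTNEED, then write to it again) can be sketched roughly as
below.  This is only an illustration, not the code from the patch: the
names CHUNK_SIZE, map_aligned_chunks() and recycle_chunk() are made up
here, and whether the refault really gets a 64KB folio depends on the
mTHP settings of the system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define CHUNK_SIZE (64UL * 1024)	/* assumed mTHP size under test */

/*
 * Map nr_chunks * 64KB of anonymous memory and return a 64KB-aligned
 * start address, or NULL on failure.  The raw mapping is reported via
 * *raw and *raw_len so that the caller can munmap() it later.
 */
static void *map_aligned_chunks(size_t nr_chunks, void **raw, size_t *raw_len)
{
	size_t len = nr_chunks * CHUNK_SIZE;
	void *p;

	assert(nr_chunks > 0);

	/* Over-allocate by one chunk so that an aligned start always fits. */
	p = mmap(NULL, len + CHUNK_SIZE, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	*raw = p;
	*raw_len = len + CHUNK_SIZE;
	return (void *)(((uintptr_t)p + CHUNK_SIZE - 1) & ~(CHUNK_SIZE - 1));
}

/*
 * Drop chunk number idx and touch it again.  MADV_DONTNEED discards the
 * pages (and any swap entries backing them); the memset() then faults
 * in new, zero-filled memory and overwrites it.
 */
static int recycle_chunk(void *base, size_t idx)
{
	char *chunk = (char *)base + idx * CHUNK_SIZE;

	if (madvise(chunk, CHUNK_SIZE, MADV_DONTNEED))
		return -1;

	memset(chunk, 0x5a, CHUNK_SIZE);
	return 0;
}
```

Calling recycle_chunk() on scattered indexes while the neighbouring
chunks stay swapped out is what leaves holes in the swap device.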

> In Scudo and jemalloc, there is a configuration option to set the
> management granularity. This granularity is set to match the mTHP size
> (though the default value is 16KB in the latest Android if we don't run
> mTHP). Otherwise, you could end up with millions of partial unmap
> operations, which would severely degrade the performance of mTHP.
>
> Imagine libc/Java functioning like a slab allocator. When kfree() is
> called, some pages may become completely unoccupied and can be returned
> to the buddy allocator. In userspace, memory is given back to the kernel
> in a similar manner, typically using MADV_DONTNEED. Therefore,
> MADV_DONTNEED is the most common memory reclamation behavior in Android,
> coming with free(), delete() or GC.
>
> Imagine a system with extensive malloc, free, new, and delete operations,
> where objects are constantly being created and destroyed.
>
> On the other hand, whether libc/Java use MADV_DONTNEED to free memory is not
> crucial, although they do. We need a method to simulate the lifecycle of
> applications (exiting and starting anew) on PCs or Android phones. It
> doesn't matter if you use MADV_DONTNEED or munmap to achieve this.
>
> It is important to note that mTHP currently operates on a one-shot basis
> (after swap-out, you never get them back as mTHP, as we don't support
> large folio swap-in). For the test program, we need a method to generate
> new mTHPs continuously. Without this, after the initial iterations, we
> would be left with only small folios, rendering the entire test program
> *pointless*.

I understand the requirements for new mTHPs.
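
As I read it, one iteration then looks roughly like the sketch below:
fault the range in, push it out with MADV_PAGEOUT, and discard it with
MADV_DONTNEED so that the next iteration faults in new folios instead of
swapping small pages back in.  Again this is my own illustration with a
made-up name (swapout_iteration()), not the code in the patch.  The
fallback value 21 for MADV_PAGEOUT is the Linux uapi value, defined here
only for older libc headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21		/* Linux 5.4+; older libc headers may lack it */
#endif

/*
 * Run one populate / swap-out / discard cycle on an anonymous mapping.
 * Returns 0 on success, -1 if one of the madvise() calls fails (for
 * example, EINVAL on a kernel without MADV_PAGEOUT).
 */
static int swapout_iteration(char *buf, size_t len)
{
	assert(buf && len);

	/* Touch every byte so that the range is backed by newly faulted folios. */
	memset(buf, 0x11, len);

	/* Ask the kernel to reclaim the range now; with swap enabled, the
	 * anonymous pages are written out and swap entries are allocated. */
	if (madvise(buf, len, MADV_PAGEOUT))
		return -1;

	/* Discard the range, freeing any swap entries it holds, so that
	 * the next memset() faults in new folios. */
	if (madvise(buf, len, MADV_DONTNEED))
		return -1;

	return 0;
}
```

Without the final MADV_DONTNEED, the second iteration would swap the
data back in as small folios and there would be no mTHP left to swap
out.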

>>
>> > Additionally, in the event of an application exit, munmap, or OOM killer, the
>> > amount of freed memory can be much larger than 64KB. The primary purpose
>> > of using MADV_DONTNEED is to release anonymous memory and generate
>> > new mTHP so that the iteration can continue. Otherwise, the test program
>> > becomes entirely pointless, as we only have large folios at the beginning.
>> > That is exactly why Chris has failed to find his bugs by using other small
>> > programs.
>>
>> Although I still don't understand how 64KB-aligned MADV_DONTNEED is used
>> for libc/Java heap or munmap in practice, after more thought I think
>> 64KB-aligned MADV_DONTNEED can simulate, to some degree, the fragmentation
>> effect of process exit if the 64KB folios in these processes are swapped
>> out without splitting.  If you have no other practical use cases, I
>> suggest making that explicit with comments in the program.
>>

[snip]

--
Best Regards,
Huang, Ying
