linux-kernel - Re: [PATCH v7] mm: shrink skip folio mapped by an exiting process

lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <9d77dc44-f61c-4e52-938f-c268daf0e169@redhat.com>
Date: Wed, 10 Jul 2024 06:04:27 +0200
From: David Hildenbrand <david@...hat.com>
To: Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org, justinjiang@...o.com,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 opensource.kernel@...o.com, willy@...radead.org
Subject: Re: [PATCH v7] mm: shrink skip folio mapped by an exiting process

On 10.07.24 06:02, Barry Song wrote:
> On Wed, Jul 10, 2024 at 3:59 PM David Hildenbrand <david@...hat.com> wrote:
>>
>> On 10.07.24 05:32, Barry Song wrote:
>>> On Wed, Jul 10, 2024 at 9:23 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
>>>>
>>>> On Tue,  9 Jul 2024 20:31:15 +0800 Zhiguo Jiang <justinjiang@...o.com> wrote:
>>>>
>>>>> Releasing a non-shared anonymous folio mapped solely by an exiting
>>>>> process may go through two flows: 1) the anonymous folio is first
>>>>> swapped out to swap space and transformed into a swp_entry in
>>>>> shrink_folio_list; 2) the swp_entry is then released in the process
>>>>> exit flow. This results in high CPU load when releasing a non-shared
>>>>> anonymous folio mapped solely by an exiting process.
>>>>>
>>>>> This is likely to happen when system memory is low while a process is
>>>>> exiting, because the non-shared anonymous folio mapped solely by the
>>>>> exiting process may be reclaimed by shrink_folio_list.
>>>>>
>>>>> With this patch, shrink skips non-shared anonymous folios mapped solely
>>>>> by an exiting process; such folios are released directly in the process
>>>>> exit flow, which saves swap-out time and alleviates the load of process
>>>>> exit.
>>>>
>>>> It would be helpful to provide some before-and-after runtime
>>>> measurements, please.  It's a performance optimization so please let's
>>>> see what effect it has.
>>>
>>> Hi Andrew,
>>>
>>> This was something I was curious about too, so I created a small test program
>>> that allocates and continuously writes to 256MB of memory. Using QEMU, I set
>>> up a small machine with only 300MB of RAM to trigger kswapd.
>>>
>>> qemu-system-aarch64 -M virt,gic-version=3,mte=off -nographic \
>>>    -smp cpus=4 -cpu max \
>>>    -m 300M -kernel arch/arm64/boot/Image
>>>
>>> The child process kills the test program (its parent) with SIGKILL after a
>>> random delay of 10 to 20 seconds, while the parent is busy swapping, to
>>> trigger the use case of this patch.
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <unistd.h>
>>> #include <string.h>
>>> #include <sys/types.h>
>>> #include <sys/wait.h>
>>> #include <time.h>
>>> #include <signal.h>
>>>
>>> #define MEMORY_SIZE (256 * 1024 * 1024)
>>>
>>> unsigned char *memory;
>>>
>>> /* Allocate 256MB and keep dirtying it so that, on a small VM, reclaim must swap it out. */
>>> void allocate_and_write_memory(void)
>>> {
>>>       memory = (unsigned char *)malloc(MEMORY_SIZE);
>>>       if (memory == NULL) {
>>>           perror("malloc");
>>>           exit(EXIT_FAILURE);
>>>       }
>>>
>>>       /* Never returns; the process is expected to be killed here. */
>>>       while (1)
>>>           memset(memory, 0x11, MEMORY_SIZE);
>>> }
>>>
>>> int main(void)
>>> {
>>>       pid_t pid;
>>>       srand(time(NULL));
>>>
>>>       pid = fork();
>>>
>>>       if (pid < 0) {
>>>           perror("fork");
>>>           exit(EXIT_FAILURE);
>>>       }
>>>
>>>       if (pid == 0) {
>>>           /* Random delay of 10000..19999 ms. */
>>>           int delay = (rand() % 10000) + 10000;
>>>           usleep(delay * 1000);
>>>
>>>           /* kill parent when it is busy on swapping */
>>>           kill(getppid(), SIGKILL);
>>>           _exit(0);
>>>       } else {
>>>           allocate_and_write_memory();
>>>
>>>           /* Not reached: the child SIGKILLs this process. */
>>>           wait(NULL);
>>>           free(memory);
>>>       }
>>>
>>>       return 0;
>>> }
>>>
>>> I tracked the number of folios that could be redundantly
>>> swapped out by adding a simple counter as shown below:
>>>
>>> @@ -879,6 +880,9 @@ static bool folio_referenced_one(struct folio *folio,
>>>                       check_stable_address_space(vma->vm_mm)) &&
>>>                       folio_test_swapbacked(folio) &&
>>>                       !folio_likely_mapped_shared(folio)) {
>>> +                       static long i, size;
>>> +                       size += folio_size(folio);
>>> +                       pr_err("index: %ld skipped folio:%lx total size:%ld\n", i++, (unsigned long)folio, size);
>>>                           pra->referenced = -1;
>>>                           page_vma_mapped_walk_done(&pvmw);
>>>                           return false;
>>>
>>>
>>> This is what I have observed:
>>>
>>> / # /home/barry/develop/linux/skip_swap_out_test
>>> [   82.925645] index: 0 skipped folio:fffffdffc0425400 total size:65536
>>> [   82.925960] index: 1 skipped folio:fffffdffc0425800 total size:131072
>>> [   82.927524] index: 2 skipped folio:fffffdffc0425c00 total size:196608
>>> [   82.928649] index: 3 skipped folio:fffffdffc0426000 total size:262144
>>> [   82.929383] index: 4 skipped folio:fffffdffc0426400 total size:327680
>>> [   82.929995] index: 5 skipped folio:fffffdffc0426800 total size:393216
>>> ...
>>> [   88.469130] index: 6112 skipped folio:fffffdffc0390080 total size:97230848
>>> [   88.469966] index: 6113 skipped folio:fffffdffc038d000 total size:97296384
>>> [   89.023414] index: 6114 skipped folio:fffffdffc0366cc0 total size:97300480
>>>
>>> I observed that this patch effectively skipped 6115 folios (indices 0 to 6114,
>>> either 4KB or 64KB mTHP), potentially reducing swap-out by up to about 92MB
>>> (97,300,480 bytes) during process exit.
>>>
>>> Despite the numerous mistakes Zhiguo made in sending this patch, it is still
>>> quite valuable. Please consider pulling his v9 into the mm tree for testing.
>>
>> BTW, we dropped the folio_test_anon() check, but what about shmem? They
>> also do __folio_set_swapbacked()?
> 
> My point is that the purpose is to skip redundant swap-out; if shmem is
> singly mapped, it could also be skipped.

But they won't necessarily get *freed* when unmapped. They might just 
continue living in tmpfs, where some other process might map them 
later.

IMHO, there is a big difference here between anon and shmem. (well, 
anon_shmem would actually be different :) )

-- 
Cheers,

David / dhildenb