Message-ID: <db98cf80-6755-4083-83d7-cd750fd029b6@vivo.com>
Date: Wed, 10 Jul 2024 14:47:07 +0800
From: zhiguojiang <justinjiang@...o.com>
To: Barry Song <21cnbao@...il.com>, David Hildenbrand <david@...hat.com>
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, opensource.kernel@...o.com, willy@...radead.org
Subject: Re: [PATCH v7] mm: shrink skip folio mapped by an exiting process



On 2024/7/10 12:44, Barry Song wrote:
> On Wed, Jul 10, 2024 at 4:04 PM David Hildenbrand <david@...hat.com> wrote:
>> On 10.07.24 06:02, Barry Song wrote:
>>> On Wed, Jul 10, 2024 at 3:59 PM David Hildenbrand <david@...hat.com> wrote:
>>>> On 10.07.24 05:32, Barry Song wrote:
>>>>> On Wed, Jul 10, 2024 at 9:23 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
>>>>>> On Tue,  9 Jul 2024 20:31:15 +0800 Zhiguo Jiang <justinjiang@...o.com> wrote:
>>>>>>
>>>>>>> The release of a non-shared anonymous folio mapped solely by an
>>>>>>> exiting process may go through two flows: 1) the anonymous folio is
>>>>>>> first swapped out to swap space and converted into a swp_entry
>>>>>>> in shrink_folio_list; 2) the swp_entry is then released in the process
>>>>>>> exit flow. This results in high CPU load when releasing a
>>>>>>> non-shared anonymous folio mapped solely by an exiting process.
>>>>>>>
>>>>>>> This is likely to happen when system memory is low while a process
>>>>>>> is exiting, because a non-shared anonymous folio mapped solely by
>>>>>>> the exiting process may be reclaimed by shrink_folio_list.
>>>>>>>
>>>>>>> With this patch, shrink skips a non-shared anonymous folio mapped
>>>>>>> solely by an exiting process, so the folio is released directly in
>>>>>>> the process exit flow. This saves swap-out time and reduces the load
>>>>>>> of process exit.
>>>>>> It would be helpful to provide some before-and-after runtime
>>>>>> measurements, please.  It's a performance optimization so please let's
>>>>>> see what effect it has.
>>>>> Hi Andrew,
>>>>>
>>>>> This was something I was curious about too, so I created a small test program
>>>>> that allocates and continuously writes to 256MB of memory. Using QEMU, I set
>>>>> up a small machine with only 300MB of RAM to trigger kswapd.
>>>>>
>>>>> qemu-system-aarch64 -M virt,gic-version=3,mte=off -nographic \
>>>>>     -smp cpus=4 -cpu max \
>>>>>     -m 300M -kernel arch/arm64/boot/Image
>>>>>
>>>>> The test program will be randomly terminated by its subprocess to trigger
>>>>> the use case of this patch.
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include <unistd.h>
>>>>> #include <string.h>
>>>>> #include <sys/types.h>
>>>>> #include <sys/wait.h>
>>>>> #include <time.h>
>>>>> #include <signal.h>
>>>>>
>>>>> #define MEMORY_SIZE (256 * 1024 * 1024)
>>>>>
>>>>> unsigned char *memory;
>>>>>
>>>>> void allocate_and_write_memory()
>>>>> {
>>>>>        memory = (unsigned char *)malloc(MEMORY_SIZE);
>>>>>        if (memory == NULL) {
>>>>>            perror("malloc");
>>>>>            exit(EXIT_FAILURE);
>>>>>        }
>>>>>
>>>>>        while (1)
>>>>>            memset(memory, 0x11, MEMORY_SIZE);
>>>>> }
>>>>>
>>>>> int main()
>>>>> {
>>>>>        pid_t pid;
>>>>>        srand(time(NULL));
>>>>>
>>>>>        pid = fork();
>>>>>
>>>>>        if (pid < 0) {
>>>>>            perror("fork");
>>>>>            exit(EXIT_FAILURE);
>>>>>        }
>>>>>
>>>>>        if (pid == 0) {
>>>>>            int delay = (rand() % 10000) + 10000;
>>>>>            usleep(delay * 1000);
>>>>>
>>>>>            /* kill the parent while it is busy swapping */
>>>>>            kill(getppid(), SIGKILL);
>>>>>            _exit(0);
>>>>>        } else {
>>>>>            allocate_and_write_memory();
>>>>>
>>>>>            wait(NULL);
>>>>>
>>>>>            free(memory);
>>>>>        }
>>>>>
>>>>>        return 0;
>>>>> }
>>>>>
>>>>> I tracked the number of folios that could be redundantly
>>>>> swapped out by adding a simple counter as shown below:
>>>>>
>>>>> @@ -879,6 +880,9 @@ static bool folio_referenced_one(struct folio *folio,
>>>>>                        check_stable_address_space(vma->vm_mm)) &&
>>>>>                        folio_test_swapbacked(folio) &&
>>>>>                        !folio_likely_mapped_shared(folio)) {
>>>>> +                       static long i, size;
>>>>> +                       size += folio_size(folio);
>>>>> +                       pr_err("index: %ld skipped folio:%lx total size:%ld\n", i++, (unsigned long)folio, size);
>>>>>                            pra->referenced = -1;
>>>>>                            page_vma_mapped_walk_done(&pvmw);
>>>>>                            return false;
>>>>>
>>>>>
>>>>> This is what I have observed:
>>>>>
>>>>> / # /home/barry/develop/linux/skip_swap_out_test
>>>>> [   82.925645] index: 0 skipped folio:fffffdffc0425400 total size:65536
>>>>> [   82.925960] index: 1 skipped folio:fffffdffc0425800 total size:131072
>>>>> [   82.927524] index: 2 skipped folio:fffffdffc0425c00 total size:196608
>>>>> [   82.928649] index: 3 skipped folio:fffffdffc0426000 total size:262144
>>>>> [   82.929383] index: 4 skipped folio:fffffdffc0426400 total size:327680
>>>>> [   82.929995] index: 5 skipped folio:fffffdffc0426800 total size:393216
>>>>> ...
>>>>> [   88.469130] index: 6112 skipped folio:fffffdffc0390080 total size:97230848
>>>>> [   88.469966] index: 6113 skipped folio:fffffdffc038d000 total size:97296384
>>>>> [   89.023414] index: 6114 skipped folio:fffffdffc0366cc0 total size:97300480
>>>>>
>>>>> I observed that this patch effectively skipped 6115 folios (indices 0
>>>>> through 6114, each either 4KB or 64KB mTHP), potentially reducing the
>>>>> swap-out by about 92.8MB (97,300,480 bytes) during the process exit.
>>>>>
>>>>> Despite the numerous mistakes Zhiguo made in sending this patch, it is still
>>>>> quite valuable. Please consider pulling his v9 into the mm tree for testing.
>>>> BTW, we dropped the folio_test_anon() check, but what about shmem? They
>>>> also do __folio_set_swapbacked()?
>>> My point is that the purpose is to skip redundant swap-out; if shmem folios
>>> are singly mapped, they could also be skipped.
>> But they won't necessarily get *freed* when unmapping them. They might
>> just continue living in tmpfs, where some other process might map
>> them later?
>>
> You're correct. I overlooked this aspect, focusing on swap and thinking of shmem
> solely in terms of swap.
>
>> IMHO, there is a big difference here between anon and shmem. (well,
>> anon_shmem would actually be different :) )
> Even though anon_shmem behaves similarly to anonymous memory when
> releasing memory, it doesn't seem worth the added complexity?
>
> So unfortunately it seems Zhiguo still needs a v10 to bring folio_test_anon()
> back? Sorry, my bad, Zhiguo.
If the condition folio_test_anon(folio) && folio_test_swapbacked(folio) is
used, does that guarantee the folio is anonymous rather than shmem? And does
folio_likely_mapped_shared() then need to be removed?
>
>> --
>> Cheers,
>>
>> David / dhildenb
>>
> Thanks
> Barry
Thanks
Zhiguo

