Message-Id: <20240710033212.36497-1-21cnbao@gmail.com>
Date: Wed, 10 Jul 2024 15:32:12 +1200
From: Barry Song <21cnbao@...il.com>
To: akpm@...ux-foundation.org
Cc: baohua@...nel.org,
	david@...hat.com,
	justinjiang@...o.com,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	opensource.kernel@...o.com,
	willy@...radead.org
Subject: Re: [PATCH v7] mm: shrink skip folio mapped by an exiting process

On Wed, Jul 10, 2024 at 9:23 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Tue,  9 Jul 2024 20:31:15 +0800 Zhiguo Jiang <justinjiang@...o.com> wrote:
>
> > The release of a non-shared anonymous folio mapped solely by an
> > exiting process may go through two flows: 1) the anonymous folio is
> > first swapped out into swap space and transformed into a swp_entry
> > in shrink_folio_list; 2) then the swp_entry is released in the process
> > exiting flow. This results in high CPU load when releasing a
> > non-shared anonymous folio mapped solely by an exiting process.
> >
> > This is likely to happen when the system is low on memory while a
> > process is exiting, because the non-shared anonymous folio mapped
> > solely by the exiting process may be reclaimed by
> > shrink_folio_list.
> >
> > This patch makes reclaim skip the non-shared anonymous folio solely
> > mapped by an exiting process, so that the folio is released directly
> > in the process exiting flow. This saves swap-out time and reduces
> > the load of process exit.
>
> It would be helpful to provide some before-and-after runtime
> measurements, please.  It's a performance optimization so please let's
> see what effect it has.

Hi Andrew,

This was something I was curious about too, so I created a small test program
that allocates and continuously writes to 256MB of memory. Using QEMU, I set
up a small machine with only 300MB of RAM to trigger kswapd.

qemu-system-aarch64 -M virt,gic-version=3,mte=off -nographic \
 -smp cpus=4 -cpu max \
 -m 300M -kernel arch/arm64/boot/Image
 
The test program is killed by its child process after a random delay of
10 to 20 seconds, which triggers the use case of this patch.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <signal.h>

#define MEMORY_SIZE (256 * 1024 * 1024)

unsigned char *memory;

void allocate_and_write_memory()
{
    memory = (unsigned char *)malloc(MEMORY_SIZE);
    if (memory == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    while (1)
        memset(memory, 0x11, MEMORY_SIZE);
}

int main()
{
    pid_t pid;
    srand(time(NULL));

    pid = fork();

    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }

    if (pid == 0) {
        int delay = (rand() % 10000) + 10000;
        usleep(delay * 1000);

        /* kill the parent while it is busy swapping */
        kill(getppid(), SIGKILL);
        _exit(0);
    } else {
        allocate_and_write_memory();

        wait(NULL);

        free(memory);
    }

    return 0;
}

I tracked the number of folios that could be redundantly
swapped out by adding a simple counter as shown below:

@@ -879,6 +880,9 @@ static bool folio_referenced_one(struct folio *folio,
                    check_stable_address_space(vma->vm_mm)) &&
                    folio_test_swapbacked(folio) &&
                    !folio_likely_mapped_shared(folio)) {
+                       static long i, size;
+                       size += folio_size(folio);
+                       pr_err("index: %ld skipped folio:%lx total size:%ld\n", i++, (unsigned long)folio, size);
                        pra->referenced = -1;
                        page_vma_mapped_walk_done(&pvmw);
                        return false;

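For readers who want to poke at the accounting outside the kernel, here is a
rough userspace model of what the hunk above does. It is only a sketch, not
kernel code: struct folio_model, struct skip_stats, should_skip() and
account_folio() are hypothetical names, and the mm_exiting flag stands in for
the exiting-mm test whose first half is cut off by the hunk context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the folio state the hunk tests. */
struct folio_model {
    bool mm_exiting;     /* stand-in for the exiting-mm check (assumption) */
    bool swapbacked;     /* models folio_test_swapbacked() */
    bool mapped_shared;  /* models folio_likely_mapped_shared() */
    size_t size;         /* models folio_size(), in bytes */
};

/* Mirrors the two static counters added by the debug hunk. */
struct skip_stats {
    long count;
    size_t total_bytes;
};

/* Skip only swap-backed, non-shared folios whose mm is exiting. */
static bool should_skip(const struct folio_model *f)
{
    assert(f != NULL);
    return f->mm_exiting && f->swapbacked && !f->mapped_shared;
}

/* Account a folio if it would be skipped; returns whether it was. */
static bool account_folio(struct skip_stats *st, const struct folio_model *f)
{
    assert(st != NULL);
    if (!should_skip(f))
        return false;
    st->count++;
    st->total_bytes += f->size;
    return true;
}
```

Feeding it one 64KB folio of an exiting process, one shared folio and one
folio of a live process should account only the first.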

This is what I have observed:

/ # /home/barry/develop/linux/skip_swap_out_test
[   82.925645] index: 0 skipped folio:fffffdffc0425400 total size:65536
[   82.925960] index: 1 skipped folio:fffffdffc0425800 total size:131072
[   82.927524] index: 2 skipped folio:fffffdffc0425c00 total size:196608
[   82.928649] index: 3 skipped folio:fffffdffc0426000 total size:262144
[   82.929383] index: 4 skipped folio:fffffdffc0426400 total size:327680
[   82.929995] index: 5 skipped folio:fffffdffc0426800 total size:393216
...
[   88.469130] index: 6112 skipped folio:fffffdffc0390080 total size:97230848
[   88.469966] index: 6113 skipped folio:fffffdffc038d000 total size:97296384
[   89.023414] index: 6114 skipped folio:fffffdffc0366cc0 total size:97300480

I observed that this patch skipped 6115 folios (indices 0 through 6114, each
either 4KB or 64KB mTHP), potentially avoiding about 92.8MiB (97,300,480 bytes)
of swap-out during the process exit.
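
The totals can be re-derived from the log itself. Below is a small sketch that
parses lines in the "index: N skipped folio:ADDR total size:S" format printed
by the debug counter and classifies each folio as 4KB or 64KB from the change
in the running total. The names struct tally, tally_init(), tally_line(),
tally_folios() and tally_mib() are hypothetical helpers made up for this
illustration. Folios hidden behind an elided stretch of the log (the "..."
above) cannot be classified and are counted separately.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct tally {
    long last_index;    /* -1 until the first matching line */
    long last_total;    /* running "total size" from the last line */
    long n4k, n64k;     /* folios classified by size delta */
    long unclassified;  /* folios whose own line was not seen, or odd size */
};

static void tally_init(struct tally *t)
{
    t->last_index = -1;
    t->last_total = 0;
    t->n4k = t->n64k = t->unclassified = 0;
}

/* Returns 1 if the line matched the counter format, 0 otherwise. */
static int tally_line(struct tally *t, const char *line)
{
    long index, total;
    unsigned long long addr;
    const char *p;

    assert(t != NULL && line != NULL);
    p = strstr(line, "index: ");
    if (p == NULL ||
        sscanf(p, "index: %ld skipped folio:%llx total size:%ld",
               &index, &addr, &total) != 3)
        return 0;

    long delta = total - t->last_total;
    int consecutive = (index == t->last_index + 1);

    if (consecutive && delta == 4096)
        t->n4k++;
    else if (consecutive && delta == 65536)
        t->n64k++;
    else
        t->unclassified += index - t->last_index;

    t->last_index = index;
    t->last_total = total;
    return 1;
}

/* Indices start at 0, so the folio count is the last index plus one. */
static long tally_folios(const struct tally *t)
{
    return t->last_index + 1;
}

static double tally_mib(const struct tally *t)
{
    return (double)t->last_total / (1024.0 * 1024.0);
}
```

Run over the complete dmesg output rather than the excerpt quoted here, the
unclassified count should drop to zero unless some folio has a size other
than 4KB or 64KB.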

Despite the numerous mistakes Zhiguo made in sending this patch, it is still
quite valuable. Please consider pulling his v9 into the mm tree for testing.

Thanks
Barry
