linux-kernel - Re: [PATCH] x86: enable RCU based table free when PARAVIRT

Message-ID: <20170823223637.bjke4w3wpolrn7md@black.fi.intel.com>
Date:   Thu, 24 Aug 2017 01:36:38 +0300
From:   "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     "Kirill A. Shutemov" <kirill@...temov.name>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        the arch/x86 maintainers <x86@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        xen-devel <xen-devel@...ts.xenproject.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Jork Loeser <Jork.Loeser@...rosoft.com>,
        KY Srinivasan <kys@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Juergen Gross <jgross@...e.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Andrew Cooper <andrew.cooper3@...rix.com>,
        Andy Lutomirski <luto@...capital.net>
Subject: Re: [PATCH] x86: enable RCU based table free when PARAVIRT

On Wed, Aug 23, 2017 at 08:27:18PM +0000, Linus Torvalds wrote:
> On Wed, Aug 23, 2017 at 12:59 PM, Kirill A. Shutemov
> <kirill@...temov.name> wrote:
> >
> > In this case we need performance numbers for !PARAVIRT kernel.
> 
> Yes.
> 
> > Numbers for tight loop of "mmap(MAP_POPULATE); munmap()" might be
> > interesting too for worst case scenario.
> 
> Actually, I don't think you want to populate all the pages. You just
> want to populate *one* page, in order to build up the page directory
> structure, not allocate all the final points.
> 
> And we only free the actual page tables when there is nothing around,
> so it should be at least a 2MB-aligned region etc.
> 
> So you should do a *big* allocation, and then touch a single page in
> the middle, and then munmap it - that should give you maximal page
> table activity. Otherwise the page tables will generally just stay
> around.
> 
> Realistically, it's mainly exit() that frees page tables. Yes, you may
> have a few page tables free'd by a normal munmap(), but it's usually
> very limited. Which is why I suggested that script-heavy thing with
> lots of small executables. That tends to be the main realistic load
> that really causes a ton of page directory activity.

Below is a test case that allocates a lot of page tables and measures
fork/exit time. (I'm not entirely sure it's the best way to stress the
code path.)

Unpatched:	average 4.8322s, stddev	0.114s
Patched:	average 4.8362s, stddev	0.111s

Both kernels were built without PARAVIRT; the patch was modified to enable
HAVE_RCU_TABLE_FREE for !PARAVIRT too.

The test case requires "echo 1 > /proc/sys/vm/overcommit_memory".

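#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>

#define PUD_SIZE (1UL << 30)	/* 1 GiB: range covered by one PMD table */
#define PMD_SIZE (1UL << 21)	/* 2 MiB: range covered by one PTE table */

#define NR_PUD 4096		/* number of 1 GiB regions to populate */

#define NSEC_PER_SEC	1000000000L

int main(void)
{
	char *addr = NULL;
	unsigned long i, j;
	struct timespec start, finish;
	long long nsec;

	/*
	 * Disable THP so faults are served with 4 KiB pages through PTE
	 * tables rather than with huge pages mapped at the PMD level.
	 * PR_SET_THP_DISABLE needs arg2 = 1 to actually disable it.
	 */
	if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0))
		perror("prctl(PR_SET_THP_DISABLE)");

	/*
	 * Map NR_PUD regions of 1 GiB, hinting each one directly after the
	 * previous one, and touch one byte per 2 MiB so that every region
	 * gets a full set of PTE tables.
	 */
	for (i = 0; i < NR_PUD; i++) {
		addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
				MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
		if (addr == MAP_FAILED) {
			perror("mmap");
			break;
		}

		for (j = 0; j < PUD_SIZE; j += PMD_SIZE) {
			/* Read outside assert() so NDEBUG cannot drop the access. */
			char c = *(volatile char *)&addr[j];

			assert(c == 0);
			(void)c;
		}
	}

	/* Time fork() + child exit + wait(), ten times. */
	for (i = 0; i < 10; i++) {
		pid_t pid;

		clock_gettime(CLOCK_MONOTONIC, &start);
		pid = fork();
		if (pid == -1) {
			perror("fork");
			return 1;
		}
		if (!pid)
			_exit(0);	/* _exit(): don't flush the inherited stdio buffer */
		wait(NULL);
		clock_gettime(CLOCK_MONOTONIC, &finish);

		nsec = (long long)(finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
			(finish.tv_nsec - start.tv_nsec);
		printf("%lld\n", nsec);
	}

	return 0;
}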
-- 
 Kirill A. Shutemov