linux-kernel - Re: [PATCH v2 01/13] mm: support madvise(MADV

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <563A813B.9080903@gmail.com>
Date:	Wed, 4 Nov 2015 17:05:47 -0500
From:	Daniel Micay <danielmicay@...il.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Minchan Kim <minchan@...nel.org>, Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Michal Hocko <mhocko@...e.cz>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Linux API <linux-api@...r.kernel.org>, Jason Evans <je@...com>,
	Shaohua Li <shli@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	yalin wang <yalin.wang2010@...il.com>,
	Mel Gorman <mgorman@...e.de>
Subject: Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

> With enough pages at once, though, munmap would be fine, too.

That implies lots of page faults and zeroing though. The zeroing alone
is a major performance issue.

There are separate issues with munmap since it ends up resulting in a
lot more virtual memory fragmentation. It would help if the kernel used
first-best-fit for mmap instead of the current naive algorithm (bonus:
O(log n) worst-case, not O(n)). Since allocators like jemalloc and
PartitionAlloc want 2M aligned spans, mixing them with other allocators
can also accelerate the VM fragmentation caused by the dumb mmap
algorithm (i.e. they make a 2M aligned mapping, some other mmap user
does 4k, now there's a nearly 2M gap when the next 2M region is made and
the kernel keeps going rather than reusing it). Anyway, that's a totally
separate issue from this. Just felt like complaining :).

> Maybe what's really needed is a MADV_FREE variant that takes an iovec.
> On an all-cores multithreaded mm, the TLB shootdown broadcast takes
> thousands of cycles on each core more or less regardless of how much
> of the TLB gets zapped.

That would work very well. The allocator ends up having a sequence of
dirty spans that it needs to purge in one go. As long as purging is
fairly spread out, the cost of a single TLB shootdown isn't that bad. It
is extremely bad if it needs to do it over and over to purge a bunch of
ranges, which can happen if the memory has ended up being very, very
fragmentated despite the efforts to compact it (depends on what the
application ends up doing).

Download attachment "signature.asc" of type "application/pgp-signature" (820 bytes)