linux-kernel - Re: [PATCH v2 01/13] mm: support madvise(MADV

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <56399CA5.8090101@gmail.com>
Date:	Wed, 4 Nov 2015 00:50:29 -0500
From:	Daniel Micay <danielmicay@...il.com>
To:	Andy Lutomirski <luto@...capital.net>,
	Minchan Kim <minchan@...nel.org>
Cc:	Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Michal Hocko <mhocko@...e.cz>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Linux API <linux-api@...r.kernel.org>, Jason Evans <je@...com>,
	Shaohua Li <shli@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	yalin.wang2010@...il.com, Mel Gorman <mgorman@...e.de>
Subject: Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

> Does this set the write protect bit?
> 
> What happens on architectures without hardware dirty tracking?

It's supposed to avoid needing page faults when the data is accessed
again, but it can just be implemented via page faults on architectures
without a way to check for access or writes. MADV_DONTNEED is also a
valid implementation of MADV_FREE if it comes to that (which is what it
does on swapless systems for now).

> Using the dirty bit for these semantics scares me.  This API creates a
> page that can have visible nonzero contents and then can
> asynchronously and magically zero itself thereafter.  That makes me
> nervous.  Could we use the accessed bit instead?  Then the observable
> semantics would be equivalent to having MADV_FREE either zero the page
> or do nothing, except that it doesn't make up its mind until the next
> read.

FWIW, those are already basically the semantics provided by GCC and LLVM
for data the compiler considers uninitialized (they could be more
aggressive since C just says it's undefined, but in practice they allow
it but can produce inconsistent results even if it isn't touched).

http://llvm.org/docs/LangRef.html#undefined-values

It doesn't seem like there would be an advantage to checking if the data
was written to vs. whether it was accessed if checking for both of those
is comparable in performance. I don't know enough about that.

>> +                       ptent = pte_mkold(ptent);
>> +                       ptent = pte_mkclean(ptent);
>> +                       set_pte_at(mm, addr, pte, ptent);
>> +                       tlb_remove_tlb_entry(tlb, pte, addr);
> 
> It looks like you are flushing the TLB.  In a multithreaded program,
> that's rather expensive.  Potentially silly question: would it be
> better to just zero the page immediately in a multithreaded program
> and then, when swapping out, check the page is zeroed and, if so, skip
> swapping it out?  That could be done without forcing an IPI.

In the common case it will be passed many pages by the allocator. There
will still be a layer of purging logic on top of MADV_FREE but it can be
much thinner than the current workarounds for MADV_DONTNEED. So the
allocator would still be coalescing dirty ranges and only purging when
the ratio of dirty:clean pages rises above some threshold. It would be
able to weight the largest ranges for purging first rather than logic
based on stuff like aging as is used for MADV_DONTNEED.

Download attachment "signature.asc" of type "application/pgp-signature" (820 bytes)