lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20190520035254.57579-8-minchan@kernel.org>
Date:   Mon, 20 May 2019 12:52:54 +0900
From:   Minchan Kim <minchan@...nel.org>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     LKML <linux-kernel@...r.kernel.org>, linux-mm <linux-mm@...ck.org>,
        Michal Hocko <mhocko@...e.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Tim Murray <timmurray@...gle.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Daniel Colascione <dancol@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Sonny Rao <sonnyrao@...gle.com>,
        Brian Geffon <bgeffon@...gle.com>,
        Minchan Kim <minchan@...nel.org>
Subject: [RFC 7/7] mm: madvise support MADV_ANONYMOUS_FILTER and MADV_FILE_FILTER

System could have much faster swap device like zRAM. In that case, swapping
is extremely cheaper than file-IO on the low-end storage.
In this configuration, userspace could handle different strategy for each
kinds of vma. IOW, they want to reclaim anonymous pages by MADV_COLD
while it keeps file-backed pages in inactive LRU by MADV_COOL because
file IO is more expensive in this case so want to keep them in memory
until memory pressure happens.

To support such strategy easier, this patch introduces
MADV_ANONYMOUS_FILTER and MADV_FILE_FILTER options in madvise(2) like
that /proc/<pid>/clear_refs already has supported same filters.
They are filters could be Ored with other existing hints using top two bits
of (int behavior).

Once either of them is set, the hint could affect only the interested vma
either anonymous or file-backed.

With that, user could call a process_madvise syscall simply with a entire
range(0x0 - 0xFFFFFFFFFFFFFFFF) but either of MADV_ANONYMOUS_FILTER and
MADV_FILE_FILTER so there is no need to call the syscall range by range.

* from v1r2
  * use consistent check with clear_refs to identify anon/file vma - surenb

* from v1r1
  * use naming "filter" for new madvise option - dancol

Signed-off-by: Minchan Kim <minchan@...nel.org>
---
 include/uapi/asm-generic/mman-common.h |  5 +++++
 mm/madvise.c                           | 14 ++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index b8e230de84a6..be59a1b90284 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -66,6 +66,11 @@
 #define MADV_WIPEONFORK 18		/* Zero memory on fork, child only */
 #define MADV_KEEPONFORK 19		/* Undo MADV_WIPEONFORK */
 
+#define MADV_BEHAVIOR_MASK (~(MADV_ANONYMOUS_FILTER|MADV_FILE_FILTER))
+
+#define MADV_ANONYMOUS_FILTER	(1<<31)	/* works for only anonymous vma */
+#define MADV_FILE_FILTER	(1<<30)	/* works for only file-backed vma */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/madvise.c b/mm/madvise.c
index f4f569dac2bd..116131243540 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1002,7 +1002,15 @@ static int madvise_core(struct task_struct *tsk, unsigned long start,
 	int write;
 	size_t len;
 	struct blk_plug plug;
+	bool anon_only, file_only;
 
+	anon_only = behavior & MADV_ANONYMOUS_FILTER;
+	file_only = behavior & MADV_FILE_FILTER;
+
+	if (anon_only && file_only)
+		return error;
+
+	behavior = behavior & MADV_BEHAVIOR_MASK;
 	if (!madvise_behavior_valid(behavior))
 		return error;
 
@@ -1067,12 +1075,18 @@ static int madvise_core(struct task_struct *tsk, unsigned long start,
 		if (end < tmp)
 			tmp = end;
 
+		if (anon_only && vma->vm_file)
+			goto next;
+		if (file_only && !vma->vm_file)
+			goto next;
+
 		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
 		error = madvise_vma(tsk, vma, &prev, start, tmp,
 					behavior, &pages);
 		if (error)
 			goto out;
 		*nr_pages += pages;
+next:
 		start = tmp;
 		if (prev && start < prev->vm_end)
 			start = prev->vm_end;
-- 
2.21.0.1020.gf2820cf01a-goog

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ