Message-ID: <1646803679-11433-1-git-send-email-quic_charante@quicinc.com>
Date: Wed, 9 Mar 2022 10:57:59 +0530
From: Charan Teja Kalla <quic_charante@...cinc.com>
To: <akpm@...ux-foundation.org>, <yuehaibing@...wei.com>,
<minchan@...nel.org>, <sfr@...b.auug.org.au>,
<rientjes@...gle.com>, <edgararriaga@...gle.com>, <mhocko@...e.com>
CC: <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
Charan Teja Kalla <quic_charante@...cinc.com>
Subject: [PATCH] mm: madvise: return correct bytes advised with process_madvise
The process_madvise() system call returns an error even after it has
processed some of the VMAs passed in the 'struct iovec' vector list,
which leaves the user unsure where to restart the advice. It also
contradicts the man page[1] for this syscall, which states that the
"return value may be less than the total number of requested bytes, if
an error occurred after some iovec elements were already processed."
Consider a user who passes 10 VMAs in the 'struct iovec' vector list, of
which 9 are processed and one fails. The call returns only the error
raised for the failed VMA, even though the first 9 VMAs were processed,
so the user cannot tell which VMA failed. Returning the number of bytes
processed instead lets the user work out which VMA failed and then
retry or skip the advice on that VMA.
[1] https://man7.org/linux/man-pages/man2/process_madvise.2.html
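For illustration only, a caller could map the returned byte count back
to an iovec index roughly as follows. This is a minimal userspace
sketch, not part of the patch: first_unprocessed_iov() is a hypothetical
helper name, and it assumes the kernel only ever counts whole iovec
elements, as the loop in mm/madvise.c does.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not part of the patch): given the non-negative
 * byte count returned by process_madvise(), return the index of the
 * first iovec element that was not processed, or vlen if every
 * element was processed.
 */
static size_t first_unprocessed_iov(const struct iovec *vec, size_t vlen,
                                    ssize_t advised)
{
    size_t done = 0;
    size_t i;

    assert(advised >= 0);
    assert(vec != NULL || vlen == 0);

    for (i = 0; i < vlen; i++) {
        /* Stop at the first element that does not fit in the count. */
        if (done + vec[i].iov_len > (size_t)advised)
            break;
        done += vec[i].iov_len;
    }
    return i;
}
```

A caller would then retry from, or skip, the element at the returned
index and resubmit the remainder of the vector.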
Fixes: ecb8ac8b1f14 ("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Charan Teja Kalla <quic_charante@...cinc.com>
---
mm/madvise.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
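The return-value rule in the diff below can be modelled in userspace as
a rough sketch. Nothing here is kernel code: model_process_madvise() and
its 'lens' and 'results' arrays are hypothetical stand-ins for the iovec
lengths and for the values do_madvise() is assumed to return for each
element.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Userspace model of the patched loop (illustration only).
 * lens[i]    - iov_len of element i
 * results[i] - assumed do_madvise() result for element i (0 or -errno)
 */
static ssize_t model_process_madvise(const size_t *lens, const int *results,
                                     size_t n)
{
    size_t advised = 0;
    ssize_t ret = 0;
    size_t i;

    assert(n == 0 || (lens != NULL && results != NULL));

    for (i = 0; i < n; i++) {
        ret = results[i];
        /* -ENOMEM: unmapped holes were skipped; count the range as processed. */
        if (ret < 0 && ret != -ENOMEM)
            break;
        advised += lens[i];
    }
    /* Same effect as: ret = (total_len - iov_iter_count(&iter)) ? : ret; */
    return advised ? (ssize_t)advised : ret;
}
```

In this model an error on the first element is still returned as an
error, while an error on any later element yields the number of bytes
advised up to that point.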
diff --git a/mm/madvise.c b/mm/madvise.c
index 38d0f51..d3b49b3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1426,15 +1426,21 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
while (iov_iter_count(&iter)) {
iovec = iov_iter_iovec(&iter);
+ /*
+ * Even when [start, end) passed to do_madvise() covers
+ * some unmapped addresses, it continues processing and
+ * returns ENOMEM at the end. Thus consider the range
+ * as processed when do_madvise() returns ENOMEM.
+ * This makes process_madvise() never return ENOMEM.
+ */
ret = do_madvise(mm, (unsigned long)iovec.iov_base,
iovec.iov_len, behavior);
- if (ret < 0)
+ if (ret < 0 && ret != -ENOMEM)
break;
iov_iter_advance(&iter, iovec.iov_len);
}
- if (ret == 0)
- ret = total_len - iov_iter_count(&iter);
+ ret = (total_len - iov_iter_count(&iter)) ? : ret;
release_mm:
mmput(mm);
--
2.7.4