Message-ID: <CAK8P3a0Mnp2ekmX-BX9yr+N8fy2=gBtASELLXoa9uGSpSS9aOA@mail.gmail.com>
Date: Fri, 28 Aug 2020 19:40:08 +0200
From: Arnd Bergmann <arnd@...db.de>
To: Minchan Kim <minchan@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
Christian Brauner <christian.brauner@...ntu.com>,
linux-mm <linux-mm@...ck.org>,
Linux API <linux-api@...r.kernel.org>,
Oleksandr Natalenko <oleksandr@...hat.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Tim Murray <timmurray@...gle.com>,
Sandeep Patil <sspatil@...gle.com>,
Sonny Rao <sonnyrao@...gle.com>,
Brian Geffon <bgeffon@...gle.com>,
Michal Hocko <mhocko@...e.com>,
Johannes Weiner <hannes@...xchg.org>,
Shakeel Butt <shakeelb@...gle.com>,
John Dias <joaodias@...gle.com>,
Joel Fernandes <joel@...lfernandes.org>,
Jann Horn <jannh@...gle.com>,
alexander.h.duyck@...ux.intel.com,
SeongJae Park <sj38.park@...il.com>,
David Rientjes <rientjes@...gle.com>,
Arjun Roy <arjunroy@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Christian Brauner <christian@...uner.io>,
Daniel Colascione <dancol@...gle.com>,
Jens Axboe <axboe@...nel.dk>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
SeongJae Park <sjpark@...zon.de>,
linux-man <linux-man@...r.kernel.org>
Subject: Re: [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall:
an external memory hinting API
On Mon, Jun 22, 2020 at 9:29 PM Minchan Kim <minchan@...nel.org> wrote:
> So finally, the API is as follows,
>
> ssize_t process_madvise(int pidfd, const struct iovec *iovec,
> unsigned long vlen, int advice, unsigned int flags);
I had not followed the discussion earlier and only now came across
the syscall in linux-next; sorry for stirring things up this late.
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 94bf4958d114..8f959d90338a 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -364,6 +364,7 @@
> 440 common watch_mount sys_watch_mount
> 441 common watch_sb sys_watch_sb
> 442 common fsinfo sys_fsinfo
> +443 64 process_madvise sys_process_madvise
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> @@ -407,3 +408,4 @@
> 545 x32 execveat compat_sys_execveat
> 546 x32 preadv2 compat_sys_preadv64v2
> 547 x32 pwritev2 compat_sys_pwritev64v2
> +548 x32 process_madvise compat_sys_process_madvise
I think we should not add any new x32-specific syscalls. Instead, I think
compat_sys_process_madvise() and sys_process_madvise() can be
merged into one.
> + mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> + if (IS_ERR_OR_NULL(mm)) {
> + ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> + goto release_task;
> + }
Minor point: Having to use IS_ERR_OR_NULL() tends to be fragile,
and I would try to avoid it. Can mm_access() itself be changed to
return ERR_PTR(-ESRCH) instead of NULL, to improve its calling
convention? I see there are only three other callers.
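
A userspace sketch of the convention being asked for, assuming mm_access()
were changed as suggested. The helpers below re-implement the kernel's
ERR_PTR()/PTR_ERR()/IS_ERR() purely for illustration, and struct fake_mm,
fake_mm_access() and caller() are hypothetical names, not kernel code:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copies of the kernel's error-pointer helpers (illustration only). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct mm_struct. */
struct fake_mm {
	int users;
};

static struct fake_mm the_mm = { .users = 1 };

/*
 * Hypothetical stand-in for mm_access() that never returns NULL:
 *   no mm attached    -> ERR_PTR(-ESRCH)
 *   access not allowed -> ERR_PTR(-EACCES)
 */
static struct fake_mm *fake_mm_access(int has_mm, int allowed)
{
	if (!has_mm)
		return ERR_PTR(-ESRCH);
	if (!allowed)
		return ERR_PTR(-EACCES);
	return &the_mm;
}

/* The caller now needs a single IS_ERR() check and no NULL special case. */
static long caller(int has_mm, int allowed)
{
	struct fake_mm *mm = fake_mm_access(has_mm, allowed);

	if (IS_ERR(mm))
		return PTR_ERR(mm);
	return mm->users;
}
```

With this shape, the error path in the caller reduces to
"if (IS_ERR(mm)) { ret = PTR_ERR(mm); goto release_task; }".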
> + ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
> + if (ret >= 0) {
> + ret = do_process_madvise(pidfd, &iter, behavior, flags);
> + kfree(iov);
> + }
> + return ret;
> +}
> +
> +#ifdef CONFIG_COMPAT
...
> +
> + ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack),
> + &iov, &iter);
> + if (ret >= 0) {
> + ret = do_process_madvise(pidfd, &iter, behavior, flags);
> + kfree(iov);
> + }
Every syscall that passes an iovec seems to do this. If we make import_iovec()
handle both cases directly, this syscall and a number of others can
be simplified, and you avoid the x32 entry point I mentioned above.
Something like this (untested):
index dad8d0cfaaf7..0de4ddff24c1 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1683,8 +1683,13 @@ ssize_t import_iovec(int type, const struct iovec __user * uvector,
 {
 	ssize_t n;
 	struct iovec *p;
-	n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
-				  *iov, &p);
+
+	if (in_compat_syscall())
+		n = compat_rw_copy_check_uvector(type, uvector, nr_segs,
+						 fast_segs, *iov, &p);
+	else
+		n = rw_copy_check_uvector(type, uvector, nr_segs,
+					  fast_segs, *iov, &p);
 	if (n < 0) {
 		if (p != *iov)
 			kfree(p);
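
To make the idea concrete outside the kernel, here is a rough userspace
sketch of a single import routine that handles both layouts. Everything in
it is an assumption for illustration: struct compat_iovec_sketch mirrors a
32-bit iovec, the is_compat argument stands in for in_compat_syscall(), and
import_iovec_sketch() is a hypothetical name that does none of the kernel's
access or overflow checks:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Assumed 32-bit iovec layout, as a compat task would pass it. */
struct compat_iovec_sketch {
	uint32_t iov_base;
	uint32_t iov_len;
};

/*
 * Hypothetical unified importer. 'is_compat' plays the role of
 * in_compat_syscall(): it selects how the caller's array is interpreted,
 * so only one entry point is needed for native and compat callers.
 *
 * Copies nr_segs entries from 'uvec' into 'out', widening compat entries.
 * Returns the total byte length, or -1 if an argument is NULL.
 */
static long import_iovec_sketch(int is_compat, const void *uvec,
				size_t nr_segs, struct iovec *out)
{
	long total = 0;
	size_t i;

	if (!uvec || !out)
		return -1;

	for (i = 0; i < nr_segs; i++) {
		if (is_compat) {
			const struct compat_iovec_sketch *c = uvec;

			/* Widen the 32-bit pointer and length to native size. */
			out[i].iov_base = (void *)(uintptr_t)c[i].iov_base;
			out[i].iov_len = c[i].iov_len;
		} else {
			const struct iovec *n = uvec;

			out[i] = n[i];
		}
		total += (long)out[i].iov_len;
	}
	return total;
}
```

The caller then looks the same for both ABIs: it imports once and passes
the resulting native iovec array on, which is what would let the separate
compat syscall body go away.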
Arnd