Date:   Fri, 28 Aug 2020 12:04:00 -0700
From:   Minchan Kim <minchan@...nel.org>
To:     Christian Brauner <christian@...uner.io>
Cc:     Jens Axboe <axboe@...nel.dk>, Arnd Bergmann <arnd@...db.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Linux API <linux-api@...r.kernel.org>,
        Oleksandr Natalenko <oleksandr@...hat.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Tim Murray <timmurray@...gle.com>,
        Sandeep Patil <sspatil@...gle.com>,
        Sonny Rao <sonnyrao@...gle.com>,
        Brian Geffon <bgeffon@...gle.com>,
        Michal Hocko <mhocko@...e.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Dias <joaodias@...gle.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Jann Horn <jannh@...gle.com>,
        alexander.h.duyck@...ux.intel.com,
        SeongJae Park <sj38.park@...il.com>,
        David Rientjes <rientjes@...gle.com>,
        Arjun Roy <arjunroy@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Daniel Colascione <dancol@...gle.com>,
        Kirill Tkhai <ktkhai@...tuozzo.com>,
        SeongJae Park <sjpark@...zon.de>,
        linux-man <linux-man@...r.kernel.org>
Subject: Re: [PATCH v8 3/4] mm/madvise: introduce process_madvise() syscall:
 an external memory hinting API

On Fri, Aug 28, 2020 at 08:25:34PM +0200, Christian Brauner wrote:
> On Fri, Aug 28, 2020 at 8:24 PM Jens Axboe <axboe@...nel.dk> wrote:
> >
> > On 8/28/20 11:40 AM, Arnd Bergmann wrote:
> > > On Mon, Jun 22, 2020 at 9:29 PM Minchan Kim <minchan@...nel.org> wrote:
> > >> So finally, the API is as follows,
> > >>
> > >>      ssize_t process_madvise(int pidfd, const struct iovec *iovec,
> > >>                unsigned long vlen, int advice, unsigned int flags);
> > >
> > > I had not followed the discussion earlier and only now came across
> > > the syscall in linux-next, sorry for stirring things up this late.
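For readers following the API, here is a minimal user-space sketch of a call with the signature quoted above. It goes through syscall(2) because a libc wrapper may not be available. The helper names raw_pidfd_open(), raw_process_madvise() and hint_cold() are hypothetical. The fallback numbers 434 (pidfd_open) and 440 (process_madvise) are an assumption for asm-generic/x86_64 headers that lack them; note that the patch quoted here still used 443 in linux-next.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* ASSUMPTION: fallback numbers for asm-generic/x86_64 when headers lack them. */
#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434
#endif
#ifndef __NR_process_madvise
#define __NR_process_madvise 440
#endif
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/* Hypothetical raw wrappers, named to avoid clashing with any libc wrapper. */
int raw_pidfd_open(pid_t pid, unsigned int flags)
{
    return (int)syscall(__NR_pidfd_open, pid, flags);
}

ssize_t raw_process_madvise(int pidfd, const struct iovec *iov,
                            unsigned long vlen, int advice, unsigned int flags)
{
    return syscall(__NR_process_madvise, pidfd, iov, vlen, advice, flags);
}

/* Hint that [addr, addr + len) in process `pid` is cold.
 * Returns -1 with errno set on failure, as syscall(2) does. */
ssize_t hint_cold(pid_t pid, void *addr, size_t len)
{
    struct iovec iov = { .iov_base = addr, .iov_len = len };
    int pidfd = raw_pidfd_open(pid, 0);
    ssize_t ret;
    int saved_errno;

    if (pidfd < 0)
        return -1;
    ret = raw_process_madvise(pidfd, &iov, 1, MADV_COLD, 0);
    saved_errno = errno;  /* keep the madvise error across close() */
    close(pidfd);
    errno = saved_errno;
    return ret;
}
```

Errors are reported in the usual way, as -1 with errno set. A kernel that lacks the syscall reports ENOSYS, and an invalid pidfd is rejected with EBADF.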
> > >
> > >> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> > >> index 94bf4958d114..8f959d90338a 100644
> > >> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> > >> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> > >> @@ -364,6 +364,7 @@
> > >>  440    common  watch_mount             sys_watch_mount
> > >>  441    common  watch_sb                sys_watch_sb
> > >>  442    common  fsinfo                  sys_fsinfo
> > >> +443    64      process_madvise         sys_process_madvise
> > >>
> > >>  #
> > >>  # x32-specific system call numbers start at 512 to avoid cache impact
> > >> @@ -407,3 +408,4 @@
> > >>  545    x32     execveat                compat_sys_execveat
> > >>  546    x32     preadv2                 compat_sys_preadv64v2
> > >>  547    x32     pwritev2                compat_sys_pwritev64v2
> > >> +548    x32     process_madvise         compat_sys_process_madvise
> > >
> > > I think we should not add any new x32-specific syscalls. Instead I think
> > > the compat_sys_process_madvise/sys_process_madvise can be
> > > merged into one.
> > >
> > >> +       mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
> > >> +       if (IS_ERR_OR_NULL(mm)) {
> > >> +               ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
> > >> +               goto release_task;
> > >> +       }
> > >
> > > Minor point: Having to use IS_ERR_OR_NULL() tends to be fragile,
> > > and I would try to avoid that. Can mm_access() be changed to
> > > itself return ERR_PTR(-ESRCH) instead of NULL to improve its
> > > calling conventions? I see there are only three other callers.
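The following user-space sketch illustrates the calling convention suggested above. It uses simplified copies of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers. struct mm, lookup_mm() and mm_users() are hypothetical stand-ins for mm_access() and its callers, not kernel code. Once the lookup never returns NULL, each caller needs only a single IS_ERR() check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified user-space copies of the kernel's <linux/err.h> helpers:
 * the top MAX_ERRNO pointer values are reserved for encoded -errno codes. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct mm_struct. */
struct mm { int users; };

static struct mm target_mm = { .users = 1 };

/* Hypothetical lookup in the suggested style: it never returns NULL,
 * so "no such target" is encoded as ERR_PTR(-ESRCH). */
struct mm *lookup_mm(int pid, bool may_access)
{
    if (pid != 1)
        return ERR_PTR(-ESRCH);
    if (!may_access)
        return ERR_PTR(-EPERM);
    return &target_mm;
}

/* Caller: one IS_ERR() test replaces IS_ERR_OR_NULL() plus the
 * "IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH" translation. */
long mm_users(int pid, bool may_access)
{
    struct mm *mm = lookup_mm(pid, may_access);

    if (IS_ERR(mm))
        return PTR_ERR(mm);
    return mm->users;
}
```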
> > >
> > >
> > >> +       ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
> > >> +       if (ret >= 0) {
> > >> +               ret = do_process_madvise(pidfd, &iter, behavior, flags);
> > >> +               kfree(iov);
> > >> +       }
> > >> +       return ret;
> > >> +}
> > >> +
> > >> +#ifdef CONFIG_COMPAT
> > > ...
> > >> +
> > >> +       ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack),
> > >> +                               &iov, &iter);
> > >> +       if (ret >= 0) {
> > >> +               ret = do_process_madvise(pidfd, &iter, behavior, flags);
> > >> +               kfree(iov);
> > >> +       }
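Both quoted paths rely on the same ownership rule. The caller passes an on-stack iovstack, and after a successful import, iov is either NULL (the stack array was large enough) or a heap copy. That is why an unconditional kfree(iov) is safe. Below is a user-space sketch of that rule, with malloc()/free() standing in for kmalloc()/kfree(). The names import_segments(), struct seg_iter and total_length() are hypothetical. The limits mirror UIO_FASTIOV (8) and UIO_MAXIOV (1024) only for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define FAST_SEGS 8     /* illustrative, like UIO_FASTIOV */
#define MAX_SEGS 1024   /* illustrative, like UIO_MAXIOV */

/* Hypothetical stand-in for struct iov_iter. */
struct seg_iter {
    const struct iovec *iov;
    size_t nr_segs;
};

/* On entry *iovp points at the caller's stack array. On return it is NULL
 * if that array was used, or the heap copy otherwise, so the caller can
 * free() it unconditionally. */
ssize_t import_segments(const struct iovec *uvec, size_t nr_segs,
                        size_t fast_segs, struct iovec **iovp,
                        struct seg_iter *it)
{
    struct iovec *p = *iovp;
    ssize_t total = 0;

    if (nr_segs > MAX_SEGS) {
        *iovp = NULL;
        return -EINVAL;
    }
    if (nr_segs > fast_segs) {
        p = malloc(nr_segs * sizeof(*p));
        if (p == NULL) {
            *iovp = NULL;
            return -ENOMEM;
        }
    }
    memcpy(p, uvec, nr_segs * sizeof(*p));  /* stands in for copy_from_user() */
    for (size_t i = 0; i < nr_segs; i++)
        total += (ssize_t)p[i].iov_len;

    it->iov = p;
    it->nr_segs = nr_segs;
    *iovp = (p == *iovp) ? NULL : p;
    return total;
}

/* Caller pattern from the quoted syscall bodies. */
ssize_t total_length(const struct iovec *uvec, size_t nr_segs)
{
    struct iovec iovstack[FAST_SEGS];
    struct iovec *iov = iovstack;
    struct seg_iter it;
    ssize_t ret = import_segments(uvec, nr_segs, FAST_SEGS, &iov, &it);

    if (ret >= 0) {
        /* ... walk it.iov[0 .. it.nr_segs) here ... */
        free(iov);  /* free(NULL) is a no-op, just like kfree(NULL) */
    }
    return ret;
}
```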
> > >
> > > Every syscall that passes an iovec seems to do this. If we make import_iovec()
> > > handle both cases directly, this syscall and a number of others can
> > > be simplified, and you avoid the x32 entry point I mentioned above.
> > > Something like (untested)
> > >
> > > index dad8d0cfaaf7..0de4ddff24c1 100644
> > > --- a/lib/iov_iter.c
> > > +++ b/lib/iov_iter.c
> > > @@ -1683,8 +1683,13 @@ ssize_t import_iovec(int type, const struct iovec __user * uvector,
> > >  {
> > >         ssize_t n;
> > >         struct iovec *p;
> > > -       n = rw_copy_check_uvector(type, uvector, nr_segs, fast_segs,
> > > -                                 *iov, &p);
> > > +
> > > +       if (in_compat_syscall())
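The following user-space sketch illustrates the direction outlined above. A single import routine checks which ABI the caller is using and converts 32-bit entries itself, so no separate compat_sys_* entry point or x32 table slot is needed. Here, simulate_compat stands in for in_compat_syscall(), and struct iovec32 and import_any_iovec() are hypothetical names. This is not the untested lib/iov_iter.c change quoted above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical 32-bit user layout, modelled on struct compat_iovec. */
struct iovec32 {
    uint32_t iov_base;
    uint32_t iov_len;
};

/* Hypothetical stand-in for the kernel's in_compat_syscall(). */
bool simulate_compat;

static bool in_compat_call(void)
{
    return simulate_compat;
}

/* One import routine for both ABIs: the layout decision lives here, so the
 * syscall itself needs only one (non-compat) entry point. */
ssize_t import_any_iovec(const void *uvec, size_t nr_segs,
                         struct iovec *out, size_t out_len)
{
    ssize_t total = 0;

    if (nr_segs > out_len)
        return -EINVAL;
    for (size_t i = 0; i < nr_segs; i++) {
        if (in_compat_call()) {
            struct iovec32 v;

            /* Widen a 32-bit entry into the native layout. */
            memcpy(&v, (const char *)uvec + i * sizeof(v), sizeof(v));
            out[i].iov_base = (void *)(uintptr_t)v.iov_base;
            out[i].iov_len = v.iov_len;
        } else {
            memcpy(&out[i], (const char *)uvec + i * sizeof(out[i]),
                   sizeof(out[i]));
        }
        total += (ssize_t)out[i].iov_len;
    }
    return total;
}
```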
> 
> I suggested the exact same solution roughly 1.5 weeks ago. :)
> Funny: when I saw you mention this in BBB, I knew exactly what you were
> referring to. :)
> 

https://lore.kernel.org/linux-man/20200816081227.ngw3l45c5uncesmr@wittgenstein/

Yes, Christian suggested the idea, but mostly just for this new syscall.
I don't have time to revise the patchset yet, but I may next week.
I will follow Christian's suggestion.

Thanks.
