Date:   Wed, 18 Nov 2020 16:13:52 -0800
From:   Suren Baghdasaryan <surenb@...gle.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        David Rientjes <rientjes@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <guro@...com>, Rik van Riel <riel@...riel.com>,
        Christian Brauner <christian@...uner.io>,
        Oleg Nesterov <oleg@...hat.com>,
        Tim Murray <timmurray@...gle.com>, linux-api@...r.kernel.org,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        kernel-team <kernel-team@...roid.com>,
        Minchan Kim <minchan@...nel.org>
Subject: Re: [PATCH 1/1] RFC: add pidfd_send_signal flag to reclaim mm while
 killing a process

On Wed, Nov 18, 2020 at 11:55 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Wed, Nov 18, 2020 at 11:51 AM Suren Baghdasaryan <surenb@...gle.com> wrote:
> >
> > On Wed, Nov 18, 2020 at 11:32 AM Michal Hocko <mhocko@...e.com> wrote:
> > >
> > > On Wed 18-11-20 11:22:21, Suren Baghdasaryan wrote:
> > > > On Wed, Nov 18, 2020 at 11:10 AM Michal Hocko <mhocko@...e.com> wrote:
> > > > >
> > > > > On Fri 13-11-20 18:16:32, Andrew Morton wrote:
> > > > > [...]
> > > > > > It's all sounding a bit painful (but not *too* painful).  But to
> > > > > > reiterate, I do think that adding the ability for a process to shoot
> > > > > > down a large amount of another process's memory is a lot more generally
> > > > > > useful than tying it to SIGKILL, agree?

I was looking into how to work around the MAX_RW_COUNT limitation.
The conceptual issue is "struct iovec": its iov_len is a size_t, which
cannot express a range like "entire process memory". I would like to
check your reaction to the following idea, which can be implemented
without painful surgery on import_iovec and its friends.
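
For reference, here is a rough sketch of the cap in question. It
assumes 4 KiB pages and mirrors the kernel's definition of
MAX_RW_COUNT as (INT_MAX & PAGE_MASK); the SKETCH_* names and
fits_in_one_call() are made up for illustration and are not kernel
identifiers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. The kernel derives this from PAGE_SHIFT. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Mirrors the kernel's MAX_RW_COUNT, i.e. INT_MAX rounded down to a page. */
#define SKETCH_MAX_RW_COUNT ((size_t)INT_MAX & SKETCH_PAGE_MASK)

/* Returns 1 if a request of total_len bytes fits under the cap, else 0. */
static int fits_in_one_call(unsigned long long total_len)
{
    return total_len <= (unsigned long long)SKETCH_MAX_RW_COUNT;
}
```

With these assumptions the cap is 0x7ffff000 bytes, just under 2 GiB,
so a single call cannot cover the whole address space of a typical
64-bit process.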

process_madvise(pidfd,
                iovec = [ { range_start_addr, 0 }, { range_end_addr, 0 } ],
                vlen = 2, behavior = MADV_xxx, flags = PMADV_FLAG_RANGE)

So, to represent a range we pass a new PMADV_FLAG_RANGE flag and
construct a 2-element vector that expresses the range start and range
end using the iovec.iov_base members. The iov_len member of each iovec
element is ignored in this mode. I know it sounds hacky, but I think
it's the simplest way if we want the ability to express an arbitrarily
large range.

Another option is to do what Andrew described as "madvise((void *)0,
(void *)-1, MADV_PAGEOUT)", which means this mode would work only on
the entire mm of the process.
WDYT?
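
To make the first option concrete, here is a minimal userspace-style
sketch of how the two-element vector could be built and interpreted.
PMADV_FLAG_RANGE does not exist today and its value here is a
placeholder; struct addr_range, encode_range() and decode_range() are
hypothetical helpers for illustration only, not proposed kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical flag from the proposal above; the value is a placeholder. */
#define PMADV_FLAG_RANGE 0x1u

/* Hypothetical decoded form of the range: [start, end). */
struct addr_range {
    uintptr_t start;
    uintptr_t end;
};

/* Caller side: put start/end into iov_base and leave iov_len at 0. */
static void encode_range(struct iovec vec[2], uintptr_t start, uintptr_t end)
{
    vec[0].iov_base = (void *)start;
    vec[0].iov_len = 0;
    vec[1].iov_base = (void *)end;
    vec[1].iov_len = 0;
}

/*
 * Receiver side: accept only the flag plus exactly two elements,
 * ignore iov_len, and reject an inverted range.
 * Returns 0 on success, -1 on invalid input.
 */
static int decode_range(const struct iovec *vec, size_t vlen,
                        unsigned int flags, struct addr_range *out)
{
    uintptr_t start, end;

    if (!(flags & PMADV_FLAG_RANGE) || vlen != 2)
        return -1;

    start = (uintptr_t)vec[0].iov_base;
    end = (uintptr_t)vec[1].iov_base;
    if (start > end)
        return -1;

    out->start = start;
    out->end = end;
    return 0;
}
```

Passing start = 0 and end = UINTPTR_MAX covers the same ground as the
"(void *)0, (void *)-1" form, so the second option becomes a special
case of the first.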

> > > > >
> > > > > I am not sure TBH. Is there any reasonable usecase where uncoordinated
> > > > > memory tear down is OK and a target process which is able to see the
> > > > > unmapped memory?
> > > >
> > > > I think uncoordinated memory tear down is a special case which makes
> > > > sense only when the target process is being killed (and we can enforce
> > > > that by allowing MADV_DONTNEED to be used only if the target process
> > > > has pending SIGKILL).
> > >
> > > That would be safe but then I am wondering whether it makes sense to
> > > implement as a madvise call. It is quite strange to expect somebody call
> > > a syscall on a killed process. But this is more a detail. I am not a
> > > great fan of a more generic MADV_DONTNEED on a remote process. This is
> > > just too dangerous IMHO.
> >
> > Agree 100%
>
> I assumed here that by "a more generic MADV_DONTNEED on a remote
> process" you meant "process_madvise(MADV_DONTNEED) applied to a
> process that is not being killed". Re-reading your comment I realized
> that you might have meant "process_madvise() with generic support to
> large memory areas". I hope I understood you correctly.
>
> >
> > >
> > > > However, the ability to apply other flavors of
> > > > process_madvise() to large memory areas spanning multiple VMAs can be
> > > > useful in more cases.
> > >
> > > Yes I do agree with that. The error reporting would be more tricky but
> > > I am not really sure that the exact reporting is really necessary for
> > > advice like interface.
> >
> > Andrew's suggestion for this special mode to change return semantics
> > to the usual "0 or error code" seems to me like the most reasonable
> > way to deal with the return value limitation.
> >
> > >
> > > > For example in Android we will use
> > > > process_madvise(MADV_PAGEOUT) to "shrink" an inactive background
> > > > process.
> > >
> > > That makes sense to me.
> > > --
> > > Michal Hocko
> > > SUSE Labs
