Message-ID: <CAJuCfpF1JSTSRu5v8s9wG0J-S+-p57tMO+0dUF+P16_6yYV7Mg@mail.gmail.com>
Date: Thu, 5 Aug 2021 08:29:57 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Matthew Wilcox <willy@...radead.org>,
Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <guro@...com>, Rik van Riel <riel@...riel.com>,
Minchan Kim <minchan@...nel.org>,
Christian Brauner <christian@...uner.io>,
Christoph Hellwig <hch@...radead.org>,
Oleg Nesterov <oleg@...hat.com>,
David Hildenbrand <david@...hat.com>,
Jann Horn <jannh@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
Andy Lutomirski <luto@...nel.org>,
Christian Brauner <christian.brauner@...ntu.com>,
Florian Weimer <fweimer@...hat.com>,
Jan Engelhardt <jengelh@...i.de>,
Tim Murray <timmurray@...gle.com>,
Linux API <linux-api@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
kernel-team <kernel-team@...roid.com>
Subject: Re: [PATCH v6 1/2] mm: introduce process_mrelease system call
On Thu, Aug 5, 2021 at 12:10 AM Michal Hocko <mhocko@...e.com> wrote:
>
> On Wed 04-08-21 11:50:03, Suren Baghdasaryan wrote:
> [...]
> > +SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
> > +{
> > +#ifdef CONFIG_MMU
> > +	struct mm_struct *mm = NULL;
> > +	struct task_struct *task;
> > +	unsigned int f_flags;
> > +	struct pid *pid;
> > +	long ret = 0;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +
> > +	pid = pidfd_get_pid(pidfd, &f_flags);
> > +	if (IS_ERR(pid))
> > +		return PTR_ERR(pid);
> > +
> > +	task = get_pid_task(pid, PIDTYPE_PID);
> > +	if (!task) {
> > +		ret = -ESRCH;
> > +		goto put_pid;
> > +	}
> > +
> > +	/*
> > +	 * If the task is dying and in the process of releasing its memory
> > +	 * then get its mm.
> > +	 */
> > +	task = find_lock_task_mm(task);
>
> You want a different task_struct pointer here: the returned task might
> differ from the given one, and you already hold a reference on the given
> one which you do not want to leak.
Ah, right. I was looking at the task locking, which find_lock_task_mm()
handles, but I missed the task pinning part. Will fix.
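
A minimal userspace sketch of the pinning issue, in case it helps to see
the reference counts. Everything here (struct task, get_task(), put_task(),
find_lock_task_mm_model()) is a hypothetical model and not kernel code. It
assumes only what the thread states: the lookup may return a different
task than the one passed in, and it does not pin the task it returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for task_struct; not kernel code. */
struct task {
	int refs;          /* models the task_struct reference count */
	int has_mm;        /* non-zero while this thread still has an mm */
	int locked;        /* models task_lock()/task_unlock() */
	struct task *next; /* next thread in the same group, or NULL */
};

static struct task *get_task(struct task *t) { t->refs++; return t; }
static void put_task(struct task *t) { t->refs--; }

/*
 * Models find_lock_task_mm(): return a locked thread from the group that
 * still has an mm. The result may differ from @t and is NOT pinned.
 */
static struct task *find_lock_task_mm_model(struct task *t)
{
	for (struct task *p = t; p != NULL; p = p->next) {
		if (p->has_mm) {
			p->locked = 1;
			return p;
		}
	}
	return NULL;
}

/* Pattern from the quoted patch: the pinned pointer is overwritten. */
static void release_overwriting(struct task *leader)
{
	struct task *task = get_task(leader);

	task = find_lock_task_mm_model(task);
	if (task == NULL)
		return;         /* the pinned task is never put */
	task->locked = 0;
	put_task(task);         /* puts a task that was never pinned */
}

/* Pattern from the review: keep the pinned task, use p for the lookup. */
static void release_separate(struct task *leader)
{
	struct task *task = get_task(leader);
	struct task *p = find_lock_task_mm_model(task);

	if (p != NULL)
		p->locked = 0;
	put_task(task);         /* always puts exactly what was pinned */
}

/* Returns the leader's refcount after one call; 1 means balanced. */
int leader_refs_after(int use_separate)
{
	struct task worker = { .refs = 1, .has_mm = 1 };
	struct task leader = { .refs = 1, .has_mm = 0, .next = &worker };

	if (use_separate)
		release_separate(&leader);
	else
		release_overwriting(&leader);
	return leader.refs;
}
```

In this model the overwriting variant leaves the leader with an extra
reference and drops one from the worker that was never taken, while the
variant with a separate pointer stays balanced.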
>
> > +	if (!task) {
> > +		ret = -ESRCH;
> > +		goto put_pid;
> > +	}
> > +	if (task_will_free_mem(task) && (task->flags & PF_KTHREAD) == 0) {
> > +		mm = task->mm;
> > +		mmget(mm);
> > +	}
> > +	task_unlock(task);
> > +	if (!mm) {
> > +		ret = -EINVAL;
> > +		goto put_task;
> > +	}
> > +
> > +	if (test_bit(MMF_OOM_SKIP, &mm->flags))
> > +		goto put_mm;
>
> This is too late to check for MMF_OOM_SKIP. task_will_free_mem will fail
> with the flag being set. I believe you want something like the
> following:
>
>	p = find_lock_task_mm(task);
>	mm = p->mm;
>
>	/* The work has been done already */
>	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
>		task_unlock(p);
>		goto put_task;
>	}
>
>	if (!task_will_free_mem(p)) {
>		task_unlock(p);
>		goto put_task;
>	}
>
>	mmget(mm);
>	task_unlock(p);
>
I see. I will update the patch and ask Andrew to remove the previous
version from the mm tree.
Thanks for reviewing and pointing out the issues!
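
To make the ordering problem concrete, here is a small userspace model of
the two check orders. It is only a sketch: the function names are
hypothetical, will_free_mem_model() captures just the one property
mentioned above (task_will_free_mem() fails once MMF_OOM_SKIP is set), and
the -EINVAL result for a task that is not dying is an assumption carried
over from the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Result meaning "go on and reap"; hypothetical, for this model only. */
enum { DO_REAP = 1 };

/*
 * Models the single property of task_will_free_mem() raised in the
 * review: it fails when MMF_OOM_SKIP is already set on the mm.
 */
static bool will_free_mem_model(bool oom_skip, bool dying)
{
	if (oom_skip)
		return false;
	return dying;
}

/* Order used in the quoted patch: will-free-mem first, flag second. */
int decide_patch_order(bool oom_skip, bool dying)
{
	if (!will_free_mem_model(oom_skip, dying))
		return -EINVAL;
	if (oom_skip)           /* never true here: filtered out above */
		return 0;
	return DO_REAP;
}

/* Order suggested in the review: flag first, will-free-mem second. */
int decide_suggested_order(bool oom_skip, bool dying)
{
	if (oom_skip)
		return 0;       /* the work has been done already */
	if (!will_free_mem_model(oom_skip, dying))
		return -EINVAL; /* assumed error value, see above */
	return DO_REAP;
}
```

With the flag set, the first order reports an error for an mm that has
already been reaped, whereas the second returns success without doing any
further work.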
>
> > +
> > +	if (mmap_read_lock_killable(mm)) {
> > +		ret = -EINTR;
> > +		goto put_mm;
> > +	}
> > +	if (!__oom_reap_task_mm(mm))
> > +		ret = -EAGAIN;
> > +	mmap_read_unlock(mm);
> > +
> > +put_mm:
> > +	mmput(mm);
> > +put_task:
> > +	put_task_struct(task);
> > +put_pid:
> > +	put_pid(pid);
> > +	return ret;
> > +#else
> > +	return -ENOSYS;
> > +#endif /* CONFIG_MMU */
> > +}
> > --
> > 2.32.0.554.ge1b32706d8-goog
>
> --
> Michal Hocko
> SUSE Labs