Message-ID: <20190507153154.GA5750@redhat.com>
Date: Tue, 7 May 2019 17:31:54 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Sultan Alsawaf <sultan@...neltoast.com>
Cc: Christian Brauner <christian@...uner.io>,
Daniel Colascione <dancol@...gle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>,
Tim Murray <timmurray@...gle.com>,
Michal Hocko <mhocko@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Arve Hjønnevåg <arve@...roid.com>,
Todd Kjos <tkjos@...roid.com>,
Martijn Coenen <maco@...roid.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
"open list:ANDROID DRIVERS" <devel@...verdev.osuosl.org>,
linux-mm <linux-mm@...ck.org>,
kernel-team <kernel-team@...roid.com>,
Andy Lutomirski <luto@...capital.net>,
"Serge E. Hallyn" <serge@...lyn.com>,
Kees Cook <keescook@...omium.org>,
Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android
I am not going to comment on the intent, but to be honest I am skeptical too.
On 05/06, Sultan Alsawaf wrote:
>
> +static unsigned long find_victims(struct victim_info *varr, int *vindex,
> + int vmaxlen, int min_adj, int max_adj)
> +{
> + unsigned long pages_found = 0;
> + int old_vindex = *vindex;
> + struct task_struct *tsk;
> +
> + for_each_process(tsk) {
> + struct task_struct *vtsk;
> + unsigned long tasksize;
> + short oom_score_adj;
> +
> + /* Make sure there's space left in the victim array */
> + if (*vindex == vmaxlen)
> + break;
> +
> + /* Don't kill current, kthreads, init, or duplicates */
> + if (same_thread_group(tsk, current) ||
> + tsk->flags & PF_KTHREAD ||
> + is_global_init(tsk) ||
> + vtsk_is_duplicate(varr, *vindex, tsk))
> + continue;
> +
> + vtsk = find_lock_task_mm(tsk);
Did you test this patch with lockdep enabled?
If I read the patch correctly, lockdep should complain. vtsk_is_duplicate()
ensures that we do not take the same ->alloc_lock twice or more, but lockdep
can't know this.
> +static void scan_and_kill(unsigned long pages_needed)
> +{
> + static DECLARE_WAIT_QUEUE_HEAD(victim_waitq);
> + struct victim_info victims[MAX_VICTIMS];
> + int i, nr_to_kill = 0, nr_victims = 0;
> + unsigned long pages_found = 0;
> + atomic_t victim_count;
> +
> + /*
> + * Hold the tasklist lock so tasks don't disappear while scanning. This
> + * is preferred to holding an RCU read lock so that the list of tasks
> + * is guaranteed to be up to date. Keep preemption disabled until the
> + * SIGKILLs are sent so the victim kill process isn't interrupted.
> + */
> + read_lock(&tasklist_lock);
> + preempt_disable();
read_lock() disables preemption, and so does every task_lock(), so this looks
unnecessary.
> + for (i = 1; i < ARRAY_SIZE(adj_prio); i++) {
> + pages_found += find_victims(victims, &nr_victims, MAX_VICTIMS,
> + adj_prio[i], adj_prio[i - 1]);
> + if (pages_found >= pages_needed || nr_victims == MAX_VICTIMS)
> + break;
> + }
> +
> + /*
> + * Calculate the number of tasks that need to be killed and quickly
> + * release the references to those that'll live.
> + */
> + for (i = 0, pages_found = 0; i < nr_victims; i++) {
> + struct victim_info *victim = &victims[i];
> + struct task_struct *vtsk = victim->tsk;
> +
> + /* The victims' mm lock is taken in find_victims; release it */
> + if (pages_found >= pages_needed) {
> + task_unlock(vtsk);
> + continue;
> + }
> +
> + /*
> + * Grab a reference to the victim so it doesn't disappear after
> + * the tasklist lock is released.
> + */
> + get_task_struct(vtsk);
The comment doesn't look correct: the victim can't disappear until the
task_unlock() below, since it can't get past exit_mm().
> + pages_found += victim->size;
> + nr_to_kill++;
> + }
> + read_unlock(&tasklist_lock);
> +
> + /* Kill the victims */
> + victim_count = (atomic_t)ATOMIC_INIT(nr_to_kill);
> + for (i = 0; i < nr_to_kill; i++) {
> + struct victim_info *victim = &victims[i];
> + struct task_struct *vtsk = victim->tsk;
> +
> + pr_info("Killing %s with adj %d to free %lu kiB\n", vtsk->comm,
> + vtsk->signal->oom_score_adj,
> + victim->size << (PAGE_SHIFT - 10));
> +
> + /* Configure the victim's mm to notify us when it's freed */
> + vtsk->mm->slmk_waitq = &victim_waitq;
> + vtsk->mm->slmk_counter = &victim_count;
> +
> + /* Accelerate the victim's death by forcing the kill signal */
> + do_send_sig_info(SIGKILL, SIG_INFO_TYPE, vtsk, true);
^^^^
this should be PIDTYPE_TGID
> +
> + /* Finally release the victim's mm lock */
> + task_unlock(vtsk);
> + }
> + preempt_enable_no_resched();
See above. And I don't understand how _no_resched() can really help...
> +
> + /* Try to speed up the death process now that we can schedule again */
> + for (i = 0; i < nr_to_kill; i++) {
> + struct task_struct *vtsk = victims[i].tsk;
> +
> + /* Increase the victim's priority to make it die faster */
> + set_user_nice(vtsk, MIN_NICE);
> +
> + /* Allow the victim to run on any CPU */
> + set_cpus_allowed_ptr(vtsk, cpu_all_mask);
> +
> + /* Finally release the victim reference acquired earlier */
> + put_task_struct(vtsk);
> + }
> +
> + /* Wait until all the victims die */
> + wait_event(victim_waitq, !atomic_read(&victim_count));
Can't we avoid the new slmk_waitq/slmk_counter members in mm_struct?
I mean, can't we export victim_waitq and victim_count and, say, set/test
MMF_OOM_VICTIM? In fact I think you should try to re-use mark_oom_victim()
at least.
Oleg.