Message-ID: <20190507153154.GA5750@redhat.com>
Date: Tue, 7 May 2019 17:31:54 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Sultan Alsawaf <sultan@...neltoast.com>
Cc: Christian Brauner <christian@...uner.io>,
Daniel Colascione <dancol@...gle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>,
Tim Murray <timmurray@...gle.com>,
Michal Hocko <mhocko@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Arve Hjønnevåg <arve@...roid.com>,
Todd Kjos <tkjos@...roid.com>,
Martijn Coenen <maco@...roid.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
"open list:ANDROID DRIVERS" <devel@...verdev.osuosl.org>,
linux-mm <linux-mm@...ck.org>,
kernel-team <kernel-team@...roid.com>,
Andy Lutomirski <luto@...capital.net>,
"Serge E. Hallyn" <serge@...lyn.com>,
Kees Cook <keescook@...omium.org>,
Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android
I am not going to comment on the intent, but to be honest I am skeptical too.
On 05/06, Sultan Alsawaf wrote:
>
> +static unsigned long find_victims(struct victim_info *varr, int *vindex,
> + int vmaxlen, int min_adj, int max_adj)
> +{
> + unsigned long pages_found = 0;
> + int old_vindex = *vindex;
> + struct task_struct *tsk;
> +
> + for_each_process(tsk) {
> + struct task_struct *vtsk;
> + unsigned long tasksize;
> + short oom_score_adj;
> +
> + /* Make sure there's space left in the victim array */
> + if (*vindex == vmaxlen)
> + break;
> +
> + /* Don't kill current, kthreads, init, or duplicates */
> + if (same_thread_group(tsk, current) ||
> + tsk->flags & PF_KTHREAD ||
> + is_global_init(tsk) ||
> + vtsk_is_duplicate(varr, *vindex, tsk))
> + continue;
> +
> + vtsk = find_lock_task_mm(tsk);
Did you test this patch with lockdep enabled?
If I read the patch correctly, lockdep should complain. vtsk_is_duplicate()
ensures that we do not take the same ->alloc_lock twice or more, but lockdep
can't know this.
> +static void scan_and_kill(unsigned long pages_needed)
> +{
> + static DECLARE_WAIT_QUEUE_HEAD(victim_waitq);
> + struct victim_info victims[MAX_VICTIMS];
> + int i, nr_to_kill = 0, nr_victims = 0;
> + unsigned long pages_found = 0;
> + atomic_t victim_count;
> +
> + /*
> + * Hold the tasklist lock so tasks don't disappear while scanning. This
> + * is preferred to holding an RCU read lock so that the list of tasks
> + * is guaranteed to be up to date. Keep preemption disabled until the
> + * SIGKILLs are sent so the victim kill process isn't interrupted.
> + */
> + read_lock(&tasklist_lock);
> + preempt_disable();
read_lock() disables preemption, and so does every task_lock(), so this looks
unnecessary.
> + for (i = 1; i < ARRAY_SIZE(adj_prio); i++) {
> + pages_found += find_victims(victims, &nr_victims, MAX_VICTIMS,
> + adj_prio[i], adj_prio[i - 1]);
> + if (pages_found >= pages_needed || nr_victims == MAX_VICTIMS)
> + break;
> + }
> +
> + /*
> + * Calculate the number of tasks that need to be killed and quickly
> + * release the references to those that'll live.
> + */
> + for (i = 0, pages_found = 0; i < nr_victims; i++) {
> + struct victim_info *victim = &victims[i];
> + struct task_struct *vtsk = victim->tsk;
> +
> + /* The victims' mm lock is taken in find_victims; release it */
> + if (pages_found >= pages_needed) {
> + task_unlock(vtsk);
> + continue;
> + }
> +
> + /*
> + * Grab a reference to the victim so it doesn't disappear after
> + * the tasklist lock is released.
> + */
> + get_task_struct(vtsk);
The comment doesn't look correct: the victim can't disappear until the
task_unlock() below, since it can't get past exit_mm().
> + pages_found += victim->size;
> + nr_to_kill++;
> + }
> + read_unlock(&tasklist_lock);
> +
> + /* Kill the victims */
> + victim_count = (atomic_t)ATOMIC_INIT(nr_to_kill);
> + for (i = 0; i < nr_to_kill; i++) {
> + struct victim_info *victim = &victims[i];
> + struct task_struct *vtsk = victim->tsk;
> +
> + pr_info("Killing %s with adj %d to free %lu kiB\n", vtsk->comm,
> + vtsk->signal->oom_score_adj,
> + victim->size << (PAGE_SHIFT - 10));
> +
> + /* Configure the victim's mm to notify us when it's freed */
> + vtsk->mm->slmk_waitq = &victim_waitq;
> + vtsk->mm->slmk_counter = &victim_count;
> +
> + /* Accelerate the victim's death by forcing the kill signal */
> + do_send_sig_info(SIGKILL, SIG_INFO_TYPE, vtsk, true);
^^^^
this should be PIDTYPE_TGID
> +
> + /* Finally release the victim's mm lock */
> + task_unlock(vtsk);
> + }
> + preempt_enable_no_resched();
See above. And I don't understand how _no_resched() can really help...
> +
> + /* Try to speed up the death process now that we can schedule again */
> + for (i = 0; i < nr_to_kill; i++) {
> + struct task_struct *vtsk = victims[i].tsk;
> +
> + /* Increase the victim's priority to make it die faster */
> + set_user_nice(vtsk, MIN_NICE);
> +
> + /* Allow the victim to run on any CPU */
> + set_cpus_allowed_ptr(vtsk, cpu_all_mask);
> +
> + /* Finally release the victim reference acquired earlier */
> + put_task_struct(vtsk);
> + }
> +
> + /* Wait until all the victims die */
> + wait_event(victim_waitq, !atomic_read(&victim_count));
Can't we avoid the new slmk_waitq/slmk_counter members in mm_struct?
I mean, can't we export victim_waitq and victim_count and, say, set/test
MMF_OOM_VICTIM? In fact I think you should try to re-use mark_oom_victim()
at least.
Oleg.