Date:   Thu, 2 Nov 2017 16:45:18 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Shawn Landden <slandden@...il.com>
Cc:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [RFC] EPOLL_KILLME: New flag to epoll_wait() that subscribes
 process to death row (new syscall)

[Always Cc the linux-api mailing list when proposing user-visible API
 changes]

On Tue 31-10-17 22:32:44, Shawn Landden wrote:
> It is common for services to be stateless around their main event loop.
> If a process passes the EPOLL_KILLME flag to epoll_wait5() then it
> signals to the kernel that epoll_wait5() may not complete, and the kernel
> may send SIGKILL if resources get tight.
> 
> See my systemd patch: https://github.com/shawnl/systemd/tree/killme
> 
> Android uses this memory model for all programs, and having it in the
> kernel will enable integration with the page cache (not in this
> series).

I have to say I completely hate the idea. You are abusing epoll_wait5
for out-of-memory handling? Why is this syscall any more special than any
other one which sleeps and waits idle for an event? We do have a per-task
oom_score_adj for that purpose.
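
To illustrate the existing knob: a process can volunteer itself as a
preferred OOM victim by writing to /proc/self/oom_score_adj, which accepts
values from -1000 to 1000. The following is only a minimal userspace sketch;
the helper name set_oom_score_adj() and its path parameter are hypothetical
and not part of any kernel or libc API.

```c
#include <stdio.h>

/* Valid range documented for /proc/<pid>/oom_score_adj. */
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/*
 * Hypothetical helper: write adj to the oom_score_adj file at path.
 * Returns 0 on success, -1 on an out-of-range value or an I/O error.
 */
static int set_oom_score_adj(const char *path, int adj)
{
	FILE *f;
	int ok;

	if (adj < OOM_SCORE_ADJ_MIN || adj > OOM_SCORE_ADJ_MAX)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", adj) > 0;
	/* fclose() flushes the buffer, so a rejected write shows up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A stateless service could call set_oom_score_adj("/proc/self/oom_score_adj",
1000) before entering its main event loop. Raising the value normally needs
no special privilege, while lowering it may require CAP_SYS_RESOURCE.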

Besides that, the patch is simply wrong because

[...]
> @@ -1029,6 +1030,22 @@ bool out_of_memory(struct oom_control *oc)
>  		return true;
>  	}
>  
> +	/*
> +	 * Check death row.
> +	 */
> +	if (!list_empty(eventpoll_deathrow_list())) {
> +		struct list_head *l = eventpoll_deathrow_list();
> +		struct task_struct *ts = list_first_entry(l,
> +					 struct task_struct, se.deathrow);
> +
> +		pr_debug("Killing pid %u from EPOLL_KILLME death row.",
> +			ts->pid);
> +
> +		/* We use SIGKILL so as to cleanly interrupt ep_poll() */
> +		kill_pid(task_pid(ts), SIGKILL, 1);
> +		return true;
> +	}
> +

this doesn't reflect the OOM domain (is this a memcg or a mempolicy/taskset
constrained OOM?). You might be killing tasks which are not in the target domain.
-- 
Michal Hocko
SUSE Labs
