Date:	Fri, 4 Jul 2008 09:34:16 +0900
From:	KAMEZAWA Hiroyuki <>
To:	Vivek Goyal <>
Cc:	linux kernel mailing list <>,
	Libcg Devel Mailing List <>,
	Balbir Singh <>,
	Dhaval Giani <>,
	Paul Menage <>,
	Peter Zijlstra <>,
	Kazunaga Ikeno <>,
	Andrew Morton <>,
	Thomas Graf <>, Rik Van Riel <>
Subject: Re: [RFC] How to handle the rules engine for cgroups

On Thu, 3 Jul 2008 11:54:46 -0400
Vivek Goyal <> wrote:

> On Thu, Jul 03, 2008 at 10:19:57AM +0900, KAMEZAWA Hiroyuki wrote:
> > On Tue, 1 Jul 2008 15:11:26 -0400
> > Vivek Goyal <> wrote:
> > > - How to handle delays in rule execution?
> > > 	- For example, if an "exec" happens, then by the time the process is
> > > 	  moved to the right group, it might have forked off a few more
> > > 	  processes or might have done quite some amount of memory allocation,
> > > 	  which will be charged to the wrong group. Or the newly exec'd process
> > > 	  might get killed in the existing cgroup because of lack of memory
> > > 	  (despite the fact that the destination cgroup has sufficient memory).
> > > 
> > Hmm, can't we rework the process event connector to use some reliable
> > fast interface besides netlink ? (I mean an interface like eventpoll.)
> > (Or enhance netlink ? ;)
> I see following text in netlink man page.
> "However, reliable transmissions from kernel to user are impossible in
>  any case. The kernel can’t send a netlink message if the socket buffer
>  is full: the message will be dropped and the kernel and the userspace
>  process will no longer have the same view of kernel state. It is up to
>  the application to detect when this happens (via the ENOBUFS error
>  returned by recvmsg(2)) and resynchronize."
> So at the end of the day, it looks like unreliability comes from the
> fact that we can not allocate memory currently so we will discard the
> packet.
> Are there alternatives as compared to dropping packets?
If it's just a problem of memory allocation, preallocate the socket buffer
and use it later, like radix_tree_preload().
   foo() {
	if (preallocate())
		return -ENOBUFS;
	/* ... send the event later using the preallocated buffer ... */
   }

(This means setuid() would return -ENOBUFS, an undocumented error code.)
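
The preallocation idea can be illustrated in user space. This is only a rough sketch, not kernel code: `struct event_buf`, `event_preallocate()`, `event_fill()` and `event_release()` are hypothetical names, and malloc() stands in for socket buffer allocation. The point it shows is the ordering: reserve memory at a point where failing with -ENOBUFS is still acceptable, so that the later step never has to allocate.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event slot; stands in for a preallocated socket buffer. */
struct event_buf {
	char  *data;
	size_t cap;
};

/* Step 1: reserve memory while failing is still acceptable. */
static int event_preallocate(struct event_buf *eb, size_t cap)
{
	eb->data = malloc(cap);
	if (!eb->data)
		return -ENOBUFS;
	eb->cap = cap;
	return 0;
}

/* Step 2: fill the reserved buffer; no allocation happens here. */
static size_t event_fill(struct event_buf *eb, const void *msg, size_t len)
{
	assert(eb->data != NULL);	/* caller must have preallocated */
	if (len > eb->cap)
		len = eb->cap;		/* truncate rather than allocate */
	memcpy(eb->data, msg, len);
	return len;
}

/* Give the reserved memory back once the event has been handed off. */
static void event_release(struct event_buf *eb)
{
	free(eb->data);
	eb->data = NULL;
	eb->cap = 0;
}
```

A caller would run event_preallocate() before the operation that generates the event and bail out early on failure, then call event_fill() afterwards knowing it cannot fail for lack of memory.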

But the af_netlink layer has other causes of dropped packets:
 1. copying the skb at broadcast.
 2. recv buffer overrun.

(2) is not avoidable in the kernel.

> - Let the sender cache the packet and retry later. So maybe the netlink layer
>   can return an error if the packet cannot be queued, and the connector can
>   cache the event and try sending it later. (Hopefully the memory situation
>   will have improved by then because of OOM, some process exiting, or
>   something else...)
>   This looks like a band-aid for temporary congestion. It will not help
>   if the consumer is inherently slow and events are generated faster than
>   they are consumed.
> This probably can be one possible enhancement to connector, but at the end
> of the day, any kind of user space daemon will have to accept the fact
> that packets can be dropped, leading to lost events. Detect that situation
> (using ENOBUFS) and then let admin know about it (logging). I am not sure
> what admin is supposed to do after that.
I'm not sure either ;)
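
As a rough sketch of the "detect it using ENOBUFS" step on the daemon side, the receive loop could classify each recvmsg(2) result as below. `enum recv_action` and `classify_recv()` are hypothetical names, and what "resync" should actually do (for example, rescanning all tasks and reapplying the rules) is an assumption, not something settled in this thread.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical actions for a daemon's netlink receive loop. */
enum recv_action {
	RECV_PROCESS,	/* a message arrived; handle it */
	RECV_RETRY,	/* transient condition; call recvmsg() again */
	RECV_RESYNC,	/* events were lost; rebuild state from scratch */
	RECV_FATAL	/* unexpected error; give up */
};

/*
 * n   - return value of recvmsg(2)
 * err - value of errno captured right after the call
 */
static enum recv_action classify_recv(ssize_t n, int err)
{
	if (n >= 0)
		return RECV_PROCESS;

	switch (err) {
	case ENOBUFS:		/* kernel dropped messages for this socket */
		return RECV_RESYNC;
	case EINTR:
	case EAGAIN:
		return RECV_RETRY;
	default:
		return RECV_FATAL;
	}
}
```

The daemon would log the RECV_RESYNC case for the admin and then recover on its own instead of waiting for someone to act on the log.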

> I am CCing Thomas Graf. He might have a better idea of netlink limitations
> and whether there is a way to overcome them.
> > 
> > Because the "a child inherits its parent's" rule is very strong, I think the
> > number of events we have to check is much smaller than the number reported
> > to us. Can't we add some filter/assumption here?
> > 
> I am not sure if proc connector currently allows filtering of various
> events like fork, exec, exit etc. In a quick look it looks like it
> does not. But probably that can be worked out. Even then, it will just
> help reduce the number of messages queued for user space on that socket
> but will not take away the fact that messages can be dropped under
> memory pressure. 
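
A filter along these lines might look like the sketch below. The event kinds are hypothetical stand-ins, not the connector's own constants, and the choice of which events can change a rule match (exec and uid/gid changes) is an assumption made for illustration. Fork is skipped because the child already inherits its parent's cgroup.

```c
#include <stdbool.h>

/*
 * Hypothetical event kinds, modelled loosely on what the process
 * events connector reports; these are not the kernel's constants.
 */
enum task_event { EV_FORK, EV_EXEC, EV_UID, EV_GID, EV_EXIT };

/* Return true only for events that could change which rule matches. */
static bool event_needs_rule_check(enum task_event ev)
{
	switch (ev) {
	case EV_EXEC:	/* new executable: name-based rules may match */
	case EV_UID:	/* credential change: user-based rules may match */
	case EV_GID:
		return true;
	case EV_FORK:	/* child inherits the parent's cgroup */
	case EV_EXIT:	/* nothing left to classify */
		return false;
	}
	return false;
}
```

Whether such a check runs in the kernel before queueing or in the daemon after receiving changes how much socket buffer pressure it relieves; only the former reduces the number of queued messages.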

> > BTW, isn't the placement of proc_exec_connector() too late? It seems memory
> > for creating the exec image is charged to the original group...
> > 
> As of today that is what happens, because the newly exec'd process runs in
> the same cgroup as its parent. But that's probably what we need to avoid.
I think so. 

> For example, if an admin has created three cgroups, "database", "browser"
> and "others", and a user launches "firefox" from a shell (assuming the shell
> is originally running in the "others" cgroup), then any memory allocation for
> firefox should come from the "browser" cgroup and not from "others".
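
The mapping in this example can be pictured as a small rule table. This is only a sketch: `struct cgroup_rule`, `rules[]` and `cgroup_for_exe()` are hypothetical, the "postgres" entry is invented for illustration, and falling back to "others" when nothing matches is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: executable name -> destination cgroup. */
struct cgroup_rule {
	const char *exe;	/* executable name to match */
	const char *cgroup;	/* cgroup the task should be moved to */
};

static const struct cgroup_rule rules[] = {
	{ "firefox",  "browser"  },
	{ "postgres", "database" },	/* invented entry for illustration */
};

/* Look up the destination cgroup; unmatched tasks go to "others". */
static const char *cgroup_for_exe(const char *exe)
{
	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (strcmp(rules[i].exe, exe) == 0)
			return rules[i].cgroup;
	return "others";
}
```

The lookup itself is cheap. The open problem in this thread is when it runs: if it happens only after the exec event reaches user space, the allocations made during exec have already been charged to the parent's cgroup.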

> I am assuming that this will be a requirement for enterprise class
> systems. Would be good to know the experiences of people who are already
> doing some kind of work load management.


> Thanks
> Vivek
