lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Fri, 15 Jul 2011 09:28:14 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	linux-mm@...ck.org, Balbir Singh <bsingharora@...il.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] memcg: make oom_lock 0 and 1 based rather than
 coutner

On Fri 15-07-11 08:47:55, KAMEZAWA Hiroyuki wrote:
> On Thu, 14 Jul 2011 14:55:55 +0200
> Michal Hocko <mhocko@...e.cz> wrote:
> 
> > On Thu 14-07-11 20:50:12, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 14 Jul 2011 13:30:09 +0200
> > > Michal Hocko <mhocko@...e.cz> wrote:
> > [...]
> > > >  static bool mem_cgroup_oom_lock(struct mem_cgroup *mem)
> > > >  {
> > > > -	int x, lock_count = 0;
> > > > -	struct mem_cgroup *iter;
> > > > +	int x, lock_count = -1;
> > > > +	struct mem_cgroup *iter, *failed = NULL;
> > > > +	bool cond = true;
> > > >  
> > > > -	for_each_mem_cgroup_tree(iter, mem) {
> > > > -		x = atomic_inc_return(&iter->oom_lock);
> > > > -		lock_count = max(x, lock_count);
> > > > +	for_each_mem_cgroup_tree_cond(iter, mem, cond) {
> > > > +		x = !!atomic_add_unless(&iter->oom_lock, 1, 1);
> > > > +		if (lock_count == -1)
> > > > +			lock_count = x;
> > > > +		else if (lock_count != x) {
> > > > +			/*
> > > > +			 * this subtree of our hierarchy is already locked
> > > > +			 * so we cannot give a lock.
> > > > +			 */
> > > > +			lock_count = 0;
> > > > +			failed = iter;
> > > > +			cond = false;
> > > > +		}
> > > >  	}
> > > 
> > > Hm ? assuming B-C-D is locked and a new thread tries a lock on A-B-C-D-E.
> > > And for_each_mem_cgroup_tree will find groups in order of A->B->C->D->E.
> > > Before lock
> > >   A  0
> > >   B  1
> > >   C  1
> > >   D  1
> > >   E  0
> > > 
> > > After lock
> > >   A  1
> > >   B  1
> > >   C  1
> > >   D  1
> > >   E  0
> > > 
> > > here, failed = B, cond = false. Undo routine will unlock A.
> > > Hmm, seems to work in this case.
> > > 
> > > But....A's oom_lock==0 and memcg_oom_wakeup() at el will not able to
> > > know "A" is in OOM. wakeup processes in A which is waiting for oom recover..
> > 
> > Hohm, we need to have 2 different states. lock and mark_oom.
> > oom_recovert would check only the under_oom.
> > 
> 
> yes. I think so, too.
> 
> > > 
> > > Will this work ?
> > 
> > No it won't because the rest of the world has no idea that A is
> > under_oom as well.
> > 
> > > ==
> > >  # cgcreate -g memory:A
> > >  # cgset -r memory.use_hierarchy=1 A
> > >  # cgset -r memory.oom_control=1   A
> > >  # cgset -r memory.limit_in_bytes= 100M
> > >  # cgset -r memory.memsw.limit_in_bytes= 100M
> > >  # cgcreate -g memory:A/B
> > >  # cgset -r memory.oom_control=1 A/B
> > >  # cgset -r memory.limit_in_bytes=20M
> > >  # cgset -r memory.memsw.limit_in_bytes=20M
> > > 
> > >  Assume malloc XXX is a program allocating XXX Megabytes of memory.
> > > 
> > >  # cgexec -g memory:A/B malloc 30  &    #->this will be blocked by OOM of group B
> > >  # cgexec -g memory:A   malloc 80  &    #->this will be blocked by OOM of group A
> > > 
> > > 
> > > Here, 2 procs are blocked by OOM. Here, relax A's limitation and clear OOM.
> > > 
> > >  # cgset -r memory.memsw.limit_in_bytes=300M A
> > >  # cgset -r memory.limit_in_bytes=300M A
> > > 
> > >  malloc 80 will end.
> > 
> > What about yet another approach? Very similar what you proposed, I
> > guess. Again not tested and needs some cleanup just to illustrate.
> > What do you think?
> 
> Hmm, I think this will work. Please go ahead.
> Unfortunately, I'll not be able to make a quick response for a week
> because of other tasks. I'm sorry.
> 
> Anyway,
> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> 

Thanks

> 
> BTW, it's better to add "How-to-test" and the result in description.
> Some test similar to mine will show the result we want.

Agreed. I will hammer it today and repost with cleaned up description
with your example as well as mine that triggered the convoy behavior.

I hope you don't mine if I add you s-o-b to the patch as this was a
cooperative work.

> ==
> Make a hierarchy of memcg, which has 300MB memory+swap limit.
> 
>  %cgcreate -g memory:A
>  %cgset -r memory.limit_in_bytes=300M A
>  %cgset -r memory.memsw.limit_in_bytes=300M A
> 
> Then, running folloing program under A.
>  %cgexec -g memory:A ./fork
> ==
> int main(int argc, char *argv[])
> {
>         int i;
>         int status;
> 
>         for (i = 0; i < 5000; i++) {
>                 if (fork() == 0) {
>                         char *c;
>                         c = malloc(1024*1024);
>                         memset(c, 0, 1024*1024);
>                         sleep(20);
>                         fprintf(stderr, "[%d]\n", i);
>                         exit(0);
>                 }
>                 printf("%d\n", i);
>                 waitpid(-1, &status, WNOHANG);
>         }
>         while (1) {
>                 int ret;
>                 ret = waitpid(-1, &status, WNOHANG);
> 
>                 if (ret == -1)
>                         break;
>                 if (!ret)
>                         sleep(1);
>         }
>         return 0;
> }
> ==
> 
> Thank you for your effort.
> -Kame
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ