lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed,  3 Feb 2010 16:50:20 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	Lubos Lunak <l.lunak@...e.cz>
Cc:	kosaki.motohiro@...fujitsu.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Nick Piggin <npiggin@...e.de>, Jiri Kosina <jkosina@...e.cz>
Subject: Re: Improving OOM killer

> =====
> --- linux-2.6.31/mm/oom_kill.c.sav      2010-02-01 22:00:41.614838540 +0100
> +++ linux-2.6.31/mm/oom_kill.c  2010-02-01 22:01:08.773757932 +0100
> @@ -69,7 +69,7 @@ unsigned long badness(struct task_struct
>         /*
>          * The memory size of the process is the basis for the badness.
>          */
> -       points = mm->total_vm;
> +       points = get_mm_rss(mm);
> 
>         /*
>          * After this unlock we can no longer dereference local variable `mm'
> @@ -83,21 +83,6 @@ unsigned long badness(struct task_struct
>                 return ULONG_MAX;
> 
>         /*
> -        * Processes which fork a lot of child processes are likely
> -        * a good choice. We add half the vmsize of the children if they
> -        * have an own mm. This prevents forking servers to flood the
> -        * machine with an endless amount of children. In case a single
> -        * child is eating the vast majority of memory, adding only half
> -        * to the parents will make the child our kill candidate of choice.
> -        */
> -       list_for_each_entry(child, &p->children, sibling) {
> -               task_lock(child);
> -               if (child->mm != mm && child->mm)
> -                       points += child->mm->total_vm/2 + 1;
> -               task_unlock(child);
> -       }
> -
> -       /*
>          * CPU time is in tens of seconds and run time is in thousands
>           * of seconds. There is no particular reason for this other than
>           * that it turned out to work very well in practice.
> =====
> 
>  In other words, use VmRSS for measuring memory usage instead of VmSize, and 
> remove child accumulating.
> 
>  I hope the above is good enough reason for the first change. VmSize includes 
> things like read-only mappings, memory mappings that is actually unused, 
> mappings backed by a file, mappings from video drivers, and so on. VmRSS is 
> actual real memory used, which is what mostly matters here. While it may not 
> be perfect, it is certainly an improvement.
> 
>  The second change should be done on the basis that it does more harm than 
> good. In this specific case, it does not help to identify the source of the 
> problem, and it incorrectly identifies kdeinit as the problem solely on the 
> basis that it spawned many other processes. I think it's already quite hinted 
> that this is a problem by the fact that you had to add a special protection 
> for init - any session manager, process launcher or even xterm used for 
> launching apps is yet another init.
> 
>  I also have problems finding a case where the child accounting would actually 
> help. I mean, in practice, I can certainly come up with something in theory, 
> and this looks to me like a solution to a very synthesized problem. In which 
> realistic case will one process launch a limited number of children, where 
> all of them will consume memory, but just killing the children one by one 
> won't avoid the problem reasonably? This is unlikely to avoid a forkbomb, as 
> in that case the number of children will be the problem. It is not necessary 
> for just one children misbehaving and being restarted, nor will it work 
> there. So what is that supposed to fix, and is it more likely than the case 
> of a process launching several unrelated children?
> 
>  If the children accounting is supposed to handle cases like forked children 
> of Apache, then I suggest it is adjusted only to count children that have 
> been forked from the parent but there has been no exec(). I'm afraid I don't 
> know how to detect that.
> 
> 
>  When running a kernel with these changes applied, I can safely do the 
> above-described case of running parallel doc generation in KDE. No clearly 
> innocent process is selected for killing, the first choice is always an 
> offender.
> 
>  Moreover, the remedy is almost instant, there is only a fraction of second of 
> when the machine is overloaded by the I/O of swapping pages in and out (I do 
> not use swap, but there is a large amount of memory used by read-only 
> mappings of binaries, libraries or various other files that is in the 
> original case rendering the machine unresponsive - I assume this is because 
> the kernel tries to kill an innocent process, but the offenders immediatelly 
> consume anything that is freed, requiring even memory used by code that is to 
> be executed to be swapped in from files again).
> 
>  I consider the patches to be definite improvements, so if they are ok, I will 
> format them as necessary. Now, what is the catch?

Personally, I think your use case represent to typical desktop and Linux
have to works fine on typical desktop use-case. /proc/pid/oom_adj never fit
desktop use-case. In past discussion, I'v agreed with much people. but I haven't
reach to agree with David Rientjes about this topic.

If you want to merge this patch, you need persuade him. I can't help you. sorry.




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ