Date:	Wed, 16 Mar 2011 18:09:57 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	kosaki.motohiro@...fujitsu.com,
	David Rientjes <rientjes@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Rik van Riel <riel@...hat.com>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: Linux 2.6.38


[Yesterday's earthquake was announced as magnitude 6.5, but an M6 quake
 is no longer treated as significant news in this country. We are living
 in a slightly floating mood.]


> So we're talking about three patches:
> 
> oom-prevent-unnecessary-oom-kills-or-kernel-panics.patch
> oom-skip-zombies-when-iterating-tasklist.patch
> oom-avoid-deferring-oom-killer-if-exiting-task-is-being-traced.patch
> 
> all appended below.
> 
> About all of which Oleg had serious complaints, some of which haven't
> yet been addressed.
> 
> And that's OK.  As I said, please let's work through it and get it right.

I haven't understood what is "OK" and what you want to discuss. Probably
the reason is my language skill, or that I haven't caught up with Oleg and
David's discussion. So instead, I'll post the current state of my debugging.


 o vmscan.c#all_unreclaimable() might return a false negative and
   mistakenly prevent the oom-killer from running. Why? zone->pages_scanned
   is not protected by a lock; in other words, it is an unstable value. On
   the other hand, x86 ZONE_DMA has only very little memory, so it usually
   never recovers to all_unreclaimable=no once it has become
   all_unreclaimable=yes. So if the zone state becomes mismatched (e.g.
   pages_scanned=0 and all_unreclaimable=yes), it can never recover.
   I mean, I was able to reproduce the issue Andrey reported.

 o oom_kill.c#boost_dying_task_prio() makes the kernel hang if the user
   is using cpu cgroups, because a cpu cgroup has an inadequate default
   RT rt_runtime_us (0 by default; 0 means RT tasks can't run at all).

 o The oom_kill.c#TIF_MEMDIE check makes the kernel hang. I haven't yet
   found the exact reason why an oom-killed process gets stuck even though
   the zone has enough memory.
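
The mismatched zone state in the first item can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
kernel code: struct toy_zone and every toy_* function are hypothetical names,
and the "scanned less than 6x the LRU size" threshold and DEF_PRIORITY value
are my reading of vmscan.c in this kernel series. It shows that a small zone
with pages_scanned=0 and all_unreclaimable=yes gets a scan target of zero, so
the counter never grows and a counter-based check keeps reporting the zone as
reclaimable.

```c
#include <stdbool.h>

#define DEF_PRIORITY 12	/* assumption: default reclaim scan priority */

/* Hypothetical stand-in for the few struct zone fields discussed above. */
struct toy_zone {
	unsigned long lru_pages;	/* pages on the LRU lists */
	unsigned long pages_scanned;	/* updated without a lock in the kernel */
	bool all_unreclaimable;		/* flag, also updated without a lock */
};

/* Counter-based test: reclaimable until scanned 6x the LRU size (assumed). */
static bool toy_zone_reclaimable(const struct toy_zone *z)
{
	return z->pages_scanned < z->lru_pages * 6;
}

/* A check that trusts only the counter; false means "do not OOM-kill yet". */
static bool toy_all_unreclaimable_by_counter(const struct toy_zone *z)
{
	return !toy_zone_reclaimable(z);
}

/* A check that trusts the flag instead of the counter. */
static bool toy_all_unreclaimable_by_flag(const struct toy_zone *z)
{
	return z->all_unreclaimable;
}

/*
 * One reclaim pass at DEF_PRIORITY (assumed to be the only priority at
 * which a flagged zone is still scanned). The scan target is
 * lru_pages >> priority, which is 0 for a zone with few LRU pages.
 */
static unsigned long toy_scan(struct toy_zone *z)
{
	unsigned long nr = z->lru_pages >> DEF_PRIORITY;

	z->pages_scanned += nr;
	return nr;
}
```

With lru_pages=1000, each toy_scan() call adds 1000 >> 12 = 0 pages, so in
this model the counter-based check never changes its answer, while the
flag-based check reports the zone as unreclaimable right away.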

What I dislike is that many people on the list have fun making flamewars,
but nobody except a very few developers runs the real code or joins in
debugging the real, actually reported issue. In fact, Andrey made a testcase,
reported his test environment, and helped us build a reproduction environment.

I also dislike that some developers say they haven't seen an oom livelock
case yet. It indicates they haven't tested the oom scenario under a stress
workload. How can I trust an untested patch, or untested people? All
developers have to test until they have seen an oom livelock.

I know oom debugging is very painful and takes a lot of time: many false
positives, many unfixable livelocks, and a million resets needed.
But I don't think this is a good reason to take untested patches.

Now I only have access to a three-year-old PC. Therefore, I see no reason
why anyone can't debug the issue.

