Date:	Mon, 29 Jun 2015 11:38:26 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	Nikolay Borisov <kernel@...p.com>
Cc:	Theodore Ts'o <tytso@....edu>, linux-ext4@...r.kernel.org,
	Marian Marinov <mm@...com>
Subject: Re: Lockup in wait_transaction_locked under memory pressure

On Mon 29-06-15 12:23:16, Nikolay Borisov wrote:
> 
> 
> On 06/29/2015 12:16 PM, Michal Hocko wrote:
> > On Mon 29-06-15 12:07:54, Nikolay Borisov wrote:
> >>
> >>
> >> On 06/29/2015 11:32 AM, Michal Hocko wrote:
> >>> On Thu 25-06-15 18:27:10, Nikolay Borisov wrote:
> >>>>
> >>>>
> >>>> On 06/25/2015 06:18 PM, Michal Hocko wrote:
> >>>>> On Thu 25-06-15 17:34:22, Nikolay Borisov wrote:
> >>>>>> On 06/25/2015 05:05 PM, Michal Hocko wrote:
> >>>>>>> On Thu 25-06-15 16:49:43, Nikolay Borisov wrote:
> >>>>>>> [...]
> >>>>>>>> How would you advise rectifying such a situation?
> >>>>>>>
> >>>>>>> As I've said, check the OOM victim traces and see if it is holding any
> >>>>>>> of those locks.
> >>>>>>
> >>>>>> As mentioned previously, all OOM traces are identical to the one I've
> >>>>>> sent - OOM being called from the page fault path.
> >>>>>  
> >>>>> By identical you mean that all of them kill the same task? Or just that
> >>>>> the path is the same (which wouldn't be surprising, as this is the only
> >>>>> path which triggers the memcg OOM killer)?
> >>>>
> >>>> The code path is the same, but the tasks being killed are different.
> >>>
> >>> Is the OOM killer triggered only for a single memcg, or do others misbehave
> >>> as well?
> >>
> >> Generally OOM would be triggered for whichever memcg runs out of
> >> resources, but so far I've only observed that the D state issue happens
> >> in a single container.
> > 
> > It is not clear whether it is the OOM memcg which has the tasks in the D
> > state. Anyway, I think it all smells like one memcg throttling the others
> > on a shared resource - the journal in your case.
> 
> Be that as it may, how do I find which cgroup is the culprit?

Ted has already described that. You have to check all the running tasks
and try to find which of them is doing the operation that blocks the
others. The transaction commit sounds like the first one to check.
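If it helps, here is a rough, untested sketch of that kind of scan (the helper
names are made up here, and reading /proc/<pid>/stack needs root): walk /proc,
pick out tasks in the D state, and dump each one's cgroup membership and kernel
stack. Whichever stack sits in the jbd2 commit path is the likely holder that
everybody else is waiting on.

/* Hypothetical helper, not part of this thread: list D-state tasks with
 * their cgroups and kernel stacks so the one blocking the transaction
 * commit can be spotted. Requires root for /proc/<pid>/stack. */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

static void dump_file(const char *path)
{
	char line[512];
	FILE *f = fopen(path, "r");

	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		printf("    %s", line);
	fclose(f);
}

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;

	if (!proc) {
		perror("opendir /proc");
		return 1;
	}

	while ((de = readdir(proc)) != NULL) {
		char path[320], comm[64] = "", state = '?';
		FILE *stat;

		if (!isdigit((unsigned char)de->d_name[0]))
			continue;

		snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
		stat = fopen(path, "r");
		if (!stat)
			continue;
		/* /proc/<pid>/stat: pid (comm) state ... */
		if (fscanf(stat, "%*d (%63[^)]) %c", comm, &state) != 2)
			state = '?';
		fclose(stat);

		if (state != 'D')
			continue;

		printf("pid %s (%s) in D state\n", de->d_name, comm);
		snprintf(path, sizeof(path), "/proc/%s/cgroup", de->d_name);
		dump_file(path);
		snprintf(path, sizeof(path), "/proc/%s/stack", de->d_name);
		dump_file(path);
	}
	closedir(proc);
	return 0;
}

Running it while the hang is in progress (or simply echo w > /proc/sysrq-trigger
for the in-kernel dump of blocked tasks) should show which memcg the stuck
writer belongs to.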
-- 
Michal Hocko
SUSE Labs
