linux-ext4 - Re: Lockup in wait_transaction_locked under memory pressure

Message-ID: <55910E84.3000106@kyup.com>
Date:	Mon, 29 Jun 2015 12:23:16 +0300
From:	Nikolay Borisov <kernel@...p.com>
To:	Michal Hocko <mhocko@...e.cz>
CC:	Theodore Ts'o <tytso@....edu>, linux-ext4@...r.kernel.org,
	Marian Marinov <mm@...com>
Subject: Re: Lockup in wait_transaction_locked under memory pressure



On 06/29/2015 12:16 PM, Michal Hocko wrote:
> On Mon 29-06-15 12:07:54, Nikolay Borisov wrote:
>>
>>
>> On 06/29/2015 11:32 AM, Michal Hocko wrote:
>>> On Thu 25-06-15 18:27:10, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 06/25/2015 06:18 PM, Michal Hocko wrote:
>>>>> On Thu 25-06-15 17:34:22, Nikolay Borisov wrote:
>>>>>> On 06/25/2015 05:05 PM, Michal Hocko wrote:
>>>>>>> On Thu 25-06-15 16:49:43, Nikolay Borisov wrote:
>>>>>>> [...]
>>>>>>>> How would you advise to rectify such a situation?
>>>>>>>
>>>>>>> As I've said. Check the oom victim traces and see if it is holding any
>>>>>>> of those locks.
>>>>>>
>>>>>> As mentioned previously all OOM traces are identical to the one I've
>>>>>> sent - OOM being called from the page fault path.
>>>>>  
>>>>> By identical you mean that all of them kill the same task? Or just that
>>>>> the path is the same (which wouldn't be surprising as this is the only path
>>>>> which triggers memcg oom killer)?
>>>>
>>>> The code path is the same, the tasks being killed are different
>>>
>>> Is the OOM killer triggered only for a single memcg or others misbehave
>>> as well?
>>
>> Generally OOM would be triggered for whichever memcg runs out of
>> resources but so far I've only observed that the D state issue happens
>> in a single container.
> 
> It is not clear whether it is the OOM memcg which has tasks in the D
> state. Anyway I think it all smells like one memcg is throttling others
> on another shared resource - journal in your case.

Be that as it may, how do I find which cgroup is the culprit?
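
A possible starting point, sketched here as an assumption rather than
anything proposed in this thread: list the tasks in uninterruptible sleep
(D state), read the memory cgroup each one belongs to, and note the kernel
symbol each is sleeping in. The sketch assumes the cgroup v1 layout of
/proc/<pid>/cgroup and a readable /proc/<pid>/wchan; the function names are
hypothetical. It only shows which memcg the blocked tasks live in, not which
task holds the journal transaction, so the OOM victim's stack trace still
has to be checked as suggested above.

```python
import os
from collections import defaultdict


def task_state(stat_text):
    """Return the one-letter task state from the contents of /proc/<pid>/stat."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before reading the state.
    return stat_text.rsplit(")", 1)[1].split()[0]


def memory_cgroup(cgroup_text):
    """Return the memory-controller path from /proc/<pid>/cgroup, or None.

    Assumes cgroup v1 lines of the form "hierarchy-ID:controllers:path".
    """
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3 and "memory" in parts[1].split(","):
            return parts[2]
    return None


def read_text(path):
    """Read a small proc file; return "" if the task has already exited."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_tasks_by_cgroup(proc_root="/proc"):
    """Map memory cgroup path -> list of (pid, wchan) for tasks in D state.

    Only thread-group leaders (/proc/<pid>) are scanned; threads under
    /proc/<pid>/task are not visited in this sketch.
    """
    groups = defaultdict(list)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        stat = read_text(os.path.join(base, "stat"))
        if ")" not in stat or task_state(stat) != "D":
            continue
        cgroup = memory_cgroup(read_text(os.path.join(base, "cgroup")))
        # wchan may be "0" or empty depending on kernel config and privileges.
        wchan = read_text(os.path.join(base, "wchan")) or "?"
        groups[cgroup].append((int(entry), wchan))
    return dict(groups)
```

Calling blocked_tasks_by_cgroup() as root on the affected host and printing
the result groups the stuck tasks per container. A cgroup whose tasks all
sit in wait_transaction_locked is a victim of the stall; the holder of the
transaction is more likely a task blocked somewhere else.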

> 
>> However, this in turn might affect other processes if they try to
>> sleep on the same jbd2 journal.
> 
> Sure, if the journal is shared then this is an inherent problem. Memcg
> restrictions can easily cause priority inheritance problems as Ted has
> already mentioned.
> 