lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20150518154616.GC4180@thunk.org>
Date:	Mon, 18 May 2015 11:46:16 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Nikolay Borisov <kernel@...p.com>
Cc:	linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [Ext4][Bug] Deadlock in ext4 with memcg enabled.

On Mon, May 18, 2015 at 10:35:55AM +0300, Nikolay Borisov wrote:
> The conclusion that I've drawn looking from the code and some offline
> discussions is that when fsync is requested ext4 starts marking pages
> for writeback (ext4_writepages). I think some heavy inlining is
> happening and ext4_map_blocks is being called from:
> 
>  ext4_writepages->mpage_map_and_submit_extent -> mpage_map_one_extent ->
> ext4_map_blocks
> 
> which in turn when trying to write the pages exceeds the memory cgroup
> limit which triggers the memory freeing logic. This, in turn, executes
> the wait_on_page_writeback(page) in shrink_page_list. E.g. the the memcg
> sees a page as being marked for writeback (presumably this is the same
> page which caused the OOM) so it sleeps to wait for the page to be
> written back, but since it is the writeback path that executed the page
> shrinking it causes a deadlock.
> 
> This deadlock then causes other processes on the system to enter D
> state, waiting on trying to acquire a certain inode->i_mutex.

What *should* be happening is that the memory allocations taking place
in find_or_create_page when called by grow_dev_page() should be done
with GFP_NOFS (i.e., the __GFP_FS flag should be masked out).

I think you're right, but I view this as a mm bug; the memory
allocation should have been properly executed with GFP_NOFS so the
memory allocator should know that it can't recurse into page cleaner.
In this case, it looks like it's not doing this, but it is trying to
wait for a page to be cleaned, which is just as bad.

Have you checked to see if this problem has fixed in newer kernels?

     	 	    	   		- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ