linux-ext4 - Re: Memory allocation can cause ext4 filesystem to be remounted r/o

Date:	Thu, 27 Jun 2013 18:28:21 +0530
From:	Nagachandra P <nagachandra@...il.com>
To:	"Theodore Ts'o" <tytso@....edu>
Cc:	Vikram MP <mp.vikram@...il.com>, linux-ext4@...r.kernel.org
Subject: Re: Memory allocation can cause ext4 filesystem to be remounted r/o

Hi Theodore,

Could you point me to the code that keeps ext4_std_error() from being
triggered when the LMK kills a task? As I see it, if a memory allocation
returns an error, ext4_std_error() would invariably be called in some
cases. Please consider the following call stack:

send sigkill to 5648 (id.app.sbrowser), score_adj 1000,adj 15, size 13257 with ofree -2010 20287, cfree 18597 902 msa 1000 ma 15
id.app.sbrowser: page allocation failure: order:0, mode:0x50
[<c0013aa8>] (unwind_backtrace+0x0/0x11c) from [<c00d6530>] (warn_alloc_failed+0xe8/0x110)
[<c00d6530>] (warn_alloc_failed+0xe8/0x110) from [<c00d9308>] (__alloc_pages_nodemask+0x6d4/0x804)
[<c00d9308>] (__alloc_pages_nodemask+0x6d4/0x804) from [<c00d2b34>] (find_or_create_page+0x40/0x84)
[<c00d2b34>] (find_or_create_page+0x40/0x84) from [<c0188858>] (ext4_mb_load_buddy+0xd4/0x2b4)
[<c0188858>] (ext4_mb_load_buddy+0xd4/0x2b4) from [<c018c69c>] (ext4_free_blocks+0x5d4/0xa08)
[<c018c69c>] (ext4_free_blocks+0x5d4/0xa08) from [<c0181218>] (ext4_ext_remove_space+0x690/0xd9c)
[<c0181218>] (ext4_ext_remove_space+0x690/0xd9c) from [<c0183654>] (ext4_ext_truncate+0x100/0x1c8)
[<c0183654>] (ext4_ext_truncate+0x100/0x1c8) from [<c015e2ec>] (ext4_truncate+0xf4/0x194)
[<c015e2ec>] (ext4_truncate+0xf4/0x194) from [<c01629dc>] (ext4_evict_inode+0x3b4/0x4ac)
[<c01629dc>] (ext4_evict_inode+0x3b4/0x4ac) from [<c011871c>] (evict+0x8c/0x150)
[<c011871c>] (evict+0x8c/0x150) from [<c010f030>] (do_unlinkat+0xdc/0x134)
[<c010f030>] (do_unlinkat+0xdc/0x134) from [<c000e100>] (ret_fast_syscall+0x0/0x30)

The memory allocation in the above case fails because of the kill
signal the task received.

__alloc_pages_slowpath() would return NULL if the task has received a
KILL signal. (I don't see any code in 3.4.5 that checks for something
like TIF_MEMDIE to decide whether or not to call ext4_std_error(). Was
this added recently?)
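
For reference, the "mode:0x50" in the allocation failure line above can be
decoded into GFP flag names. The following is only a rough sketch: the bit
values in the table are assumed from include/linux/gfp.h in 3.x-era kernels
and may differ in other versions, and decode_gfp() is a hypothetical helper
name, not a kernel or library function.

```python
# Assumed low-order GFP bit values for 3.x-era kernels (include/linux/gfp.h).
# Other kernel versions may define these bits differently.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def decode_gfp(mode):
    """Return the names of the known flag bits set in a gfp mode value."""
    names = [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]
    # Report any bits outside the table as a single hex remainder.
    unknown = mode & ~sum(GFP_FLAGS)
    if unknown:
        names.append(hex(unknown))
    return names


print(decode_gfp(0x50))  # ['__GFP_WAIT', '__GFP_IO']
```

Under these assumed values, 0x50 is __GFP_WAIT | __GFP_IO without __GFP_FS,
which would match a GFP_NOFS allocation in the find_or_create_page() call
shown in the trace.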

Thanks
Naga

On Wed, Jun 26, 2013 at 11:33 PM, Theodore Ts'o <tytso@....edu> wrote:
> On Wed, Jun 26, 2013 at 10:35:22PM +0530, Nagachandra P wrote:
>>
>> These issues are not easy to reproduce!!! We are running multiple
>> applications (of different memory sizes) over a period of 24 hrs to
>> 36 hrs and we hit this once. We have seen these issues are easier to
>> reproduce with around 512MB of memory (maybe in about 16 hrs -
>> 20 hrs), and harder to reproduce with 1GB of memory.
>>
>> Most of the time we get into these situations when an application
>> (typically AsyncTasks in Android) that is doing ext4 fs ops has a
>> low-priority adj value (> 9, typically 10 - 12) and hence is fairly
>> likely to be killed (and there would be no way to distinguish this from
>> the application's perspective); this is one of the challenges we are
>> facing. Also, here we don't have to be completely out of memory (just
>> within the LMK band for the process adj value).
>
> To be clear, if the application is killed by the low memory killer,
> we're not going to trigger the ext4_std_error() codepath.  The
> ext4_std_error() is getting called because free memory has fallen to
> _zero_ and so kmem_cache_alloc() returns an error.  Should ext4 do a
> better job with handling this?  Yes, absolutely.  I do consider this a
> fs bug that we should try to fix.  The reality though is if that free
> memory has gone to zero, it's going to put multiple kernel subsystems
> under stress.
>
> It is good to hear that this is only happening on highly memory
> constrained devices --- speaking as an owner of a Nexus 4 with 2GB of
> memory.  :-P
>
> That's why the bigger issue is why did free memory go to zero in the
> first place?  That means the LMK was probably not being aggressive
> enough, or something started consuming a lot of memory too quickly,
> before the page cleaner and write throttling algorithms could kick in
> and try to deal with it.
>
>> But, on rethinking your idea on retrying may work if we have some
>> tweaks in LMK as well (like killing multiple tasks instead of just
>> one).
>
> You might also consider looking at tweaking the mm low watermark and
> minimum watermark.  See the tunable /proc/sys/vm/min_free_kbytes.
>
> You might want to simply try monitoring the free memory levels
> on a continuous basis, and see how often they drop below some
> minimum level.  This will give you a figure of merit by
> which you can try tuning your system, without needing to wait for a
> file system error.
>
> Cheers,
>
>                                         - Ted
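
The monitoring suggested above could be sketched roughly as follows. This is
an illustration only, assuming the usual "Name:   value kB" layout of
/proc/meminfo. The function names, the sampling interval, and the threshold
are all hypothetical and would need tuning per device.

```python
import time


def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (kB for most)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def count_low_samples(samples_kb, threshold_kb):
    """Count how many free-memory samples fell below the threshold."""
    return sum(1 for kb in samples_kb if kb < threshold_kb)


def monitor(threshold_kb, interval_s=1.0, samples=60, path="/proc/meminfo"):
    """Sample MemFree repeatedly; return (low sample count, lowest reading)."""
    readings = []
    for _ in range(samples):
        with open(path) as f:
            readings.append(parse_meminfo(f.read())["MemFree"])
        time.sleep(interval_s)
    return count_low_samples(readings, threshold_kb), min(readings)
```

Calling, for example, monitor(threshold_kb=8192) on the device would report
how many of 60 one-second samples dropped below 8 MB free (an arbitrary
example threshold). That count could serve as the figure of merit while
adjusting /proc/sys/vm/min_free_kbytes or the LMK settings.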