linux-ext4 - Re: Memory allocation can cause ext4 filesystem to be remounted r/o


Message-ID: <20130626180345.GA4128@thunk.org>
Date:	Wed, 26 Jun 2013 14:03:45 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Nagachandra P <nagachandra@...il.com>
Cc:	Vikram MP <mp.vikram@...il.com>, linux-ext4@...r.kernel.org
Subject: Re: Memory allocation can cause ext4 filesystem to be remounted r/o

On Wed, Jun 26, 2013 at 10:35:22PM +0530, Nagachandra P wrote:
> 
> These issues are not easy to reproduce!!! We are running multiple
> applications (of different memory sizes) over a period of 24 hrs to
> 36 hrs and we hit this once. We have seen these issues easier to
> reproduce typically with around 512MB memory (maybe in about 16 hrs -
> 20 hrs), and harder to reproduce with 1GB memory.
> 
> Most of the time we get into these situations when an application
> (typically AsyncTasks in Android) that is doing ext4 fs ops is of low
> adj values (> 9, typically 10 - 12) and hence would be fairly susceptible
> to being killed (and there would be no way to distinguish this from the
> application's perspective); this is one of the challenges we are facing.
> Also, here we don't have to be completely out of memory (but just
> within the LMK band for the process adj value).

To be clear, if the application is killed by the low memory killer,
we're not going to trigger the ext4_std_error() codepath.  The
ext4_std_error() is getting called because free memory has fallen to
_zero_ and so kmem_cache_alloc() returns an error.  Should ext4 do a
better job with handling this?  Yes, absolutely.  I do consider this a
fs bug that we should try to fix.  The reality though is that if free
memory has gone to zero, it's going to put multiple kernel subsystems
under stress.

It is good to hear that this is only happening on highly memory
constrained devices --- speaking as an owner of a Nexus 4 with 2GB of
memory.  :-P

That's why the bigger issue is why did free memory go to zero in the
first place?  That means the LMK was probably not being aggressive
enough, or something started consuming a lot of memory too quickly,
before the page cleaner and write throttling algorithms could kick in
and try to deal with it.

> But, on rethinking your idea on retrying may work if we have some
> tweaks in LMK as well (like killing multiple tasks instead of just
> one).

You might also consider looking at tweaking the mm low watermark and
minimum watermark.  See the tunable /proc/sys/vm/min_free_kbytes.
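
As a minimal sketch of inspecting that tunable, assuming a Linux system
where /proc is mounted: the helper names below are hypothetical and are
not part of any kernel or Android interface.  Changing the value needs
root (for example via sysctl vm.min_free_kbytes), which is deliberately
left out here.

```python
MIN_FREE_PATH = "/proc/sys/vm/min_free_kbytes"


def parse_kbytes(text):
    """Parse the single integer (in kB) that the tunable file contains."""
    return int(text.strip())


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the current vm.min_free_kbytes setting in kB.

    Assumes a Linux /proc filesystem; raises OSError if the file
    is missing or unreadable.
    """
    with open(path) as f:
        return parse_kbytes(f.read())
```

Calling read_min_free_kbytes() before and after a tuning change is one
way to confirm that the new setting actually took effect.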

You might want to simply try monitoring the free memory levels
on a continuous basis, and see how often they drop below some
minimum level.  This will give you a figure of merit by
which you can try tuning your system, without needing to wait for a
file system error.
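
One way to collect such a figure of merit is sketched below, assuming
the MemFree field of /proc/meminfo is an acceptable proxy for "free
memory".  The function names, the sampling interval and the threshold
are all illustrative assumptions, not values taken from this thread.

```python
import time


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB for sized fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def count_low_samples(samples_kb, threshold_kb):
    """Count how many samples fell strictly below the threshold."""
    return sum(1 for kb in samples_kb if kb < threshold_kb)


def monitor(threshold_kb, interval_s=1.0, samples=60, path="/proc/meminfo"):
    """Sample MemFree `samples` times, `interval_s` seconds apart.

    Returns (number of samples below threshold_kb, lowest MemFree seen).
    The threshold is a hypothetical tuning target chosen by the caller.
    """
    readings = []
    for _ in range(samples):
        with open(path) as f:
            readings.append(parse_meminfo(f.read())["MemFree"])
        time.sleep(interval_s)
    return count_low_samples(readings, threshold_kb), min(readings)
```

For example, monitor(threshold_kb=8192, interval_s=0.5, samples=120)
samples for about a minute and reports how often MemFree was under an
(arbitrarily chosen) 8MB floor.  Polling from userspace can miss very
short dips, so a low count here does not rule out brief exhaustion.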

Cheers,

					- Ted