linux-kernel - Re: doing lots of disk writes causes oom killer to kill processes


Message-ID: <CAOMqctT2u7E0kwpm052B9pkNo4D=sYHO+Vk=P_TziUb5KvTMKA@mail.gmail.com>
Date:	Tue, 17 Sep 2013 15:31:31 +0200
From:	Michal Suchanek <hramrach@...il.com>
To:	Hillf Danton <dhillf@...il.com>
Cc:	LKML <linux-kernel@...r.kernel.org>, Linux-MM <linux-mm@...ck.org>
Subject: Re: doing lots of disk writes causes oom killer to kill processes

On 5 September 2013 12:12, Michal Suchanek <hramrach@...il.com> wrote:
> Hello
>
> On 26 August 2013 15:51, Michal Suchanek <hramrach@...il.com> wrote:
>> On 12 March 2013 03:15, Hillf Danton <dhillf@...il.com> wrote:
>>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@...il.com> wrote:
>>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@...il.com> wrote:
>>>>> Hello,
>>>>>
>>>>> I am dealing with VM disk images and performing something like wiping
>>>>> free space to prepare image for compressing and storing on server or
>>>>> copying it to external USB disk causes
>>>>>
>>>>> 1) system lockup in order of a few tens of seconds when all CPU cores
>>>>> are 100% used by system and the machine is basically unusable
>>>>>
>>>>> 2) oom killer killing processes
>>>>>
>>>>> This all on system with 8G ram so there should be plenty space to work with.
>>>>>
>>>>> This happens with kernels 3.6.4 or 3.7.1
>>>>>
>>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>>>> problem even with less ram.
>>>>>
>>>>> I have vm.swappiness = 0 set for a long time already.
>>>>>
>>>>>
>>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>>>kernel still causes all cores to loop somewhere in system when writing
>>>>lots of data to disk.
>>>>
>>>>With swappiness as much as 90 processes still get killed on large disk writes.
>>>>
>>>>Given that the max is 100 the interval in which mm works at all is
>>>>going to be very narrow, less than 10% of the parameter range. This is
>>>>a severe regression as is the cpu time consumed by the kernel.
>>>>
>>>>The io scheduler is the default cfq.
>>>>
>>>>If you have any idea what to try other than downgrading to an earlier
>>>>unaffected kernel I would like to hear.
>>>>
>>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>>> deadlock caused by too_many_isolated())?
>>>
>>> Or try 3.8 and/or 3.9, additionally?
>>>
>>
>> Hello,
>>
>> with deadline IO scheduler I experience this issue less often but it
>> still happens.
>>
>> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>>
>> Do you have some idea what to log so that useful information about the
>> lockup is gathered?
>>
>
> This appears to be fixed in vanilla 3.11 kernel.
>
> I still get short intermittent lockups and cpu usage spikes up to 20%
> on a core but nowhere near the minute+ long lockups with all cores
> 100% on earlier kernels.
>

So I did more testing on the 3.11 kernel and while it works OK with
tar you can get severe lockups with mc or kvm. The difference is
probably that sane tools fsync() each file when they close it, which
forces the file to be written out and lets the kernel report any write
errors before they move on to the next file.
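
If that guess is right, the per-file fsync() behaviour can be
approximated from the shell with GNU dd's conv=fsync, which flushes the
output file to disk before dd exits. A rough sketch only: copy_synced is
a made-up helper name, the 1M block size is arbitrary, and GNU coreutils
dd is assumed.

```shell
#!/bin/sh
# copy_synced SRC DST: copy SRC to DST and fsync DST before returning.
# (Hypothetical helper name; assumes GNU dd.)
# conv=fsync makes dd call fsync() on the output file after the copy,
# so a write error shows up in dd's exit status before the caller
# moves on to the next file.
copy_synced() {
    dd if="$1" of="$2" bs=1M conv=fsync 2>/dev/null
}

# Example: copy every image in the current directory, one synced file
# at a time (the destination path is illustrative):
#   for f in *.img; do copy_synced "$f" "/mnt/usb/$f" || break; done
```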

With kvm writing to a file used as virtual disk the system stalls
until the disk driver in the emulated system times out, returns a
disk IO error, and the emulated system stops writing. In top I see
all CPU cores 90%+ in wait. The system is unusable. With mc the
lockups are indefinite, probably because there is no timeout on
writing a file in mc.

I tried tuning swappiness and elevators but the basic problem is
solved by neither: the dirty buffers fill up memory and the system
stalls trying to resolve the situation.
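
To watch the dirty buffers pile up while reproducing the stall, the
Dirty and Writeback counters in /proc/meminfo can be sampled. A minimal
sketch: the function name dirty_kb is made up for illustration, and the
optional file argument is there only so the parsing can be tried on
saved output.

```shell
#!/bin/sh
# dirty_kb [FILE]: print the Dirty and Writeback counters (in kB) from
# a meminfo-style file, one "name value" pair per line.
# FILE defaults to /proc/meminfo. (Hypothetical helper name.)
dirty_kb() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { sub(":", "", $1); print $1, $2 }' \
        "${1:-/proc/meminfo}"
}

# Sample once a second while the big write is running (Ctrl-C to stop):
#   while sleep 1; do dirty_kb; done
```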

Obviously the kernel puts off writing any dirty buffers until the
memory pressure is overwhelming and the vmm flops.

At least the OOM killer does not get invoked anymore since there is
lots of memory - just Linux does not know how to use it.

The solution to this problem is quite simple - use the ancient
userspace bdflushd or whatever it was called. I emulate it with
{ while true ; do sleep 5; sync ; done } &

The system performance suddenly increases - to the awesome Debian stable levels.
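
The same loop can be wrapped in a function so that the interval is
adjustable and the background job can be stopped again once the heavy
writing is done. This is only a sketch: periodic_sync and its optional
round count are illustrative additions, not anything the kernel or
Debian ships.

```shell
#!/bin/sh
# periodic_sync INTERVAL [COUNT]: run sync every INTERVAL seconds.
# With COUNT, stop after that many rounds; without it, loop until
# killed. (Hypothetical helper name.)
periodic_sync() {
    interval=$1
    remaining=${2:-}
    while [ -z "$remaining" ] || [ "$remaining" -gt 0 ]; do
        sleep "$interval"
        sync
        # Count down only when a round limit was given.
        [ -z "$remaining" ] || remaining=$((remaining - 1))
    done
}

# Start it in the background and remember the PID so it can be stopped:
#   periodic_sync 5 &
#   flusher=$!
#   ... do the heavy writing ...
#   kill "$flusher"
```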

Thanks

Michal