Message-ID: <BANLkTi=XqROAp2MOgwQXEQjdkLMenh_OTQ@mail.gmail.com>
Date: Wed, 11 May 2011 18:42:38 -0400
From: Andrew Lutomirski <luto@....edu>
To: linux-kernel@...r.kernel.org
Subject: Kernel falls apart under light memory pressure (i.e. linking vmlinux)
For the last few days (since moving my disk to a new laptop), my
system has been hanging, usually unrecoverably, under light memory
pressure. When this happens, I usually see soft lockups and no OOM
kill. Mouse and keyboard input stop working. Sometimes I can switch
VTs; sometimes I can't. If I just wait it out, sometimes the system
comes back after a couple of minutes but usually even ten minutes or
so isn't enough. If I force an OOM kill (Alt-SysRq-F), my system
sometimes recovers. I've attached the dmesg from when that happened
(in that case the freeze was triggered by linking a kernel and the OOM
killer killed ld.)
I can trigger it about half of the time by building a kernel (it
usually dies while linking or doing the .tmp_* stuff) and 100% of the
time by running the attached script with parameters "1500 1400 1".
The script creates a 1500M file on a ramfs, sets up dm-crypt over
loopback on that file, formats it as ext4, and mounts it, then starts
writing a 1400M file over and over on the ext4 partition.
I cannot trigger the problem by running the same script on a different
machine (with 8 GB RAM) with parameters 6000 5500 1. I can't trigger
it on this machine from initramfs (same kernel image) or from
systemd's emergency shell. I can trigger it some of the time from
systemd's rescue shell (which has a little bit more stuff running).
The problem seems about equally prevalent with AHCI or compatibility
mode and with aesni-intel enabled and disabled. (aesni-intel causes
cryptd to get pulled in, so I thought that might be the issue.)
I can sometimes (but not always) trigger this by enabling swap and
running dirty_ram 2048 (attached). (One time it took the system down
completely. I have ~8 GB of swap, all of which was empty when I ran
the program.)
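
For readers without the attachment, here is a minimal sketch of what a
program in the spirit of dirty_ram might do. It is not the attached
dirty_ram.cc. The function name dirty_ram_mib, the 4096-byte page size, and
the choice to free the memory on return are all assumptions made for
illustration. The idea is simply to allocate the requested number of
megabytes and write to every page so the kernel has to back each one with
RAM or swap.

```cpp
// Hypothetical sketch only; NOT the dirty_ram.cc attached to this message.
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>

// Assumed page size; a real tool could ask sysconf(_SC_PAGESIZE) instead.
constexpr std::size_t kPageSize = 4096;
constexpr std::size_t kMiB = 1024 * 1024;

// Allocate `mib` MiB of anonymous memory and write one byte into every
// page, forcing the kernel to actually provide each page.
// Returns the number of pages written. The buffer is freed on return.
std::size_t dirty_ram_mib(std::size_t mib) {
    // Guard against overflow when converting MiB to bytes.
    assert(mib <= std::numeric_limits<std::size_t>::max() / kMiB);
    const std::size_t bytes = mib * kMiB;

    // new[] leaves the bytes uninitialized, so pages are only touched
    // by the loop below.
    std::unique_ptr<unsigned char[]> buf(new unsigned char[bytes]);

    // volatile keeps the compiler from optimizing the stores away.
    volatile unsigned char* p = buf.get();
    std::size_t pages = 0;
    for (std::size_t off = 0; off < bytes; off += kPageSize) {
        p[off] = static_cast<unsigned char>(pages);
        ++pages;
    }
    return pages;
}
```

A small main() that parses the size from argv[1] and calls
dirty_ram_mib(2048) would correspond to the "dirty_ram 2048" invocation
above. Whether the real program holds on to the memory or rewrites it in a
loop is not stated here, so this sketch does neither.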
I see this problem on 2.6.38.{5,6}, 2.6.39-<something from today>, and
Fedora 15's kernel, so I doubt it's an oddity of my kernel config.
I also had this problem while running Fedora 15's installer to upgrade
from Fedora 14 to 15, which rules out a lot of weird userspace issues.
This box is a Lenovo X220 Sandy Bridge laptop with 2G of RAM (the old
box had more) and runs ext4 on LVM on dm-crypt on an SSD. I see the
problem with and without a swap partition. I've also tried unloading
most drivers and the test still fails. Memtest passes.
If I had to guess, I'd say that the VM gets confused when it's forced
to write data out to my LVM-over-dm-crypt partition and either starts
OOM-killing things when it's not out of memory or deadlocks because it
runs out of available RAM and can't service new dm-crypt and block
requests.
Please help fix/debug this. It's making my shiny new laptop almost useless.
--Andy
View attachment "successful-oom-kill.txt" of type "text/plain" (88205 bytes)
Download attachment "test_mempressure.sh" of type "application/x-sh" (1993 bytes)
View attachment "OOM-with-lots-of-swap.txt" of type "text/plain" (34676 bytes)
View attachment "dirty_ram.cc" of type "text/plain" (583 bytes)