Message-ID: <02580b0a303da26b669b4a9892624b13@mail.ud19.udmedia.de>
Date: Tue, 12 Jul 2016 10:27:37 +0200
From: Matthias Dahl <ml_linux-kernel@...ary-island.eu>
To: linux-raid@...r.kernel.org
Cc: linux-mm@...ck.org, dm-devel@...hat.com,
linux-kernel@...r.kernel.org
Subject: Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
Hello,
I posted this issue already on linux-mm, linux-kernel and dm-devel a
few days ago. After further investigation, it seems that the issue is
somehow related to the fact that I am using an Intel Rapid Storage
RAID10, so I am summarizing everything again in this mail and including
linux-raid. Sorry for the noise... :(
I am currently setting up a new machine (since my old one broke down)
and I ran into a lot of "Unable to allocate memory on node -1" warnings
while using dm-crypt. I have attached as much of the full log as I could
recover.
The encrypted device is sitting on a RAID10 (software raid, Intel Rapid
Storage). I am currently limited to testing via Linux live images since
the machine is not yet properly set up, but I ran my tests across several
of those.
Steps to reproduce are:
1)
cryptsetup -s 512 -d /dev/urandom -c aes-xts-plain64 open --type plain /dev/md126p5 test-device
2)
dd if=/dev/zero of=/dev/mapper/test-device status=progress bs=512K
While dd is running, monitoring the memory usage with free shows the
used memory increasing rapidly. After just a few seconds the system is
out of memory: page allocation failures start to be issued and the OOM
killer gets involved.
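For anyone trying to reproduce this, the memory growth can be logged while dd runs. The following is a minimal sketch and not what was used for the report above (that was plain free): it assumes the standard /proc/meminfo fields MemFree, Dirty and Writeback are the interesting ones, and the function name meminfo_fields is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print MemFree, Dirty and Writeback from a
# meminfo-style file on a single line.
# Usage: meminfo_fields [file]   (file defaults to /proc/meminfo)
meminfo_fields() {
    awk '/^(MemFree|Dirty|Writeback):/ {
             # $1 is the field name with its colon, $2 the value, $3 the unit
             printf "%s%s %s %s", sep, $1, $2, $3
             sep = "  "
         }
         END { print "" }' "${1:-/proc/meminfo}"
}

# To sample once per second in a second terminal until Ctrl-C:
#   while sleep 1; do meminfo_fields; done
```

The anchored pattern deliberately skips WritebackTmp, so each sample stays on one short line that is easy to compare over time.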
I have also seen this behavior with mkfs.ext4 being used on the same
device -- at least with 1.43.1.
Using direct I/O works fine and does not cause any issue. Also, if
dm-crypt is out of the picture, the problem does not occur either.
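The mail does not say how direct I/O was requested. One common way with GNU dd is oflag=direct, so the comparison might look like the sketch below. This is an assumption, not the exact command from the tests, and repro_cmd is a hypothetical helper that only prints the command, because running it overwrites the target device.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the dd invocation for a mode.
# mode:   "buffered" (as in step 2) or "direct" (assumed: GNU dd oflag=direct)
# target: device to write to, e.g. /dev/mapper/test-device
repro_cmd() {
    mode=$1
    target=$2
    cmd="dd if=/dev/zero of=$target status=progress bs=512K"
    if [ "$mode" = direct ]; then
        # O_DIRECT bypasses the page cache for the writes
        cmd="$cmd oflag=direct"
    fi
    printf '%s\n' "$cmd"
}

repro_cmd buffered /dev/mapper/test-device
repro_cmd direct /dev/mapper/test-device
```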
I did further tests:
1) the dd block size has no influence on the issue whatsoever
2) using dm-crypt on an image file located on an ext2 on the RAID10
works fine
3) using an external hard disk (connected through USB3) with two
partitions, combined into either a RAID1 or RAID10 via Linux s/w RAID
with dm-crypt on top, also works fine
But as soon as I use dm-crypt on the Intel Rapid Storage RAID10, the
issue is 100% reproducible.
I tested all of this on a Fedora Rawhide Live Image as I am currently
still in the process of setting the new machine up. Those images are
available for download here:
download.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/iso/
The machine itself has 32 GiB of RAM (plenty), no swap (live image)
and is a 6700K on a Z170 chipset. The kernel is the default provided
with the live image... right now that is a very recent git snapshot
after 4.7.0-rc6 but before rc7. But the issue also shows on 4.4.8 and
4.5.5.
The stripe size of the RAID10 is 64k, if that matters.
I am now pretty much out of ideas as to what else to test and where the
problem could stem from. Suffice it to say that this has impacted my
trust in this particular setup. I hope I can help to find the cause of
this.
If there is anything I can do to help, please let me know.
Also, since I am not subscribed to the lists right now (I have to make
do with a crappy WebMail interface until everything is set up), please
CC me accordingly. Thanks a lot.
With Kind Regards from Germany,
Matthias
--
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
services: custom software [desktop, mobile, web], server administration
View attachment "mdstat.txt" of type "text/plain" (296 bytes)
View attachment "vmstat.txt" of type "text/plain" (2738 bytes)
Download attachment "crypto.txt.gz" of type "application/x-gzip" (1197 bytes)
Download attachment "kernel.log.txt.gz" of type "application/x-gzip" (24060 bytes)
Download attachment "sysctl.txt.gz" of type "application/x-gzip" (7591 bytes)