linux-kernel - Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)

Message-ID: <02580b0a303da26b669b4a9892624b13@mail.ud19.udmedia.de>
Date:	Tue, 12 Jul 2016 10:27:37 +0200
From:	Matthias Dahl <ml_linux-kernel@...ary-island.eu>
To:	linux-raid@...r.kernel.org
Cc:	linux-mm@...ck.org, dm-devel@...hat.com,
	linux-kernel@...r.kernel.org
Subject: Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel
 Rapid Storage)

Hello,

I already posted this issue on linux-mm, linux-kernel and dm-devel a
few days ago. After further investigation, it seems that the issue is
somehow related to the fact that I am using an Intel Rapid Storage
RAID10, so I am summarizing everything again in this mail and
including linux-raid in my post. Sorry for the noise... :(

I am currently setting up a new machine (since my old one broke down)
and I ran into a lot of "Unable to allocate memory on node -1" warnings
while using dm-crypt. I have attached as much of the full log as I could
recover.

The encrypted device is sitting on a RAID10 (software RAID, Intel Rapid
Storage). I am currently limited to testing via Linux live images since
the machine is not yet properly set up, but I did my tests across
several of those.

Steps to reproduce are:

1)
cryptsetup -s 512 -d /dev/urandom -c aes-xts-plain64 open --type plain /dev/md126p5 test-device

2)
dd if=/dev/zero of=/dev/mapper/test-device status=progress bs=512K

While this runs, monitoring memory usage with free shows that used
memory increases rapidly. After just a few seconds the system is out of
memory, page allocation failures are reported and the OOM killer gets
involved.
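
A minimal sketch for watching memory during step 2 in more detail than
free gives. It reads /proc/meminfo directly. The helper names
meminfo_field and meminfo_snapshot are made up for illustration, and
the choice of the Dirty and Writeback counters is an assumption about
what is useful to watch, not something measured in this report.

```shell
#!/bin/sh
# Print one field from a meminfo-style file (value as reported, in kB).
# Usage: meminfo_field NAME [FILE]   (FILE defaults to /proc/meminfo)
meminfo_field() {
    # Exact match on "NAME:" so that Writeback does not match WritebackTmp.
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/meminfo}"
}

# Print a one-line snapshot of free memory and pending buffered writes.
# Usage: meminfo_snapshot [FILE]
meminfo_snapshot() {
    snapshot_file="${1:-/proc/meminfo}"
    printf 'MemFree=%s Dirty=%s Writeback=%s (kB)\n' \
        "$(meminfo_field MemFree "$snapshot_file")" \
        "$(meminfo_field Dirty "$snapshot_file")" \
        "$(meminfo_field Writeback "$snapshot_file")"
}

# Sample once per second while dd runs in another terminal:
#   while sleep 1; do meminfo_snapshot; done
```

Saving the output to a file that is not on the device under test would
preserve the last samples before the OOM killer steps in.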

I have also seen this behavior with mkfs.ext4 being used on the same
device -- at least with 1.43.1.

Using direct I/O works fine and does not cause any issue. Also, if
dm-crypt is out of the picture, the problem does not occur either.
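
For comparison, a sketch of how the buffered and direct I/O variants of
the write test could be invoked. The exact command used for the direct
I/O run is not stated here, so oflag=direct (a GNU dd flag that opens
the output with O_DIRECT) is an assumption, and dd_write_cmd is a
hypothetical helper that only prints the command line.

```shell
#!/bin/sh
# Build the dd command line for the write test without running it.
# Usage: dd_write_cmd TARGET [buffered|direct]
dd_write_cmd() {
    dd_target="$1"
    dd_mode="${2:-buffered}"
    dd_cmd="dd if=/dev/zero of=$dd_target bs=512K status=progress"
    # Assumption: direct I/O was requested via GNU dd's oflag=direct,
    # which bypasses the page cache for the output file.
    if [ "$dd_mode" = direct ]; then
        dd_cmd="$dd_cmd oflag=direct"
    fi
    printf '%s\n' "$dd_cmd"
}

# Print both variants; run the printed line by hand, since it
# overwrites everything on the target device.
dd_write_cmd /dev/mapper/test-device buffered
dd_write_cmd /dev/mapper/test-device direct
```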

I did further tests:

1) dd block size has no influence on the issue whatsoever
2) using dm-crypt on an image located on an ext2 on the RAID10 works
    fine
3) using an external hard disk (connected through USB3) with two
    partitions and either a RAID1 or RAID10 on it via Linux s/w RAID,
    with dm-crypt on top, also works fine

But as soon as I use dm-crypt on the Intel Rapid Storage RAID10, the
issue is 100% reproducible.

I tested all of this on a Fedora Rawhide Live Image, as I am currently
still in the process of setting the new machine up. Those images are
available for download here:

download.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/iso/

The machine itself has 32 GiB of RAM (plenty), no swap (live image)
and is a 6700K on a Z170 chipset. The kernel is the default provided
with the live image... right now that is a very recent git snapshot
after 4.7.0-rc6 but before rc7. But the issue also shows on 4.4.8 and
4.5.5.

The stripe size of the RAID10 is 64k, if that matters.
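
The chunk size can be cross-checked against what the md driver reports
in sysfs. A small sketch, assuming the array behind /dev/md126p5 is
md126 and that its md/chunk_size attribute (in bytes) is readable;
md_chunk_kib is a hypothetical helper name.

```shell
#!/bin/sh
# Print an md array's chunk size in KiB, as reported by sysfs.
# Usage: md_chunk_kib ARRAY [SYSFS_ROOT]   (SYSFS_ROOT defaults to /sys/block)
md_chunk_kib() {
    chunk_file="${2:-/sys/block}/$1/md/chunk_size"
    if [ ! -r "$chunk_file" ]; then
        echo "cannot read $chunk_file" >&2
        return 1
    fi
    # The attribute holds the chunk size in bytes.
    chunk_bytes=$(cat "$chunk_file")
    echo $((chunk_bytes / 1024))
}

# Example (array name is an assumption derived from /dev/md126p5):
#   md_chunk_kib md126
```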

I am now pretty much out of ideas about what else to test and where the
problem could stem from. Suffice it to say that this has impacted my
trust in this particular setup. I hope I can help to find the cause of
this.

If there is anything I can do to help, please let me know.

Also, since I am not subscribed to the lists right now (I have to make
do with a crappy WebMail interface until everything is set up), please
CC me accordingly. Thanks a lot.

With Kind Regards from Germany,
Matthias

-- 
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
  services: custom software [desktop, mobile, web], server administration

View attachment "mdstat.txt" of type "text/plain" (296 bytes)

View attachment "vmstat.txt" of type "text/plain" (2738 bytes)

Download attachment "crypto.txt.gz" of type "application/x-gzip" (1197 bytes)

Download attachment "kernel.log.txt.gz" of type "application/x-gzip" (24060 bytes)

Download attachment "sysctl.txt.gz" of type "application/x-gzip" (7591 bytes)