Message-ID: <26e0640c-40b0-5121-5d85-727d4fef82c5@binary-island.eu>
Date:	Mon, 18 Apr 2016 17:41:02 +0200
From:	Matthias Dahl <ml_linux-kernel@...ary-island.eu>
To:	linux-kernel@...r.kernel.org
Subject: [4.4, 4.5, 4.6] Regression: encrypted swap (dm-crypt) freezes system
 while under memory pressure and swapping

Hello @all,

first of all, I am not subscribed to the list, so I have to kindly ask
to be cc'ed for all replies. Thanks in advance.

Recently I started seeing freezes while compiling bigger packages that
do require lots of memory (I use Gentoo).

The freezes took the following form: while in Xorg, the system would
just hang completely -- no magic sysrq keys, no mouse movement, nothing.
While in a terminal, one could still issue a magic sysrq command, but it
would only echo the command itself and not execute it -- except for the
reboot command. So there was no way to get a backtrace, task states or
anything like that.

After debugging this further, it became clear that the system always
froze when it started hitting the encrypted swap. It worked absolutely
fine as soon as you took the encryption out of the picture.
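
To see when a system actually starts hitting swap, the SwapTotal and
SwapFree fields of /proc/meminfo can be polled from a second terminal.
The following is only a minimal sketch and not something used for this
report; the helper names (parse_meminfo, swap_used_kib, watch_swap), the
polling interval and the sample count are assumptions made for
illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name to value.

    Values are the leading integer of each line (kiB for most fields).
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kib(info):
    """Swap currently in use, in kiB, from SwapTotal and SwapFree."""
    return info["SwapTotal"] - info["SwapFree"]


def watch_swap(path="/proc/meminfo", interval=1.0, samples=10):
    """Print a timestamped swap usage line `samples` times (hypothetical helper)."""
    for _ in range(samples):
        with open(path) as f:
            info = parse_meminfo(f.read())
        # flush so the last line is visible even if the machine hangs afterwards
        print(time.strftime("%H:%M:%S"), "swap used:",
              swap_used_kib(info), "kiB", flush=True)
        time.sleep(interval)
```

Calling watch_swap() while the memory-hungry job runs would show roughly
at which point swap usage becomes non-zero, which can then be compared
with the moment the system stops responding.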

My setup then was: an 8 GiB swap on S/W-RAID5 for my 8 GiB of physical
RAM, encrypted with dm-crypt and AES256-CBC-ESSIV.

I debugged this further and changed my setup to several swap partitions
on the physical disks w/o a RAID in-between to isolate the culprit. This
made no difference -- neither did switching ciphers and so forth.

Since this setup had worked for ages, I started looking into what had
changed the weeks before and noticed I had done several kernel upgrades.

To make a long story short, here are my findings:

4.3.0, 4.4.0-final, 4.5-rc1 to 4.5-rc2:
No problems, except for the usual sluggishness with encrypted swap that
has been there since forever (it is as if the encryption has the highest
priority and takes over the system, e.g. no terminal input is accepted
on a different terminal while high memory pressure is going on, in
contrast to unencrypted swap, where this still works fine).

4.4.x, >= 4.5-rc3 (incl. 4.6-rcX and master):
The system freezes under memory pressure as soon as it starts swapping
out. 4.6 master is an exception here: it still responds to magic sysrq
commands properly, but after some time it freezes hard as well.
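
For testing without a long compile job, memory pressure can also be
generated artificially. The sketch below is an assumption about how one
might do that, not the workload behind this report (which was compiling
large packages on Gentoo). The function name and the chunk sizes are
made up for illustration, and on an affected kernel with encrypted swap,
running it with large values may well freeze the machine.

```python
def allocate_and_touch(chunk_mib, max_chunks, page_size=4096):
    """Allocate up to max_chunks chunks of chunk_mib MiB and dirty every page.

    Returns the list of buffers so that they stay referenced (and thus
    resident or swapped out) until the caller drops them.
    """
    chunks = []
    for _ in range(max_chunks):
        buf = bytearray(chunk_mib * 1024 * 1024)
        # Write one byte per page so the kernel has to back the whole
        # buffer with real memory instead of untouched zero pages.
        for offset in range(0, len(buf), page_size):
            buf[offset] = 1
        chunks.append(buf)
    return chunks
```

Something like allocate_and_touch(256, 64) would try to dirty 16 GiB in
256 MiB steps, which on a machine with 8 GiB of RAM and 8 GiB of swap
should push it well into swapping.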

I have not had the time to test all 4.3.x and 4.4.x releases, I am
afraid. What I can say, though, is that 4.4.6 is affected as well.

A git bisect between 4.5-rc2 and 4.5-rc3 led me to the following commit:

564e81a57f9788b1475127012e0fd44e9049e342 is the first bad commit
commit 564e81a57f9788b1475127012e0fd44e9049e342
Author: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Date:   Fri Feb 5 15:36:30 2016 -0800

    mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress

In my opinion, this is obviously not the real culprit but merely a
trigger. Reverting that commit on 4.5.1, for example, makes the
encrypted swap work flawlessly again (except for the usual system
sluggishness).
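
As background on the bisection mentioned above: the idea is a binary
search over the commits between a known-good and a known-bad revision.
The sketch below is a simplified illustration that assumes a linear
history and a deterministic test; git bisect itself works on the real
commit graph and picks each test commit so that the remaining candidates
are split roughly in half. The names first_bad and is_bad are
hypothetical.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is true.

    Assumptions: commits are ordered oldest to newest, commits[0] is
    known good, commits[-1] is known bad, and once a commit is bad all
    later ones are bad too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]
```

Here is_bad would stand for "build and boot this kernel, apply memory
pressure and see whether it freezes", so each step costs a full
build-and-test cycle; halving the range keeps the number of such cycles
logarithmic in the number of commits.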

Reverting the same commit on 4.6 master@...46c73264b03000d1e18b22f5caf63332547c9
shows a different picture, though: the system freezes while the sysrq
keys still work, and it usually recovers after a while if the task that
triggered the swapping in the first place gets killed. While it hangs,
it sometimes does a bit of swapping and sometimes doesn't -- whereas
with the other kernels, swapping usually stops completely in the
"frozen" state.

I managed to get a bit more information out of 4.6 master, though, since
it sometimes recovers after quite some time and I can then copy
backtraces and such to disk. I have attached them.

I hope this helps in finding the real issue behind this. I am sorry I
could not provide more information, but this has been a rather
time-consuming task thus far. :-)

If there is anything else I can do to help or test, please let me know
and I will gladly do so.

Thanks in advance.

So long,
Matthias

-- 
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration

View attachment "4.6.0-reverted-bad-commit.log" of type "text/plain" (130018 bytes)
