Message-ID: <20140708174843.046ed76a@pog.tecnopolis.ca>
Date:	Tue, 8 Jul 2014 17:48:43 -0500
From:	Trevor Cordes <trevor@...nopolis.ca>
To:	linux-kernel@...r.kernel.org
Subject: memory bug ever since 3.12, oom-killer invoked, computer freezes

Excuse a novice on his first post to this list.  I have tried to obtain
help elsewhere with no success.

I have been dealing with a bad kernel bug since 3.12 came out.  It is
present in 3.12, 3.13 and 3.14 up to 3.14.8 (Fedora 19 kernel).

What happens is that around the same time every day, on the buggy
kernels, I get dozens of oom-killer messages over about 3-5 minutes.
The system slows to a crawl instantly and usually freezes (numlock no
longer works, etc.) within a few minutes.

Using 3.11, the system runs fine; there is no bug.

I think I have isolated the trigger of the problem to a simple
backup-helper script I run nightly at the same time.  I have come to
this conclusion because I can run 3.14 for many days with no problems
if I disable the script.  As soon as I enable the script, the bug hits
the next morning at the usual time.  Again, in 3.11 there is no bug
even if my script is running.

I have filed a Red Hat Bugzilla report for this that contains even
more detail:
https://bugzilla.redhat.com/show_bug.cgi?id=1075185

My script looks like this (simplified):
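#!/bin/perl
# Directories to scan, and the directory that receives the listing.
$dirs="/ /mnt/peecee/DATA";
$Ddest="/data/Bak/FindList";
# Lowest CPU and I/O priority; the command is one line, split here only
# to avoid mail wrapping.
system "/bin/nice -n19 /usr/bin/ionice -c2 -n7 -t find $dirs -xdev -ls "
     . "2>/dev/null > $Ddest/find-list";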

Notes: /mnt/peecee is a cifs share (old XP box).  $Ddest is an NFS
mount on my file server.

This script runs in about 1 min when nothing is cached, about 10s when
everything is cached.

I can run this script 200 times over and over again manually for
testing (not via the usual cron) and it does NOT trigger the bug.  It
is only when I enable this script via cron that the bug occurs.
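
One difference between the manual runs and the cron runs is the
environment: cron starts jobs with a nearly empty environment and no
terminal.  The following is a minimal sketch for approximating that
from a shell.  It is not confirmed that this difference has anything
to do with the bug.  The function name run_like_cron and the script
path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command string roughly the way cron would,
# with a near-empty environment, stdin from /dev/null and output discarded.
# The exact variables cron sets differ between implementations; this list
# is an assumption.
run_like_cron() {
    env -i HOME="${HOME:-/}" LOGNAME="${LOGNAME:-$(id -un)}" \
        PATH=/usr/bin:/bin SHELL=/bin/sh \
        /bin/sh -c "$1" </dev/null >/dev/null 2>&1
}

# Usage (the script path is a placeholder):
#   i=0
#   while [ "$i" -lt 200 ]; do
#       run_like_cron /path/to/backup-helper
#       i=$((i + 1))
#   done
```

The wrapper returns the exit status of the command, so a failing run
can still be detected in the loop.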

I have captured key /proc files at moments in time before/during the
bug occurring, which may help figure out the problem.  I have attached
those files to the bugzilla linked above.  I can post them here if
required.  I can obtain more/finer results if required.  I can
reproduce this bug "sort of on demand" by enabling my script to run the
following morning.
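
For anyone wanting to collect similar data, here is a minimal sketch
of a periodic /proc snapshot.  The list of files is an assumption and
is not necessarily the set attached to the bugzilla.  The function
name snapshot_proc and the output directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a few memory-related /proc files into a
# timestamped directory under the given root and print that directory.
snapshot_proc() {
    dest_root=$1
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$dest_root/$stamp"
    mkdir -p "$dest" || return 1
    # Assumed file list; slabinfo is normally readable only by root.
    for f in meminfo buddyinfo slabinfo vmstat zoneinfo; do
        if [ -r "/proc/$f" ]; then
            cat "/proc/$f" > "$dest/$f" 2>/dev/null || :
        fi
    done
    echo "$dest"
}

# Usage: take a snapshot every 10 seconds around the time the job runs.
#   while :; do snapshot_proc /var/tmp/proc-snapshots; sleep 10; done
```

Writing the snapshots to a local disk rather than a network mount
keeps the capture itself from depending on NFS or CIFS while the
system is under memory pressure.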

Known buggy kernels:
3.14.8-100.fc19
3.14.4-100.fc19
3.13.9-100.fc19
3.13.5-103.fc19
3.12.9-201.fc19

Known good kernel:
3.11.10-200.fc19

My kernels are all 32-bit, PAE.

My / is md RAID1.  The disks are 15k UW-SCSI enterprise drives.  The
controller is Adaptec AIC-7892A U160/m, a 29160 card I believe.  I am
usually tainted with Nvidia video driver binary, but can untaint for
purposes of testing.

I wanted to bisect to help figure this out but cannot using Fedora
tools due to a bug in the 32-bit Python libraries.  I don't know how
to bisect the vanilla kernel while still incorporating all the Fedora
tweaks without using Fedora tools.

I did much googling and discovered this thread which sounds very much
related to my problem, though not an exact duplicate:
http://marc.info/?l=linux-mm&m=139267140606805&w=2