Message-ID: <50AACB7E.8060601@imap.cc>
Date:	Tue, 20 Nov 2012 01:14:54 +0100
From:	Tilman Schmidt <tilman@...p.cc>
To:	LKML <linux-kernel@...r.kernel.org>
Subject: possible regression in kernel 3.6: system hangs during nightly tape backup

For the 4th time now after switching to kernel 3.6, my system became
unresponsive during the nightly Bacula backup run. It looks as if
all disk accesses are suddenly blocked:
- Desktop apps stop responding one after another, starting with
  Firefox followed by other "heavy" apps, while Konsole windows
  continue being usable for a while.
- "top" shows the load average steadily increasing with no process
  actually consuming relevant quantities of CPU.
- I can do "dmesg > /root/dmesg.out" followed by "less /root/dmesg.out"
  in a Konsole window just fine, but after the inevitable hard reset
  the file /root/dmesg.out isn't there.
- The "sync" command hangs indefinitely.
- The "shutdown" command and Ctrl-Alt-Del emit "system going down"
  broadcast messages but never get anywhere.
- Killing processes manually works for some (bacula-sd even ejects
  the tape before exiting) but most remain in state D or Z.
- Eventually, all text consoles are blocked and a hardware reset is
  the only remaining option.
- After the reboot, a Bacula spool file is left behind in
  /var/spool/bacula, proof that the hang happened during the backup.
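
For anyone trying to capture the same symptom, here is a minimal sketch of how the tasks stuck in state D or Z could be listed by reading /proc/<pid>/stat directly. It is an illustration only, not something used for this report; the function names parse_stat_line and blocked_tasks are made up for the example. It has to run while a console is still responsive, and the output should go to the terminal or a remote host, since local disk writes may never complete.

```python
import os


def parse_stat_line(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on spaces.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc", states=("D", "Z")):
    """Return sorted (pid, comm, state) tuples for tasks in the given states."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # the process exited while we were looking at it
        if state in states:
            found.append((pid, comm, state))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm, state in blocked_tasks():
        print(f"{pid:>7} {state} {comm}")
```

Reading /proc/<pid>/stat does not touch the disk, so this should keep working for as long as a shell does.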

This does not happen during every backup run, but frequently enough
to be annoying (about once per week). It never happened with kernel
3.5. For comparison, I went back to kernel 3.5.7 for a week and it
never happened during that time. Last night I booted 3.6.7 and the
very next backup caused the hang again. The last kernel message that
made it to the syslog on disk was

Nov 19 23:05:04 xenon kernel: [73877.128546] st0: Block limits 256 - 524288 bytes.

triggered by the start of the backup. In dmesg the next message was

[74401.249091] INFO: task flush-253:2:1320 blocked for more than 120 seconds.

followed by a backtrace. I have photos of the remaining dmesg output
which I'll try to upload somewhere accessible tomorrow.
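
Once the dmesg output is available as text, the hung-task reports could be pulled out with something like the sketch below. It is a rough illustration rather than a tested tool: the regular expression is modelled only on the single message quoted above, the name hung_tasks is invented for the example, and the pattern tolerates the line wrap that mail clients insert before "seconds".

```python
import re

# Pattern modelled on the one message quoted in this report; other kernel
# versions may word the hung-task warning differently.
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+)\s+seconds"
)


def hung_tasks(dmesg_text):
    """Return (timestamp, comm, pid, seconds) for each hung-task report."""
    reports = []
    for m in HUNG_TASK_RE.finditer(dmesg_text):
        reports.append(
            (float(m.group("ts")), m.group("comm"),
             int(m.group("pid")), int(m.group("secs")))
        )
    return reports


if __name__ == "__main__":
    sample = (
        "[74401.249091] INFO: task flush-253:2:1320 blocked for more than 120\n"
        "seconds.\n"
    )
    for ts, comm, pid, secs in hung_tasks(sample):
        print(f"{ts:.6f} pid={pid} comm={comm} blocked>{secs}s")
```

The task name itself contains a colon (flush-253:2), which is why the pattern matches the name greedily and takes only the last colon-separated number as the PID.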

Hardware configuration:
Intel Pentium D, Intel DQ965GF mainboard, 6 GB RAM
onboard S-ATA controller driving two 500 GB S-ATA disks
and a Pioneer DVR-216D DVD-RW drive
Adaptec 29160B Ultra160 SCSI adapter driving a
Tandberg TS400 LTO-2 tape drive

Disk configuration: md RAID1, LVM, ext3 and ext4 volumes

Software: openSUSE 11.4 64-bit, vanilla kernel 3.5.7 and 3.6.7,
Bacula 5.2.12

HTH
T.

-- 
Tilman Schmidt                    E-Mail: tilman@...p.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)

