Message-Id: <201103081449.44783.eike-kernel@sf-tec.de>
Date:	Tue, 8 Mar 2011 14:49:44 +0100
From:	Rolf Eike Beer <eike-kernel@...tec.de>
To:	linux-lvm@...hat.com
Cc:	linux-kernel@...r.kernel.org
Subject: Re: 2.6.37.2: LVM pvmove hangs system

On Tuesday, 8 March 2011, 10:38:38, Rolf Eike Beer wrote:
> Hi all,
> 
> I'm experiencing a very annoying system lockup for some days. The setup is
> as follows:
> 
> -two pairs of SATA disks that are bundled into a software raid 1 each
> -each of the raid devices is a physical volume
> -a volume group that includes both pv's
> -all mounted volumes (including root and swap) are in that vg
> 
> The machine is a Xeon E5520 with 16G RAM that is otherwise idle, so swap
> shouldn't matter. And from what I read out of the documentation this all
> looks perfectly sane, but:
> 
> Now I try to move the data from one pv to the other using pvmove. This prints
> out the current state (currently 10.9%) and then starts doing something.
> Two minutes later the kernel will complain:

After some further testing I _think_ I have an idea of what's going on: this
is a deadlock somewhere in the I/O stack. I have recompiled the kernel with all
the lock debugging enabled and will probably test with it, but this is a
production machine that should get back online sooner rather than later, so
the amount of testing I can do is pretty limited. Since the machine is
currently doing the move and actually working, I have not yet booted into the
debug kernel.
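
For anyone who wants to verify that a kernel config really has the lock
debugging switched on before booting into it, here is a minimal sketch. The
option list is an assumption (commonly used lock-debugging and hung-task
options), not necessarily the exact set used for the debug kernel mentioned
above, and the helper name missing_lock_debug_options is made up for
illustration.

```python
# Lock-debugging options often enabled when hunting deadlocks.
# ASSUMPTION: this set is illustrative; the message above does not say
# which options were actually enabled.
LOCK_DEBUG_OPTIONS = (
    "CONFIG_PROVE_LOCKING",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_MUTEXES",
    "CONFIG_DEBUG_LOCK_ALLOC",
    "CONFIG_DETECT_HUNG_TASK",
)


def missing_lock_debug_options(config_text, wanted=LOCK_DEBUG_OPTIONS):
    """Return the options from `wanted` that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    # Preserve the order of `wanted` so the report is stable.
    return [opt for opt in wanted if opt not in enabled]
```

Feed it the contents of the .config from the kernel build tree; an empty list
means every option in the (assumed) list is enabled.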

What I did was basically stop everything on the machine. The only userspace
programs currently running are init, my sshd, my screen, a shell, and of
course pvmove. And now it works. Whenever I try to do anything that causes I/O
in parallel, the machine stops working. So this box is basically at runlevel 1
now, moving all the stuff around, instead of doing some useful work while
moving in the background :(

Eike