Date: Sun, 01 Jul 2012 14:57:52 +0200
From: Zdenek Kabelac <zkabelac@...hat.com>
To: Hugh Dickins <hughd@...gle.com>
CC: LVM general discussion and development <linux-lvm@...hat.com>,
    amwang@...hat.com, Alasdair G Kergon <agk@...hat.com>,
    linux-kernel@...r.kernel.org, Lukas Czerner <lczerner@...hat.com>
Subject: Re: Regression with FALLOC_FL_PUNCH_HOLE in 3.5-rc kernel

On 1.7.2012 01:10, Hugh Dickins wrote:
> On Sat, 30 Jun 2012, Zdenek Kabelac wrote:
>> On 30.6.2012 21:55, Hugh Dickins wrote:
>>> On Sat, 30 Jun 2012, Zdenek Kabelac wrote:
>>>>
>>>> When I used the 3.5-rc kernels I noticed kernel deadlocks; an oops
>>>> log is included. After some experimenting, a reliable way to hit this
>>>> oops is to run the lvm test suite for 10 minutes. Since the 3.5 merge
>>>> window did not include anything related to this oops, I went for a
>>>> bisect.
>>>
>>> Thanks a lot for reporting, and going to such effort to find
>>> a reproducible testcase that you could bisect on.
>>>
>>>> The result of the game is commit 3f31d07571eeea18a7d34db9af21d2285b807a17:
>>>>
>>>>     mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
>>>
>>> But this leaves me very puzzled.
>>>
>>> Is the "lvm test suite" what I find at git.fedorahosted.org/git/lvm2.git
>>> under tests/ ?
>>
>> Yes - that's it:
>>
>>   make
>>   # as root:
>>   cd test
>>   make check_local
>>
>> (Running inside the test subdirectory should be enough; if not, just
>> report any problem.)
>>
>>> If you have something else running at the same time, which happens to use
>>> madvise(,,MADV_REMOVE) on a filesystem which the commit above now enables
>>> it on (I guess ext4 from the =y in your config), then I suppose we should
>>> start searching for improper memory freeing or scribbling in its holepunch
>>> support: something that might be corrupting the dm_region in your oops.
>>
>> What the test does: it creates a file in LVM_TEST_DIR (default is /tmp)
>> and uses a loop device to simulate a device (small size - it should fit
>> below 200MB).
>>
>> Within this file a second layer of virtual DM devices is created, which
>> simulates various numbers of PV devices to play with.
>
> This sounds much easier to set up than I was expecting:
> thanks for the info, I'll try it later on today.
>
>>
>> So since everything now supports TRIM, such operations should be passed
>> down to the backend file - which probably triggers the path.
>
> What filesystem do you have for /tmp?
>
> If tmpfs, then it will make much more sense if we assume your bisection
> endpoint was off by one. Your bisection log was not quite complete;
> and even if it did appear to converge on the commit you cite, you might
> have got (un)lucky when testing the commit before it, and concluded
> "good" when more attempts would have said "bad".
>
> The commit before, 83e4fa9c16e4af7122e31be3eca5d57881d236fe
> "tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE", would be a
> much more likely first bad commit if your /tmp is on tmpfs:
> that does indeed wire up loop to pass TRIM down to tmpfs by
> fallocate - that indeed played a part in my own testing.
>
> Whereas if your /tmp is on ext4, loop has been passing TRIM down
> with fallocate since v3.0. And whichever, madvise(,,MADV_REMOVE)
> should be completely irrelevant.

While I've been aware that tmpfs was enhanced with TRIM support, I had not
tried running on a real ext4 filesystem - for quite some time I've been
using tmpfs for my tests, to save rewrites of my SSD :)

So now I've checked with real ext4 - and the bug is there as well, so I
went further back: it crashes on 3.4, 3.3 and 3.2 too.
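For reference, the hole-punch operation that loop passes down to its backing file on TRIM can be exercised directly from userspace with fallocate(2). A minimal sketch (the helper name, path and sizes are illustrative, not from the test suite; it needs a filesystem with hole-punch support, e.g. ext4 or, after the commit above, tmpfs):

```c
/* Sketch: punch a hole in a file with fallocate(2) - the same operation
 * the loop driver issues to its backing file when it receives a TRIM. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Punch [off, off+len) out of path; returns 0 on success, -1 on error.
 * FALLOC_FL_KEEP_SIZE keeps st_size unchanged - only blocks are freed,
 * and the punched range reads back as zeroes. */
static int punch_hole(const char *path, off_t off, off_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    int rc = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                       off, len);
    close(fd);
    return rc;
}
```

After a successful punch, st_blocks drops while st_size stays the same, which is what makes the backing file's allocation shrink when TRIM is passed down.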
3.1 is the first kernel which does survive (checked with 5 repeated runs).

And you are correct, the first commit which causes the crash really is
83e4fa9c16e4af when I use tmpfs as backend storage. The reason I failed to
properly identify this commit in my bisect is that the crash usually
happens on the second pass of the lvm test suite's 'make check_local'
execution - and I had been running the test just once.

To be sure, I've run 5 passes on 3.4.0-08568-gec9516f - which is OK, but
3.4.0-08569-g83e4fa9 usually crashes on the second run; with commit
3f31d07571e the crash always happens in the first pass.

I've also checked a Rawhide kernel, vmlinuz-3.5.0-0.rc2.git0.1.fc18.x86_64,
and it's crashing as well - so it's probably not a uniqueness of my config.

So is there any primary suspect in 3.2 which is worth checking - or do I
need another day to play another bisect game?

>
>>
>>> I'll be surprised if that is the case, but it's something that you can
>>> easily check by inserting a WARN_ON(1) in mm/madvise.c madvise_remove():
>>> that should tell us what process is using it.
>>
>> I could try that if that will help.
>
> That would help, if you're very sure of your bisection endpoint;
> but if your /tmp is on tmpfs, then I do think it's more likely
> that you've actually found a bug in the commit before.

The only thing which could be tricky is udev support (by default it's not
enabled, --enable-udev_sync). However, Debian-based distros distribute
their own udev rules, which are not 100% compatible with upstream and
create some unpredictable issues, where the slowness is the least problem.

If you have Fedora Rawhide with the latest lvm2 installed, you should get
a pretty well configured system for running the test dir (unfortunately
there is no way to virtualize udev...).
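For what it's worth, the madvise_remove() path under suspicion can also be poked from userspace without a kernel rebuild. A minimal, illustrative sketch (the helper name is mine; offset and length must be page-aligned, and the filesystem must support hole punching, else madvise fails):

```c
/* Sketch: free the backing store of a mapped file range with
 * madvise(MADV_REMOVE) - the call that commit 3f31d07571e routes to the
 * filesystem's FALLOC_FL_PUNCH_HOLE handler. The mapping must be
 * MAP_SHARED for MADV_REMOVE to affect the file itself. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if [off, off+len) was removed from path's backing store,
 * -1 on any error (including EOPNOTSUPP without hole-punch support). */
static int remove_range(const char *path, off_t off, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    int rc = madvise(map + off, len, MADV_REMOVE);
    munmap(map, st.st_size);
    return rc;
}
```

With a WARN_ON(1) added in madvise_remove() as suggested above, running something like this would show exactly which process is reaching that path.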
Zdenek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/