Date:	Sun, 01 Jul 2012 14:57:52 +0200
From:	Zdenek Kabelac <zkabelac@...hat.com>
To:	Hugh Dickins <hughd@...gle.com>
CC:	LVM general discussion and development <linux-lvm@...hat.com>,
	amwang@...hat.com, Alasdair G Kergon <agk@...hat.com>,
	linux-kernel@...r.kernel.org, Lukas Czerner <lczerner@...hat.com>
Subject: Re: Regression with FALLOC_FL_PUNCH_HOLE in 3.5-rc kernel

On 1.7.2012 01:10, Hugh Dickins wrote:
> On Sat, 30 Jun 2012, Zdenek Kabelac wrote:
>> On 30.6.2012 21:55, Hugh Dickins wrote:
>>> On Sat, 30 Jun 2012, Zdenek Kabelac wrote:
>>>>
>>>> When I used 3.5-rc kernels I noticed kernel deadlocks.
>>>> Oops log included. After some experimenting, a reliable way to hit this
>>>> oops is to run the lvm test suite for 10 minutes. Since the 3.5 merge
>>>> window did not include anything related to this oops, I went for a
>>>> bisect.
>>>
>>> Thanks a lot for reporting, and going to such effort to find
>>> a reproducible testcase that you could bisect on.
>>>
>>>>
>>>> Game result is commit: 3f31d07571eeea18a7d34db9af21d2285b807a17
>>>>
>>>> mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
>>>
>>> But this leaves me very puzzled.
>>>
>>> Is the "lvm test suite" what I find at git.fedorahosted.org/git/lvm2.git
>>> under tests/ ?
>>
>> Yes - that's it -
>>
>>   make
>> as root:
>>   cd test
>>   make check_local
>>
>> (running it inside the test subdirectory should be enough; if not, just report any problem)
>>
>>> If you have something else running at the same time, which happens to use
>>> madvise(,,MADV_REMOVE) on a filesystem which the commit above now enables
>>> it on (I guess ext4 from the =y in your config), then I suppose we should
>>> start searching for improper memory freeing or scribbling in its holepunch
>>> support: something that might be corrupting the dm_region in your oops.
>>
>> What the test does is create a file in LVM_TEST_DIR (default is /tmp)
>> and use a loop device to simulate a disk (small size - it should fit below
>> 200MB).
>>
>> On top of this file a second layer of virtual DM devices is created,
>> simulating various numbers of PV devices to play with.
>
> This sounds much easier to set up than I was expecting:
> thanks for the info, I'll try it later on today.
>
>>
>> So since everything now supports TRIM, such operations should be passed
>> down to the backing file, which probably triggers the path.
>
> What filesystem do you have for /tmp?
>
> If tmpfs, then it will make much more sense if we assume your bisection
> endpoint was off by one.  Your bisection log was not quite complete;
> and even if it did appear to converge on the commit you cite, you might
> have got (un)lucky when testing the commit before it, and concluded
> "good" when more attempts would have said "bad".
>
> The commit before, 83e4fa9c16e4af7122e31be3eca5d57881d236fe
> "tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE", would be a
> much more likely first bad commit if your /tmp is on tmpfs:
> that does indeed wire up loop to pass TRIM down to tmpfs by
> fallocate - that indeed played a part in my own testing.
>
> Whereas if your /tmp is on ext4, loop has been passing TRIM down
> with fallocate since v3.0.  And whichever, madvise(,,MADV_REMOVE)
> should be completely irrelevant.
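
To make the two paths concrete, here is a minimal userspace sketch of what is
being compared above: the fallocate(FALLOC_FL_PUNCH_HOLE) call that loop
issues on its backing file when a discard comes down, and the
madvise(MADV_REMOVE) call that commit 3f31d07 routes to the same hole-punch
code. The file name and sizes are only illustrative; this is not code from the
lvm test suite.

/* punch-demo.c - illustrative only; file name and sizes are made up */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#ifndef FALLOC_FL_PUNCH_HOLE
#include <linux/falloc.h>	/* older glibc does not expose the flags */
#endif

int main(void)
{
	const char *path = "/tmp/punch-demo";
	size_t len = 1 << 20;			/* 1MB backing file */
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0 || ftruncate(fd, len) < 0) {
		perror("setup");
		return 1;
	}

	/* Path 1: what loop does with a discard request - punch a hole
	 * in the backing file (ext4 since v3.0, tmpfs since 83e4fa9c). */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      0, 4096) < 0)
		perror("fallocate(PUNCH_HOLE)");

	/* Path 2: what commit 3f31d07 wires up - MADV_REMOVE on a shared
	 * file mapping now ends up in the same PUNCH_HOLE code. */
	void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			 fd, 0);
	if (map != MAP_FAILED) {
		if (madvise(map, 4096, MADV_REMOVE) < 0)
			perror("madvise(MADV_REMOVE)");
		munmap(map, len);
	}

	close(fd);
	unlink(path);
	return 0;
}

Compiled with plain gcc and run against a file on /tmp (tmpfs) versus a file
on an ext4 mount, this exercises exactly the two backing-store configurations
discussed above.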

While I've been aware that tmpfs gained TRIM support, I had not tried to run
on a real ext4 filesystem, since for my tests I've been using tmpfs for quite
some time to save my SSD from rewrites :)

So now I've checked with real ext4 - and the bug is there as well,
so I went back: it crashes on 3.4, 3.3 and 3.2 too.

3.1 is the first kernel which does survive (checked with 5 repeated runs).

And you are correct: the first commit which causes the crash really is
83e4fa9c16e4af when I use tmpfs as backing storage. The reason I failed to
identify this commit properly in my bisect is that the crash usually happens
on the second pass of the lvm test suite 'make check_local' execution - and I
had been running the test just once. To be sure, I've now run 5 passes on
3.4.0-08568-gec9516f, which is OK, while 3.4.0-08569-g83e4fa9 usually crashes
on the second run; with commit 3f31d07571e the crash always happens on the
first pass.

I've also checked a Rawhide kernel, vmlinuz-3.5.0-0.rc2.git0.1.fc18.x86_64,
and it's crashing as well - so it's probably not something unique to my config.

So is there any primary suspect in 3.2 which is worth checking - or do I need
another day to play another bisect game?

>
>>
>>> I'll be surprised if that is the case, but it's something that you can
>>> easily check by inserting a WARN_ON(1) in mm/madvise.c madvise_remove():
>>> that should tell us what process is using it.
>>
>> I could try that if that will help.
>
> That would help, if you're very sure of your bisection endpoint;
> but if your /tmp is on tmpfs, then I do think it's more likely
> that you've actually found a bug in the commit before.
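
For completeness, that debug hack could look roughly like this - the function
context below is reconstructed from memory of 3.5-rc and may not match
mm/madvise.c exactly; the WARN_ON(1) is the only addition:

static long madvise_remove(struct vm_area_struct *vma,
			   struct vm_area_struct **prev,
			   unsigned long start, unsigned long end)
{
	/* debug only: dump a stack trace for every process that reaches
	 * MADV_REMOVE, to see whether the lvm tests ever hit this path */
	WARN_ON(1);

	/* ... existing body unchanged ... */
}

Any process reaching it would then show up in dmesg with a full backtrace.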

The only thing which could be tricky is udev support
(by default it's not enabled; configure with --enable-udev_sync).
However, Debian-based distros distribute their own rules, which are not
100% compatible with upstream and create some unpredictable issues,
where slowness is the least of the problems.

If you have Fedora Rawhide with the latest lvm2 installed, you should get a
pretty well configured system for running the test dir (unfortunately there is
no way to virtualize udev...).

Zdenek
