Message-ID: <20130617122518.GA24403@thunk.org>
Date: Mon, 17 Jun 2013 08:25:18 -0400
From: Theodore Ts'o <tytso@....edu>
To: Lukáš Czerner <lczerner@...hat.com>
Cc: linux-ext4@...r.kernel.org
Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole

On Mon, Jun 17, 2013 at 11:08:32AM +0200, Lukáš Czerner wrote:
> > Correction... reverting patches #15 through #19 (which is what I did in
> > the dev-with-revert branch found on ext4.git) causes the problem to go
> > away in the nojournal case, but it causes a huge number of other
> > problems.  Some of the reverts weren't clean, so it's possible I
> > screwed up one of the reverts.  It's also possible that only applying
> > part of this series leaves the tree in an unstable state.
> >
> > I'd much rather figure out how to fix the problem on the dev branch,
> > so thank you for looking into this!
>
> Wow, this looks bad.  Theoretically reverting patches #15 through
> #19 should not have any real impact.  So far I do not see what is
> causing that, but I am looking into this.

I've been looking into this more intensively over the weekend.  I'm
now beginning to think we have a pre-existing race, and the changes in
question have simply changed the timing.

I tried a version of the dev branch (you can find it as the branch
dev2 in my kernel.org ext4.git tree) which only had patches 1 through
10 of the invalidate page range patches (dropping patches 11 through
20), and I found that generic/300 was failing in the ext3
configuration (a file system with nodelalloc, no flex_bg, and no
extents).  I also found the same failure with 3.10-rc2.

Your changes seem to make generic/300 fail consistently for me in the
nojournal configuration, but looking at the patches in question, I
don't think they could have directly caused the problem.  Instead, I
think they just changed the timing and unmasked it.

Given that I've seen generic/300 failures in various baselines going
all the way back to 3.9-rc4, this isn't a recent regression.  And
given that it does seem to be timing sensitive, bisecting it is going
to be difficult.  On the other hand, on the dev (or master) branch,
generic/300 fails with a greater than 70% probability using kvm with
2 cpus, 2 megs of RAM, and 5400 rpm laptop drives in nojournal mode,
so the fact that it reproduces relatively reliably will hopefully
make it easier to find the problem.

> I see that there are problems in other modes, not just nojournal.  Are
> those caused by this as well, or are you seeing those even without
> the patchset ?

I think the other problems in my dev-with-revert branch were caused by
some screw-up on my part when I did the reverts using git.  I found
that dropping the patches from a copy of the guilt patch stack, and
then applying all of the patches except for the last half of the
invalidate page range patch series, resulted in a clean branch that
didn't have any of these failures.  That's what I should have done
late last week, instead of trying to use "git revert".

Cheers,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
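
[Editor's note: the message describes the failing configurations only as
"nojournal" and "ext3" (nodelalloc, no flex_bg, no extents).  The sketch
below shows one way to drive xfstests by hand against those two setups;
the device names, mount points, and loop count are assumptions, not taken
from the message.]

    #!/bin/sh
    # Sketch: reproduce generic/300 under the two configurations named
    # above.  Device/mount-point names are illustrative only.
    export FSTYP=ext4
    export TEST_DEV=/dev/vdb    TEST_DIR=/mnt/test
    export SCRATCH_DEV=/dev/vdc SCRATCH_MNT=/mnt/scratch

    # "nojournal" configuration: ext4 without a journal.
    export MKFS_OPTIONS="-O ^has_journal"
    # For the "ext3" configuration one would instead use roughly:
    #   export MKFS_OPTIONS="-O ^extents,^flex_bg"
    #   export MOUNT_OPTIONS="-o nodelalloc"

    mkfs.ext4 -F $MKFS_OPTIONS "$TEST_DEV"

    # The failure is timing sensitive and hits only part of the time,
    # so loop the test rather than running it once.
    cd xfstests
    for i in $(seq 1 20); do
        ./check generic/300 || break
    done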
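[Editor's note: the patch-stack rebuild Ted describes, as opposed to
"git revert", might look roughly like the following if the branch is
managed with guilt.  The series-file path, branch name, and the pattern
used to drop the tail of the invalidate-page-range series are
illustrative assumptions.]

    # Sketch: rebuild the branch without the last half of the series
    # instead of reverting the commits.
    guilt pop -a        # unwind the whole applied patch stack

    # Remove the unwanted entries from the per-branch series file
    # (guilt keeps it under .git/patches/<branch>/series), e.g.:
    #   $EDITOR .git/patches/dev/series

    guilt push -a       # re-apply everything that remains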