Message-ID: <alpine.LSU.2.00.1112080909300.22300@sister.anvils>
Date:	Thu, 8 Dec 2011 09:39:55 -0800 (PST)
From:	Hugh Dickins <hughd@...gle.com>
To:	Allison Henderson <achender@...ux.vnet.ibm.com>
cc:	Yongqiang Yang <xiaoqiangnk@...il.com>, Ted Ts'o <tytso@....edu>,
	Curt Wohlgemuth <curtw@...gle.com>,
	Surbhi Palande <csurbhi@...il.com>,
	Rafael Wysocki <rjw@...k.pl>, linux-ext4@...r.kernel.org,
	linux-kernel@...r.kernel.org, Andy Whitcroft <apw@...onical.com>
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Wed, 7 Dec 2011, Allison Henderson wrote:
> On 12/07/2011 10:04 AM, Allison Henderson wrote:
> > On 12/07/2011 01:28 AM, Yongqiang Yang wrote:
> > > Hi Allison and Hugh,
> > > 
> > > I think I found the problem and it has nothing to do with punching
> > > hole. The patch [ext4: let ext4_bio_write_page handle EOF correctly]
> > > would fix up the problem.
> > > 
> > > I post the patch so that it can be tested as early as possible. The
> > > problem has not appeared on my machine since the patch is applied.
> > > 
> > > Yongqiang.
> > 
> > Great! I will try it out with your other set in my sandbox and let you
> > know what happens. Thx!
> > 
> > Allison Henderson
> 
> Well, it's been running several hours now with out problems, so I think it
> will be ok, but I will let it run the full day.
> 
> Andy, I know you were also seeing issues in this area.  Could you try
> Yongqiang patches?  The code you were modifying needed to be removed, so I
> think they will resolve the issues you were seeing too.  Please try the
> following patch sets:
> 
> [PATCH 1/2] ext4: let mpage_submit_io works well when blocksize < pagesize
> [PATCH 2/2] ext4: let ext4_discard_partial_buffers handle pages without
> buffers correctly
> 
> and
> 
> [PATCH 1/2] ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized
> [PATCH 2/2] ext4: let ext4_bio_write_page handle EOF correctly

Those patches are working well for me, many thanks to Yongqiang.

The last patch (or perhaps more of them?) fixes behaviour going back
several releases, and ought to be sent to -stable after verification.

I ran fsx (args as before on 1024k block ext2fs under memory pressure)
for 8 hours on three machines, and no problem showed up on any.
I didn't have time to try ext4, but I expect that you did.

And I've run kernel builds under memory pressure for 7.5 hours, and
no problem has shown up there either.  That's not long enough yet to
validate the oops fix by itself, but we've earlier run long enough
with the first 2/2 to be sure that it fixes both the oops and the
"corruption" that I saw.

Quotes around "corruption" now because, from Yongqiang's description,
I'm guessing that ld was mmap'ing objfiles and acting on "data"
from beyond eof.  ld does have the right to do that: the part of the
last page beyond eof should indeed be zeroed.
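
To make the beyond-eof expectation concrete, here is a minimal sketch in C.
It is not taken from the patches or from fsx; the helper name
nonzero_bytes_past_eof and the /tmp path template are assumptions made for
illustration only.  It writes a short file, maps its single page MAP_SHARED,
and counts any non-zero bytes between eof and the end of the page.  POSIX
says that tail must read as zeroes, so a correct kernel should give 0.

```c
#define _XOPEN_SOURCE 700   /* for mkstemp() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): create a temporary file
 * holding file_len bytes of 'A' (file_len must be smaller than one page),
 * map one full page of it, and count the non-zero bytes found between
 * eof and the end of the page.
 * Returns that count (0 is the correct result), or -1 on setup failure.
 */
long nonzero_bytes_past_eof(size_t file_len)
{
    char path[] = "/tmp/eof-tail-XXXXXX";   /* assumed scratch location */
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    if (file_len == 0 || file_len >= (size_t)page)
        return -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */

    char *buf = malloc(file_len);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    memset(buf, 'A', file_len);
    ssize_t written = write(fd, buf, file_len);
    free(buf);
    if (written != (ssize_t)file_len) {
        close(fd);
        return -1;
    }

    /* Map the whole page, although the file ends part-way through it. */
    unsigned char *map = mmap(NULL, (size_t)page, PROT_READ,
                              MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    long bad = 0;
    for (size_t i = file_len; i < (size_t)page; i++)
        if (map[i] != 0)
            bad++;              /* stale "data" from beyond eof */

    munmap(map, (size_t)page);
    return bad;
}
```

Calling nonzero_bytes_past_eof(100) on the filesystem under test and
checking for a non-zero return is one way to look for stale data in the
tail of the last page.  A freshly written file like this one is the easy
case, though, and is unlikely by itself to reproduce a problem that only
shows under memory pressure and writeback.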

Only once, before the fixes, did I ever see an unexplained EINVAL
(from cp), like Andy reports: I'm very hopeful his case is fixed too.

Thanks!
Hugh