linux-kernel - Re: Bug with "fix partial page writes" [3.2-rc regression]

Message-ID: <CAGBYx2bmBAzHyG010Cfgh=4ttdSYZ-z78+LfDhVHAv=unS43SA@mail.gmail.com>
Date:	Tue, 6 Dec 2011 15:57:07 +0800
From:	Yongqiang Yang <xiaoqiangnk@...il.com>
To:	Allison Henderson <achender@...ux.vnet.ibm.com>
Cc:	Tao Ma <tm@....ma>, Hugh Dickins <hughd@...gle.com>,
	"Ted Ts'o" <tytso@....edu>, Curt Wohlgemuth <curtw@...gle.com>,
	Surbhi Palande <csurbhi@...il.com>,
	Rafael Wysocki <rjw@...k.pl>, linux-ext4@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: Bug with "fix partial page writes" [3.2-rc regression]

On Tue, Dec 6, 2011 at 12:05 PM, Allison Henderson
<achender@...ux.vnet.ibm.com> wrote:
> On 12/05/2011 08:44 PM, Yongqiang Yang wrote:
>>
>> On Tue, Dec 6, 2011 at 11:33 AM, Tao Ma<tm@....ma>  wrote:
>>>
>>> On 12/06/2011 11:08 AM, Yongqiang Yang wrote:
>>>>
>>>> Hi Allison,
>>>>
>>>> I noticed another problem which has nothing to do with hole punching.
>>>> __block_write_begin does not zero buffers beyond EOF. (I guess you
>>>
>>> yes, that is expected.
>>>>
>>>> tried to zero them in your code, am I right?)  When users mapread
>>>> beyond EOF, they get non-zero data.  I am not sure whether the data
>>>> should be zero or not, but fsx thinks it should be zero and reports an
>>>> error.
>>>
>>> Why can users read data past EOF? I am also puzzled. Would punching
>>> a hole do this? I don't think it's right.
>>
>> According to the code, filemap_fault handles the case correctly.  But I hit
>> the error 'non-zero data beyond EOF' reported by fsx.  It is
>> strange.  It seems that the uptodate status is set wrongly.  Just a guess :-)
>>
>> I am guessing Allison met the problem before and tried to fix it in
>> write path by zeroing buffers beyond EOF.
>
>
> Yes I did run into something similar.  I found 2 cases that involved EOF:
> 1. A truncate shortens EOF, but zeroes only to the end of the block, not
> to the end of the page.  This was corrected by "[PATCH 5/6 v7] ext4: fix fsx
> truncate failure"
>
> 2. A write extends EOF, but does not zero all of the page beyond EOF, and
> that was what "[PATCH 6/6 v7] ext4: fix partial page writes" was supposed to
> address.
I ran into the 2nd case.  This case should be handled by readpage:
write_end should not set the page uptodate, so both mapread and read
should work, because filemap_fault calls readpage on a non-uptodate
page in the mapread case.

It seems that write_end sets the page uptodate, and as a result
applications see garbage data.  But I cannot find why this happens.
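
For anyone who wants to check the symptom from userspace without a full
fsx run, here is a minimal sketch of the kind of check that the 'non-zero
data beyond EOF' report corresponds to.  It is not fsx's code: the helper
name nonzero_bytes_past_eof() and its return convention are made up for
illustration.  It maps the page that contains EOF and counts non-zero
bytes between EOF and the end of that page, which should all be zero.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from fsx): count the non-zero bytes
 * between EOF and the end of the page that contains EOF, as seen through
 * a shared read-only mapping.
 *
 * Returns 0 if the tail is clean, or if the file is empty or ends exactly
 * on a page boundary; returns -1 on error.
 */
long nonzero_bytes_past_eof(const char *path)
{
    struct stat st;
    long page, i, bad = 0;
    off_t in_page, page_start;
    unsigned char *p;
    int fd;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    /* Number of bytes of real file data in the last page. */
    in_page = st.st_size % page;
    if (st.st_size == 0 || in_page == 0) {
        close(fd);
        return 0;               /* no partial page to inspect */
    }

    /* Map only the page that contains EOF. */
    page_start = st.st_size - in_page;
    p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, page_start);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* Everything from EOF to the end of the page should read as zero. */
    for (i = (long)in_page; i < page; i++)
        if (p[i] != 0)
            bad++;

    munmap(p, (size_t)page);
    return bad;
}
```

Calling this after a write that extends EOF, or after a truncate that
shortens it, and getting a non-zero count would match what fsx complains
about.  A clean result proves little by itself, since the failures
discussed in this thread only showed up with blocksize < pagesize and
under memory pressure.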


Yongqiang.
>
> I am still digging through tracing output at the moment, so I don't have a
> very good explanation right now, but I will keep folks posted if I find
> something.
>
> Allison Henderson
>
>
>
>>
>> Yongqiang.
>>>
>>>
>>> Thanks
>>> Tao
>>>>
>>>>
>>>> If I understand the problem right, it happens more often with punch
>>>> hole.
>>>>
>>>> Yongqiang.
>>>> On Tue, Dec 6, 2011 at 9:40 AM, Allison Henderson
>>>> <achender@...ux.vnet.ibm.com>  wrote:
>>>>>
>>>>> On 12/05/2011 04:38 PM, Hugh Dickins wrote:
>>>>>>
>>>>>>
>>>>>> On Mon, 21 Nov 2011, Hugh Dickins wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 21 Nov 2011, Ted Ts'o wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Nov 20, 2011 at 12:59:10PM -0800, Hugh Dickins wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 8 Nov 2011, Curt Wohlgemuth wrote:
>>>>>>>>> It appears that there's a bug with this patch:
>>>>>>
>>>>>>
>>>>>>
>>>>>> This has been outstanding for a month now, and we've heard no
>>>>>> progress:
>>>>>> please revert commit 02fac1297eb3 "ext4: fix partial page writes" for
>>>>>> rc5.
>>>>>>
>>>>>> The problems appear on a 1k-blocksize filesystem under memory
>>>>>> pressure:
>>>>>> the hunk in ext4_da_write_end() causes oops, because it's playing with
>>>>>> a page after generic_write_end() dropped our last reference to it; and
>>>>>> backing out the hunk in ext4_da_write_begin() is then found to stop
>>>>>> rare data corruption seen when kbuilding.
>>>>>>
>>>>>> Although I earlier reported that backing out the patch caused an fsx
>>>>>> test to fail earlier, I've since found great variation in how soon it
>>>>>> fails, and seen it fail just as quickly with 02fac1297eb3 still in.
>>>>>> I also reported that I had to go back to 2.6.38 for fsx not to fail
>>>>>> under memory pressure: you won't be surprised that that turned out to
>>>>>> be because 2.6.38 defaults nomblk_io_submit but 2.6.39 mblk_io_submit.
>>>>>>
>>>>>> Thanks,
>>>>>> Hugh
>>>>>>
>>>>>
>>>>>
>>>>> Hi there,
>>>>>
>>>>> Have you tried Yongqiang's patch "[PATCH 1/2] ext4: let mpage_submit_io
>>>>> works well when blocksize < pagesize"?  I have tried it and it does
>>>>> seem to help, but I am still running into some failures that I am
>>>>> trying to debug.  Please let us know if it helps with the issues that
>>>>> you are seeing.  Thx!
>>>>>
>>>>> Allison Henderson
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>



-- 
Best Wishes
Yongqiang Yang