linux-kernel - Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ

Message-ID: <4C2A9054.20500@oracle.com>
Date:	Wed, 30 Jun 2010 08:31:16 +0800
From:	Tao Ma <tao.ma@...cle.com>
To:	Jeff Moyer <jmoyer@...hat.com>
CC:	axboe@...nel.dk, vgoyal@...hat.com, linux-kernel@...r.kernel.org,
	linux-ext4@...r.kernel.org, Joel Becker <joel.becker@...cle.com>,
	Sunil Mushran <sunil.mushran@...cle.com>,
	"ocfs2-devel@....oracle.com" <ocfs2-devel@....oracle.com>
Subject: Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using
 CFQ

Hi Jeff,

On 06/29/2010 10:56 PM, Jeff Moyer wrote:
> Tao Ma <tao.ma@...cle.com> writes:
>
>> Hi Jeff,
>>
>> On 06/27/2010 09:48 PM, Jeff Moyer wrote:
>>> Tao Ma <tao.ma@...cle.com> writes:
>>>> I am sorry to say that the patch makes jbd2 lock up when I tested
>>>> fs_mark using ocfs2.
>>>> I have attached the log from my netconsole server. After I reverted
>>>> the patch [3/3], the box works again.
>>>
>>> I can't reproduce this, unfortunately.  Also, when building with the
>>> .config you sent me, the disassembly doesn't line up with the stack
>>> trace you posted.
>>>
>>> I'm not sure why yielding the queue would cause a deadlock.  The only
>>> explanation I can come up with is that I/O is not being issued.  I'm
>>> assuming that no other I/O will be completed to the file system in
>>> question.  Is that right?  Could you send along the output from sysrq-t?
>> Yes, I just mounted it and began the test, so there should be no
>> outstanding I/O. Do you need me to set up another disk for testing?
>> I have attached the sysrq output in sysrq.log. Please check.
>
> Well, if it doesn't take long to reproduce, then it might be helpful to
> see a blktrace of the run.  However, it might also just be worth waiting
> for the next version of the patch to see if that fixes your issue.
>
>> By the way, I also hit a NULL pointer dereference in cfq_yield. I have
>> attached null.log as well. This seems to be related to the previous
>> deadlock and happens when I try to remount the same volume after
>> a reboot and ocfs2 tries to do some recovery.
>
>   Pid: 4130, comm: ocfs2_wq Not tainted 2.6.35-rc3+ #5 0MM599/OptiPlex 745
>   RIP: 0010:[<ffffffff82161537>]
>    [<ffffffff82161537>] cfq_yield+0x5f/0x135
>   RSP: 0018:ffff880123061c60  EFLAGS: 00010246
>   RAX: 0000000000000000 RBX: ffff88012c2b5ea8 RCX: ffff88012c3a30d0
>
> ffffffff82161528:	e8 69 eb ff ff       	callq  ffffffff82160096<cfq_cic_lookup>
> ffffffff8216152d:	49 89 c6             	mov    %rax,%r14
> ffffffff82161530:	48 8b 85 00 06 00 00 	mov    0x600(%rbp),%rax
> ffffffff82161537:	f0 48 ff 00          	lock incq (%rax)
>
> I'm pretty sure that's a NULL pointer deref of the tsk->io_context that
> was passed into the yield function.  I've since fixed that, so your
> recovery code should be safe in the newest version (which I've not yet
> posted).
OK, so could you please CC me when the new patches are out? It would be
easier for me to track them. Thanks.
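
For anyone following the oops above, here is a minimal userspace sketch of
the kind of guard that avoids it. The disassembly loads a pointer from the
task and then does a locked increment through it while RAX is 0, which
matches taking a reference on a NULL io_context. Everything below is
hypothetical: the struct and function names are stand-ins invented for
illustration, not the kernel definitions and not the actual fix in the
unposted patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's io_context; only a refcount. */
struct io_context_sketch {
	atomic_long refcount;
};

/* Hypothetical stand-in for task_struct; io_context may be NULL. */
struct task_sketch {
	struct io_context_sketch *io_context;
};

/*
 * Take a reference on the task's io_context, or return NULL if the task
 * has none. The NULL check comes before the atomic increment, which is
 * the step the faulting "lock incq (%rax)" skipped.
 */
struct io_context_sketch *get_io_context_checked(struct task_sketch *tsk)
{
	struct io_context_sketch *ioc = tsk->io_context;

	if (ioc == NULL)
		return NULL;
	atomic_fetch_add(&ioc->refcount, 1);
	return ioc;
}

/*
 * Sketch of a yield entry point: bail out early when the target task has
 * no io_context instead of dereferencing it. Returns true if a yield
 * would have been attempted.
 */
bool yield_sketch(struct task_sketch *tsk)
{
	struct io_context_sketch *ioc = get_io_context_checked(tsk);

	if (ioc == NULL)
		return false;	/* nothing to yield to */

	/* ... the real code would look up the queue and yield here ... */

	atomic_fetch_sub(&ioc->refcount, 1);	/* drop our reference */
	return true;
}
```

With a task whose io_context is NULL, yield_sketch() simply returns false
and never touches the pointer. With a valid io_context, the refcount is the
same before and after the call.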

Regards,
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/