Message-ID: <x49wrthlwfk.fsf@segfault.boston.devel.redhat.com>
Date: Tue, 29 Jun 2010 10:56:15 -0400
From: Jeff Moyer <jmoyer@...hat.com>
To: Tao Ma <tao.ma@...cle.com>
Cc: axboe@...nel.dk, vgoyal@...hat.com, linux-kernel@...r.kernel.org,
linux-ext4@...r.kernel.org, Joel Becker <joel.becker@...cle.com>,
Sunil Mushran <sunil.mushran@...cle.com>,
"ocfs2-devel\@oss.oracle.com" <ocfs2-devel@....oracle.com>
Subject: Re: [PATCH 0/3 v5][RFC] ext3/4: enhance fsync performance when using CFQ
Tao Ma <tao.ma@...cle.com> writes:
> Hi Jeff,
>
> On 06/27/2010 09:48 PM, Jeff Moyer wrote:
>> Tao Ma <tao.ma@...cle.com> writes:
>>> I am sorry to say that the patch made jbd2 lock up when I tested
>>> fs_mark using ocfs2.
>>> I have attached the log from my netconsole server. After I reverted
>>> patch [3/3], the box works again.
>>
>> I can't reproduce this, unfortunately. Also, when building with the
>> .config you sent me, the disassembly doesn't line up with the stack
>> trace you posted.
>>
>> I'm not sure why yielding the queue would cause a deadlock. The only
>> explanation I can come up with is that I/O is not being issued. I'm
>> assuming that no other I/O will be completed to the file system in
>> question. Is that right? Could you send along the output from sysrq-t?
> Yes, I just mounted it and began the test, so there should be no
> outstanding I/O. Do you need me to set up another disk for testing?
> I have attached the sysrq output in sysrq.log. Please check.
Well, if it doesn't take long to reproduce, then it might be helpful to
see a blktrace of the run. However, it might also just be worth waiting
for the next version of the patch to see if that fixes your issue.
> BTW, I also hit a NULL pointer dereference in cfq_yield. I have
> attached null.log as well. This seems to be related to the previous
> deadlock and happens when I try to remount the same volume after a
> reboot and ocfs2 tries to do some recovery.
Pid: 4130, comm: ocfs2_wq Not tainted 2.6.35-rc3+ #5 0MM599/OptiPlex 745
RIP: 0010:[<ffffffff82161537>]  [<ffffffff82161537>] cfq_yield+0x5f/0x135
RSP: 0018:ffff880123061c60 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88012c2b5ea8 RCX: ffff88012c3a30d0
ffffffff82161528: e8 69 eb ff ff callq ffffffff82160096 <cfq_cic_lookup>
ffffffff8216152d: 49 89 c6 mov %rax,%r14
ffffffff82161530: 48 8b 85 00 06 00 00 mov 0x600(%rbp),%rax
ffffffff82161537: f0 48 ff 00 lock incq (%rax)
I'm pretty sure that's a NULL pointer dereference of the tsk->io_context
that was passed into the yield function: RAX is 0, and the faulting
instruction is the lock incq (%rax) at cfq_yield+0x5f. I've since fixed
that, so your recovery code should be safe in the newest version (which
I've not yet posted).
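
For illustration only, here is a minimal user-space sketch of the kind of
guard that avoids this class of fault: check the task's I/O context for
NULL before taking a reference on it. This is not the actual fix from the
unposted patch. Every name below (sketch_task, sketch_io_context,
sketch_yield and the get/put helpers) is hypothetical and only loosely
modelled on the kernel structures; the return values are an assumption
made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
struct sketch_io_context {
    atomic_long refcount;
};

struct sketch_task {
    struct sketch_io_context *io_context;   /* may be NULL */
};

/*
 * Take a reference only when the context exists. An unconditional
 * increment here would be the user-space analogue of the
 * "lock incq (%rax)" with RAX == 0 seen in the oops.
 */
static struct sketch_io_context *sketch_get_io_context(struct sketch_task *tsk)
{
    struct sketch_io_context *ioc = tsk->io_context;

    if (ioc == NULL)
        return NULL;
    atomic_fetch_add(&ioc->refcount, 1);
    return ioc;
}

/* Drop a reference taken by sketch_get_io_context(). */
static void sketch_put_io_context(struct sketch_io_context *ioc)
{
    long old = atomic_fetch_sub(&ioc->refcount, 1);

    assert(old > 0);    /* dropping a reference that was never taken is a bug */
    (void)old;
}

/*
 * Returns 0 if the yield path ran, -1 if it was skipped because the
 * target task has no I/O context (assumed convention for this sketch).
 */
static int sketch_yield(struct sketch_task *tsk)
{
    struct sketch_io_context *ioc = sketch_get_io_context(tsk);

    if (ioc == NULL)
        return -1;
    /* ... the real code would hand the queue over to the target here ... */
    sketch_put_io_context(ioc);
    return 0;
}
```

Calling sketch_yield() on a task whose io_context is NULL simply returns
-1 instead of faulting, and a task with a valid context has its reference
count restored on return.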
Cheers,
Jeff