Message-Id: <4FF990C3-F998-4003-83D4-91CAE76FCDBE@oracle.com>
Date:	Wed, 21 Feb 2007 10:24:56 -0800
From:	Zach Brown <zach.brown@...cle.com>
To:	Ken Chen <kenchen@...gle.com>
Cc:	"Ananiev, Leonid I" <leonid.i.ananiev@...el.com>,
	Chris Mason <chris.mason@...cle.com>,
	linux-aio <linux-aio@...ck.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Benjamin LaHaise <bcrl@...ck.org>,
	Suparna bhattacharya <suparna@...ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Badari Pulavarty <pbadari@...ibm.com>
Subject: Re: [PATCH 2/2] aio: propogate post-EIOCBQUEUED errors to completion event
On Feb 21, 2007, at 12:35 AM, Ken Chen wrote:
> On 2/20/07, Ananiev, Leonid <leonid.i.ananiev@...el.com> wrote:
>> 1) mem=1G in kernel boot param if you have more
>> 2) unmount; mk2fs; mount
>> 3) dd if=/dev/zero of=<test_file> bs=1M count=1200
>> 4) aiostress -s 1200m -O -o 2 -i 1 -r 16k <test_file>
>> 5) if i++<50 goto 2).
>
> Would you please instrument the call chain of
> invalidate_complete_page2() and tell us exactly where it returns zero
> value in your failure case?
>
>   invalidate_complete_page2
>      try_to_release_page
>         ext3_releasepage
>            journal_try_to_free_buffers
>               ???
For what it's worth, Badari has explained this race in the past in a  
credible way.  I'll take the liberty of pasting a mail from him:
"
kjournald submitted buffers for IO and is waiting for them to finish.
Note that it has a ref. against the buffer.
journal_commit_transaction()
         ...
         submitted buffers for IO
         /* Waiting for IO to complete */
         while (commit_transaction->t_locked_list) {
                 ...
                 get_bh(bh);
                 if (buffer_locked(bh)) {
                         spin_unlock(&journal->j_list_lock);
                         wait_on_buffer(bh);  <<<<<<
                         spin_lock(&journal->j_list_lock);
                 }
                 ..
                 put_bh(bh);
         }
Now a DIO process comes along and frees the jh through
journal_try_to_free_buffers(), but fails in drop_buffers() since
kjournald has a reference against the buffer.
invalidate_inode_pages2_range()
         ..
         ext3_releasepage()
                 journal_try_to_free_buffers()
                         journal_put_journal_head()
                                 __journal_try_to_free_buffer()
                                         <--- freed jh
                         try_to_free_buffers()
                                 drop_buffers()
                                         if (buffer_busy(bh))
                                                 goto failed;
                                           <<--- returns EIO due to b_count
"
I don't mean to say that we shouldn't get traces to confirm the  
theory, just sharing.  And now we can point to this in the archives  
next time :).
- z
