lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Mon, 05 May 2008 17:10:59 -0700 From: Badari Pulavarty <pbadari@...ibm.com> To: Jan Kara <jack@...e.cz> Cc: Mingming Cao <cmm@...ibm.com>, akpm@...ux-foundation.org, linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org Subject: Re: [PATCH] jbd_commit_transaction() races with journal_try_to_drop_buffers() causing DIO failures On Mon, 2008-05-05 at 19:06 +0200, Jan Kara wrote: > On Thu 01-05-08 08:16:21, Badari Pulavarty wrote: > > Hi Andrew & Jan, > > > > I was able to reproduce the customer problem involving DIO > > (invalidate_inode_pages2) problem by writing simple testcase > > to keep writing to a file using buffered writes and DIO writes > > forever in a loop. I see DIO writes fail with -EIO. > > > > After a long debug, found 2 cases how this could happen. > > These are race conditions with journal_try_to_free_buffers() > > and journal_commit_transaction(). > > > > 1) journal_submit_data_buffers() tries to get bh_state lock. If > > try lock fails, it drops the j_list_lock and sleeps for > > bh_state lock, while holding a reference on the buffer. > > In the meanwhile, journal_try_to_free_buffers() can clean up the > > journal head could call try_to_free_buffers(). try_to_free_buffers() > > would fail due to the reference held by journal_submit_data_buffers() > > - which in turn causes failues for DIO (invalidate_inode_pages2()). > > > > 2) When the buffer is on t_locked_list waiting for IO to finish, > > we hold a reference and give up the cpu, if we can't get > > bh_state lock. This causes try_to_free_buffers() to fail. > > > > Fix is to drop the reference on the buffer if we can't get > > bh_state lock, give up the cpu and re-try the whole operation - > > instead of waiting for the vh_state lock. > > > > Does this look like a resonable fix ? > As Mingming pointed out there are few other places where we could hold > the bh reference. Note also that we accumulate references to buffers in the > wbuf[] list and we need that for submit_bh() which consumes one bh > reference. Generally, it seems to me as a too fragile and impractical > rule "nobody can hold bh reference when not holding page lock" which is > basically what it comes down to if you really want to be sure that > journal_try_to_free_buffers() succeeds. And also note that in principle > there are other places which hold references to buffers without holding the > page lock - for example writepage() in ordered mode (although this one is > in practice hardly triggerable). So how we could fix at least the races > with commit code is to implement launder_page() callback for ext3/4 which > would wait for the previous transaction commit in case the page has buffers > that are part of that commit (I don't want this logic in > journal_try_to_free_buffers() as that is called also on memory-reclaim > path, but journal_launder_page() is fine with me). This would be correct > but could considerably slow down O_DIRECT writes in cases they're mixed > with buffered writes so I'm not sure if this is acceptable. Yes. I have been discussing all of these with Mingming. I agree with you that it looks silly to update all the places to not hold a ref on the buffer while waiting for spinlocks. Adding a launder_page() seems to be the right approach. I wouldn't worry about making O_DIRECT slower. Currently, its failing with -EIO in this case anyway. Unfortunately, this happens at a customer site and I can't effort to give them a complete re-write type patch - I need to get this into older distro release :( Thats why I am trying to fix the cases. I am finding more and more of these .. Thanks, Badari -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists