Date:	Mon, 5 May 2008 19:06:36 +0200
From:	Jan Kara <jack@...e.cz>
To:	Badari Pulavarty <pbadari@...ibm.com>
Cc:	Mingming Cao <cmm@...ibm.com>, akpm@...ux-foundation.org,
	linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] jbd_commit_transaction() races with
	journal_try_to_drop_buffers() causing DIO failures

On Thu 01-05-08 08:16:21, Badari Pulavarty wrote:
> Hi Andrew & Jan,
> 
> I was able to reproduce the customer problem involving DIO
> (invalidate_inode_pages2) by writing a simple testcase
> that keeps writing to a file using buffered writes and DIO writes
> forever in a loop. I see DIO writes fail with -EIO.
> 
> After a long debugging session, I found 2 ways this could happen.
> These are race conditions between journal_try_to_free_buffers()
> and journal_commit_transaction().
> 
> 1) journal_submit_data_buffers() tries to get bh_state lock. If
> try lock fails, it drops the j_list_lock and sleeps for
> bh_state lock, while holding a reference on the buffer.
> In the meantime, journal_try_to_free_buffers() can clean up the
> journal head and call try_to_free_buffers(). try_to_free_buffers()
> would fail due to the reference held by journal_submit_data_buffers()
> - which in turn causes failures for DIO (invalidate_inode_pages2()).
> 
> 2) When the buffer is on t_locked_list waiting for IO to finish,
> we hold a reference and give up the cpu, if we can't get
> bh_state lock. This causes try_to_free_buffers() to fail.
> 
> Fix is to drop the reference on the buffer if we can't get
> bh_state lock, give up the cpu and re-try the whole operation -
> instead of waiting for the bh_state lock.
> 
> Does this look like a reasonable fix?
  As Mingming pointed out, there are a few other places where we could hold
the bh reference. Note also that we accumulate references to buffers in the
wbuf[] list, and we need them for submit_bh(), which consumes one bh
reference. Generally, the rule "nobody can hold a bh reference when not
holding the page lock" seems too fragile and impractical to me, and that is
basically what it comes down to if you really want to be sure that
journal_try_to_free_buffers() succeeds. Note also that in principle
there are other places which hold references to buffers without holding the
page lock, for example writepage() in ordered mode (although this one is
hardly triggerable in practice). One way to fix at least the races
with the commit code is to implement a launder_page() callback for ext3/4 which
would wait for the previous transaction commit in case the page has buffers
that are part of that commit (I don't want this logic in
journal_try_to_free_buffers(), as that is also called on the memory-reclaim
path, but journal_launder_page() is fine with me). This would be correct
but could considerably slow down O_DIRECT writes when they are mixed
with buffered writes, so I'm not sure this is acceptable.
  OTOH, with the ordered mode rewrite patch, the problem with the commit code
also goes away, since there we don't need extra references to data buffers
(we just use filemap_fdatawrite).
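
  For illustration only, the retry scheme from the patch description (take a
reference, try the state lock, and on failure drop the reference and yield
before retrying) can be sketched in user-space C roughly as below. This is
not the actual jbd code: struct buf, buf_init(), buf_get(), buf_put(),
buf_try_to_free(), buf_submit_once() and buf_submit() are hypothetical names,
a pthread mutex stands in for the bh_state lock, and buf_try_to_free() only
models the "fails while a reference is held" behaviour of
try_to_free_buffers().

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer head: a refcount plus a state lock. */
struct buf {
	atomic_int refcount;
	pthread_mutex_t state_lock;	/* models the bh_state lock */
	bool submitted;
};

static void buf_init(struct buf *b)
{
	atomic_init(&b->refcount, 0);
	pthread_mutex_init(&b->state_lock, NULL);
	b->submitted = false;
}

static void buf_get(struct buf *b)
{
	atomic_fetch_add(&b->refcount, 1);
}

static void buf_put(struct buf *b)
{
	int old = atomic_fetch_sub(&b->refcount, 1);

	assert(old > 0);	/* dropping a reference nobody holds is a bug */
}

/* Assumed model of try_to_free_buffers(): fails while anyone holds a ref. */
static bool buf_try_to_free(struct buf *b)
{
	return atomic_load(&b->refcount) == 0;
}

/*
 * One attempt: take a reference and try the state lock. If the lock is
 * busy, drop the reference before returning so that a concurrent free
 * attempt is not blocked while we wait.
 */
static bool buf_submit_once(struct buf *b)
{
	buf_get(b);
	if (pthread_mutex_trylock(&b->state_lock) != 0) {
		buf_put(b);	/* do not pin the buffer while waiting */
		return false;
	}
	b->submitted = true;	/* placeholder for the real work */
	pthread_mutex_unlock(&b->state_lock);
	buf_put(b);
	return true;
}

/* Retry the whole operation, giving up the cpu between attempts. */
static void buf_submit(struct buf *b)
{
	while (!buf_submit_once(b))
		sched_yield();
}
```

  With this shape a waiter never sleeps while pinning the buffer, so a
concurrent free attempt can only see the elevated count for the duration of a
single trylock attempt. It does nothing about the other reference holders
mentioned above (wbuf[], writepage()).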

> 1) journal_submit_data_buffers() tries to get bh_state lock. If
> try lock fails, it drops the j_list_lock and sleeps for
> bh_state lock, while holding a reference on the buffer head.
> In the meantime, journal_try_to_free_buffers() can clean up the
> journal head and call try_to_free_buffers(). try_to_free_buffers()
> would fail due to the reference held by journal_submit_data_buffers()
> - which in turn causes failures for DIO (invalidate_inode_pages2()).
> 
> 2) When the buffer is on t_locked_list waiting for IO to finish,
> we hold a reference and give up the cpu, if we can't get 
> bh_state lock. This causes try_to_free_buffers() to fail.
> 
> Fix is to drop the reference on the buffer, give up the cpu
> and re-try the whole operation.
> 
> Signed-off-by: Badari Pulavarty <pbadari@...ibm.com>
> Reviewed-by: Mingming Cao <mcao@...ibm.com>

							Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR