lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 11 Dec 2007 17:00:32 -0800
From:	Zach Brown <zach.brown@...cle.com>
To:	Hisashi Hifumi <hifumi.hisashi@....ntt.co.jp>
CC:	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org,
	Chris Mason <chris.mason@...cle.com>,
	Badari Pulavarty <pbadari@...ibm.com>
Subject: Re: [PATCH] dio: falling through to buffered I/O when invalidation
  of a page fails

Hisashi Hifumi wrote:
> Hi.
> 
> Current dio has some problems:
> 1, In ext3 ordered, dio write can return with EIO because of the race
> between invalidation of
> a page and jbd. jbd pins the bhs while committing journal so
> try_to_release_page fails when jbd
> is committing the transaction.

Yeah.  It sure would be fantastic if some ext3 expert could stop this
from happening somehow.  But that hasn't happened in.. uh.. Badari, for
how many years has this been on the radar? :)

> 
> Past discussion about this issue is as follows.
> http://marc.info/?t=119343431200004&r=1&w=2
> http://marc.info/?t=112656762800002&r=1&w=2
> 
> 2, invalidate_inode_pages2_range() sets ret=-EIO when
> invalidate_complete_page2()
> fails, but this ret is cleared if do_launder_page() succeed on a page of
> next index.

Oops.  That's too bad.  So maybe we should fix it by not stomping on
that return code?

	ret2 = do_launder()
	if (ret2 == 0)
		ret2 = invalidate()
	if (ret == 0)
		ret = ret2

I'd be surprised if we ever wanted to mask an -EIO when later pages
laundered successfully.

> In this case, dio is carried out even if invalidate_complete_page2()
> fails on some pages.
> This can cause inconsistency between memory and blocks on HDD because
> the page
> cache still exists.

Yeah.

> I solved problems above by introducing invalidate_inode_pages3_range()
> and falling
> through to buffered I/O when invalidation of a page failed.

Well, I like the idea of more intelligently dealing with the known
problem between dio and ext3.  I'm not sure that falling back to
buffered is right.

If dio can tell that it only failed because jbd held the buffer, should
we have waited for the transaction to complete before trying to
invalidate again?

If we could do that, we'd avoid performing the IO twice.

> We can distinguish between failure of page invalidation and other errors
> with the return value of invalidate_inode_pages3_range().

I'm not sure duplicating the invalidation loop into a new function is
the right thing.  Maybe we'd just tweak inode_pages2 to indicate to the
caller the specific failing circumstances somehow.  Maybe.

- z
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ