Message-ID: <20090812074611.GC28848@basil.fritz.box>
Date:	Wed, 12 Aug 2009 09:46:11 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>
Cc:	Andi Kleen <andi@...stfloor.org>, tytso@....edu, hch@...radead.org,
	mfasheh@...e.com, aia21@...tab.net, hugh.dickins@...cali.co.uk,
	swhiteho@...hat.com, akpm@...ux-foundation.org, npiggin@...e.de,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	fengguang.wu@...el.com,
	Satoshi OSHIMA <satoshi.oshima.fk@...achi.com>,
	Taketoshi Sakuraba <taketoshi.sakuraba.hc@...achi.com>
Subject: Re: [PATCH] [16/19] HWPOISON: Enable .remove_error_page for
	migration aware file systems

On Wed, Aug 12, 2009 at 11:49:56AM +0900, Hidehiro Kawai wrote:
> > I don't think there's much we can do if the application doesn't
> > check for IO errors properly. What would you do if it doesn't
> > check for IO errors at all? If it checks for IO errors it simply
> > has to check for them on all IO operations -- if it does,
> > it will detect hwpoison errors correctly too.
> 
> I believe it's not uncommon for applications to do buffered write
> and then exit without fsync().  And I think it's difficult to
> preclude such applications and commands from the system perfectly.

That's true, but for anything mission critical you would expect them
to use some transactional mechanism, either with O_SYNC or fsync().
Otherwise they always risk data loss anyway.
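
As a rough illustration of checking for errors on every IO operation,
here is a minimal userspace sketch. The helper name write_durably() is
made up for this example, and the assumption is that a deferred
write-back error (of which a poisoned dirty page would be one case) is
reported by fsync():

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * write_durably() is a hypothetical helper, not an existing API.
 * Returns 0 once open, write, fsync and close have all succeeded,
 * or -1 with errno set if any of them reported an error.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int saved_errno;
    int fd;

    assert(path != NULL && (buf != NULL || len == 0));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written */
            goto fail;
        }
        p += n;             /* short write: continue with the remainder */
        len -= (size_t)n;
    }

    /* Assumed to be where a deferred write-back error shows up. */
    if (fsync(fd) < 0)
        goto fail;

    /* close() can report an error too, so its result is checked. */
    return close(fd) < 0 ? -1 : 0;

fail:
    saved_errno = errno;    /* keep the original error across close() */
    close(fd);
    errno = saved_errno;
    return -1;
}
```

An application that exits after a buffered write() without the fsync()
step never reaches the point where such an error could be returned to it.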

> > It's unclear to me this special mode is really desirable.
> > Does it bring enough value to the user to justify the complexity
> > of another exotic option?  The case is relatively exotic,
> > as in dirty write cache that is mapped to a file.
> > 
> > Try to explain it in documentation and you see how ridiculous it sounds;
> > it simply doesn't have clean semantics
> > 
> > ("In case you have applications with broken error IO handling on
> > your mission critical system ...") 
> 
> Generally, dropping unwritten dirty page caches is considered to be
> risky.  So the "panic on IO error" policy has been common practice
> on some systems.  I just suggested that we adopt this policy for
> machine check errors.

Hmm, what we could possibly do -- as follow-on patches -- would be to
let error_remove_page check the per-file-system panic-on-io-error
superblock setting for dirty pages and panic in this case too.
Unfortunately this setting is currently per file system, not generic,
so it would need to be an fs-specific check (or the flag would need
to be moved into a generic fs superblock field first).
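
The decision described above can be modelled in plain userspace C. This
is only a sketch: the enum names and model_error_remove_page() are
hypothetical stand-ins, not kernel types or functions, and it assumes
the error policy has already been made available in a generic form:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a generic per-superblock error policy. */
enum fs_error_policy {
    FS_ERRORS_CONTINUE,
    FS_ERRORS_REMOUNT_RO,
    FS_ERRORS_PANIC
};

/* Hypothetical outcomes for a poisoned page cache page. */
enum poison_action {
    POISON_DROP_PAGE,            /* clean page: drop it, data is still on disk */
    POISON_DROP_REPORT_IO_ERROR, /* dirty page: drop it and report an IO error */
    POISON_PANIC                 /* dirty page and the fs asks to panic on error */
};

enum poison_action model_error_remove_page(bool page_dirty,
                                           enum fs_error_policy policy)
{
    /* Reject values outside the modelled policy set. */
    assert(policy == FS_ERRORS_CONTINUE ||
           policy == FS_ERRORS_REMOUNT_RO ||
           policy == FS_ERRORS_PANIC);

    if (!page_dirty)
        return POISON_DROP_PAGE;
    if (policy == FS_ERRORS_PANIC)
        return POISON_PANIC;
    return POISON_DROP_REPORT_IO_ERROR;
}
```

Only the dirty case consults the policy; a clean page loses nothing when
it is dropped, so there is no reason to panic for it.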

I think that would be relatively clean semantics-wise. Would you be
interested in working on patches for that?

> Another option is to introduce an "ignore all" policy instead of
> panicking at the beginning of memory_failure().  Perhaps it eventually
> causes an SRAR machine check, and then the kernel will panic or a
> process will be killed.  Anyway, this is a topic for the next stage.

The problem is memory_failure() would then need to start distinguishing
between AR=1 and AR=0 which it doesn't today.

It could be done, but would need some more work. 
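
A sketch of what that distinction might look like, as a userspace
model. The two bit masks follow the S and AR bits of IA32_MCi_STATUS;
everything else (the policy enum, the decision enum, and passing the
status word to a function called model_memory_failure()) is a
hypothetical construction for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_S   (1ULL << 56)  /* error was signaled as a machine check */
#define MCI_STATUS_AR  (1ULL << 55)  /* action required */

/* Hypothetical policy knob and result type for this model. */
enum mf_policy   { MF_POLICY_HANDLE, MF_POLICY_IGNORE_ALL };
enum mf_decision { MF_IGNORE, MF_RECOVER };

enum mf_decision model_memory_failure(uint64_t mci_status,
                                      enum mf_policy policy)
{
    bool action_required = (mci_status & MCI_STATUS_AR) != 0;

    assert(policy == MF_POLICY_HANDLE || policy == MF_POLICY_IGNORE_ALL);

    /*
     * AR=0: the corrupted data has not been consumed yet, so an
     * "ignore all" policy can defer and leave the page alone.
     */
    if (policy == MF_POLICY_IGNORE_ALL && !action_required)
        return MF_IGNORE;

    /* AR=1 cannot be ignored; otherwise recover as usual. */
    return MF_RECOVER;
}
```

In this model an AR=1 event is always handled, whatever the policy says.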

> > If you want to have improved IO error handling feel free to
> > submit it separately. I agree this area could use some work.
> > But it probably needs more design work first.
> 
> Well, this patch set itself looks good to me.
> I also looked into the other patches and couldn't find any
> problems (although I'm not a good judge when it comes to reviewing).
> 
> Reviewed-by: Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>

Thanks for your review and your comments.

-Andi
-- 
ak@...ux.intel.com -- Speaking for myself only.