Date: Wed, 12 Aug 2009 09:46:11 +0200
From: Andi Kleen <andi@...stfloor.org>
To: Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>
Cc: Andi Kleen <andi@...stfloor.org>, tytso@....edu, hch@...radead.org, mfasheh@...e.com, aia21@...tab.net, hugh.dickins@...cali.co.uk, swhiteho@...hat.com, akpm@...ux-foundation.org, npiggin@...e.de, linux-kernel@...r.kernel.org, linux-mm@...ck.org, fengguang.wu@...el.com, Satoshi OSHIMA <satoshi.oshima.fk@...achi.com>, Taketoshi Sakuraba <taketoshi.sakuraba.hc@...achi.com>
Subject: Re: [PATCH] [16/19] HWPOISON: Enable .remove_error_page for migration aware file systems

On Wed, Aug 12, 2009 at 11:49:56AM +0900, Hidehiro Kawai wrote:
> > I don't think there's much we can do if the application doesn't
> > check for IO errors properly. What would you do if it doesn't
> > check for IO errors at all? If it checks for IO errors it simply
> > has to check for them on all IO operations -- if they do
> > they will detect hwpoison errors correctly too.
>
> I believe it's not uncommon for applications to do buffered writes
> and then exit without fsync(). And I think it's difficult to
> preclude such applications and commands from the system perfectly.

That's true, but for anything mission critical you would expect
them to use some transactional mechanism, either with O_SYNC or
fsync(). Otherwise they always risk data loss anyway.

> > It's unclear to me that this special mode is really desirable.
> > Does it bring enough value to the user to justify the complexity
> > of another exotic option? The case is relatively exotic,
> > as in dirty write cache that is mapped to a file.
> >
> > Try to explain it in documentation and you see how ridiculous it sounds;
> > it simply doesn't have clean semantics
> >
> > ("In case you have applications with broken IO error handling on
> > your mission critical system ...")
>
> Generally, dropping unwritten dirty page caches is considered to be
> risky.
> So the "panic on IO error" policy has been used as usual
> practice for some systems. I just suggested that we adopt
> this policy for machine check errors.

Hmm, what we could possibly do -- as follow-on patches -- would be
to let error_remove_page check the per file system panic-on-io-error
super block setting for dirty pages and panic in this case too.
Unfortunately this setting is currently per file system, not generic,
so it would need to be a fs-specific check (or the flag would
need to be moved into a generic fs superblock field first).

I think that would be relatively clean semantics-wise.

Would you be interested in working on patches for that?

> Another option is to introduce an "ignore all" policy instead of
> panicking at the beginning of memory_failure(). Perhaps it eventually
> causes an SRAR machine check, and then the kernel will panic or a process
> will be killed. Anyway, this is a topic for the next stage.

The problem is memory_failure() would then need to start distinguishing
between AR=1 and AR=0, which it doesn't today. It could be done,
but would need some more work.

> > If you want to have improved IO error handling feel free to
> > submit it separately. I agree this area could use some work.
> > But it probably needs more design work first.
>
> Well, this patch set itself looks good to me.
> I also looked into the other patches and couldn't find any
> problems (although I'm not a good judge of reviewing).
>
> Reviewed-by: Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>

Thanks for your review and your comments.

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only.
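The application-side discipline argued for in this thread (check the result of every IO operation, and use fsync() or O_SYNC when the data matters) can be illustrated with a minimal POSIX C sketch. The function name `durable_write` is hypothetical and this is a generic error-checking pattern, not code from the HWPOISON patch set; it assumes, as the thread suggests, that a hwpoison event on a dirty page-cache page surfaces to the application as an ordinary IO error on a later call such as fsync().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path and make them durable.
 * Every IO call is checked, because an IO error (the thread expects
 * hwpoison on dirty page cache to be reported the same way) may show up
 * on write(), fsync() or close().
 * Returns 0 on success, -1 with errno set on failure.
 */
int durable_write(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            goto fail;
        }
        p += n;                    /* short write: continue with the rest */
        len -= (size_t)n;
    }

    /* Buffered data is only known to be on stable storage once fsync()
     * succeeds; exiting before this point can lose data silently. */
    if (fsync(fd) < 0)
        goto fail;

    if (close(fd) < 0)
        return -1;
    return 0;

fail:
    {
        int saved = errno;         /* keep the original error, not close()'s */
        close(fd);
        errno = saved;
    }
    return -1;
}
```

Opening the file with `O_WRONLY | O_CREAT | O_TRUNC | O_SYNC` instead would make each write() return only after the data reached stable storage, which is the other option mentioned in the thread; the explicit fsync() keeps the writes themselves cheap and pays the synchronization cost once.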