linux-kernel - Re: [RFC Patch 0/2] mm: Add parameters to make kernel behavior at memory error on dirty cache selectable


Date:	Fri, 12 Apr 2013 11:13:03 -0400
From:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:	Mitsuhiro Tanino <mitsuhiro.tanino.gm@...achi.com>
Cc:	Andi Kleen <andi@...stfloor.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>
Subject: Re: [RFC Patch 0/2] mm: Add parameters to make kernel behavior at
 memory error on dirty cache selectable

On Fri, Apr 12, 2013 at 10:38:43PM +0900, Mitsuhiro Tanino wrote:
> (2013/04/12 3:10), Andi Kleen wrote:
> > On Thu, Apr 11, 2013 at 11:23:08AM -0400, Naoya Horiguchi wrote:
> >> On Thu, Apr 11, 2013 at 03:49:16PM +0200, Andi Kleen wrote:
> >>>> As a result, if the dirty cache includes user data, the data is lost,
> >>>> and data corruption occurs if an application uses old data.
> >>>
> >>> The application cannot use old data, the kernel code kills it if it
> >>> would do that. And if it's IO data there is an EIO triggered.
> >>>
> >>> iirc the only concern in the past was that the application may miss
> >>> the asynchronous EIO because it's cleared on any fd access. 
> >>>
> >>> This is a general problem not specific to memory error handling, 
> >>> as these asynchronous IO errors can happen due to other reasons
> >>> (bad disk, etc.)
> >>>
> >>> If you're really concerned about this case I think the solution
> >>> is to make the EIO more sticky so that there is a higher chance
> >>> that it gets returned.  This will make your data much safer,
> >>> as it will cover all kinds of IO errors, not just the obscure memory
> >>> errors.
> 
> I agree with Andi. We need to take care of both memory errors and
> asynchronous I/O errors.
> 
> >> I'm interested in this topic, and in previous discussion, what I was told
> >> is that we can't expect user applications to change their behavior when
> >> they get EIO, so globally changing EIO's stickiness is not a great approach.
> > 
> > Not sure. Some of the current behavior may be dubious and it may 
> > be possible to change it. But would need more analysis.
> > 
> > I don't think we're concerned that much about "correct" applications,
> > but more about ones that do not check everything. So returning more
> > errors should be safer.
> > 
> > For example, you could have a sysctl that enables an always-sticky
> > IO error -- one that keeps erroring until it is closed.
> > 
> >> I'm working on a new pagecache tag based mechanism to solve this.
> >> But it needs time and more discussions.
> >> So I guess Tanino-san suggests giving up on dirty pagecache errors
> >> as a quick solution.
> > 
> > A quick solution would be enabling panic for any asynchronous IO error.
> > I don't think the memory error code is the right point to hook into.
> 
> Yes. I think both a short-term solution and a long-term solution are
> necessary in order to enable the hwpoison feature for Linux as a KVM
> hypervisor.
> 
> So my proposal is as follows:
>   For the short-term solution, to handle both memory errors and I/O errors:
>     - I will resend a panic knob to handle data loss on dirty cache
>       caused by memory errors and I/O errors.

Sorry, I still think "panic on dirty pagecache error" is feasible in userspace.
This new knob will be completely useless once memory error reporting is
fixed in the future, so whenever possible I prefer the userspace solution,
even as a short-term one.

Thanks,
Naoya

>   For long term solution:
>     - Andi's proposal or Horiguchi-san's new pagecache tag based mechanism
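
The thread does not spell out what a userspace policy would look like.
As a minimal sketch only, assuming the application can call fsync() on
the files it cares about, one possible shape is below. The helper names
(flush_and_check, is_fatal_writeback_error, request_panic) are
hypothetical and are not taken from any posted patch. The sketch treats
EIO from fsync() as a fatal writeback error and escalates by writing
"c" to a sysrq trigger file such as /proc/sysrq-trigger, which needs
root. It does not address the concern raised above that an asynchronous
EIO may already have been consumed through another fd.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: flush dirty data for fd.
 * Returns 0 on success, or the positive errno value from fsync(). */
static int flush_and_check(int fd)
{
    if (fsync(fd) == 0)
        return 0;
    return errno;
}

/* Hypothetical policy: only EIO (lost writeback) is treated as fatal. */
static int is_fatal_writeback_error(int err)
{
    return err == EIO;
}

/* Hypothetical escalation: ask the kernel to crash by writing "c" to a
 * sysrq trigger file. Returns 0 if the write was accepted, otherwise a
 * positive errno value. On a real trigger file the call may not return. */
static int request_panic(const char *trigger_path)
{
    int fd, err;
    ssize_t n;

    assert(trigger_path != NULL);

    fd = open(trigger_path, O_WRONLY);
    if (fd < 0)
        return errno;

    n = write(fd, "c", 1);
    /* Save the result before close() can overwrite errno. */
    err = (n == 1) ? 0 : (n < 0 ? errno : EIO);
    close(fd);
    return err;
}
```

A caller would run flush_and_check() after its writes and, if
is_fatal_writeback_error() says so, call
request_panic("/proc/sysrq-trigger"). Keeping the policy in userspace
means it can be dropped without a kernel change once error reporting
improves, which is the argument made in the reply above.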