linux-kernel - Re: [RFC PATCH 0/3] Machine check recovery when kernel accesses poison

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 10 Nov 2015 13:55:46 -0800
From:	"Luck, Tony" <tony.luck@...el.com>
To:	Borislav Petkov <bp@...en8.de>
Cc:	linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
	x86@...nel.org
Subject: Re: [RFC PATCH 0/3] Machine check recovery when kernel accesses
 poison

On Tue, Nov 10, 2015 at 12:21:01PM +0100, Borislav Petkov wrote:
> Just a general, why-do-we-do-this, question: on big systems, the memory
> occupied by the kernel is a very small percentage compared to whole RAM,
> right? And yet we want to recover from there too? Not, say, kexec...

I need to add more to the motivation part of this. The people who want
this are playing with NVDIMMs as storage. So think of many GBytes of
non-volatile memory on the source end of the memcpy(). People are used
to disk errors just giving them a -EIO error. They'll be unhappy if an
NVDIMM error crashes the machine.

> > Note that I also fudge the return value.  I'd like in the future
> > to be able to write a "mcsafe_copy_from_user()" function that
> > would be annotated both for page faults, to return a count of
> > bytes uncopied, or an indication that there was a machine check.
> > Hence the BIT(63) bit.  Internal feedback suggested we'd need
> > some IS_ERR() like macros to help users decode what happened
> > to take the right action.  But this is "RFC" to see if people
> > have better ideas on how to handle this.
> 
> Hmm, shouldn't this be using MF_ACTION_REQUIRED or even maybe a new MF_
> flag which is converted into a BUS_MCEERR_AR si_code and thus current
> gets a signal?
> 
> Only setting bit 63 looks a bit flaky to me...

It will be up to the caller to figure out what action to take. In
the NVDIMM filessytem scenario outlined above the result may be -EIO
for a data block ... something more drastic if we were reading metadata.

When I get around to writing mcsafe_copy_from_user() the code might
end up like:

some_syscall_e_g_write(void __user *buf, size_t cnt)
{
	u64 ret;

	ret = mcsafe_copy_from_user(kbuf, buf, cnt);

	if (ret & BIT(63)) {
		do some machine check thing ... e.g.
		send a SIGBUS to this process and return -EINTR
		This is where we use the address (after converting
		back to a user virtual address).
	} else if (ret) {
		user gave us a bad buffer: return -EFAULT
	} else {
		success!!!
	}
}

Which all looks quite ugly in long-hand ... I'm hoping that with
some pretty macros we can make it pretty.

-Tony
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/