linux-kernel - Re: [PATCH v5] x86/mce: Avoid infinite loop for copy from user recovery

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210202110126.GB18075@zn.tnic>
Date:   Tue, 2 Feb 2021 12:01:26 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     x86@...nel.org, Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Darren Hart <dvhart@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [PATCH v5] x86/mce: Avoid infinite loop for copy from user
 recovery

On Mon, Feb 01, 2021 at 10:58:12AM -0800, Luck, Tony wrote:
> On Thu, Jan 28, 2021 at 06:57:35PM +0100, Borislav Petkov wrote:
> > Crazy idea: if you still can reproduce on -rc3, you could bisect: i.e.,
> > if you apply the patch on -rc3 and it explodes and if you apply the same
> > patch on -rc5 and it works, then that could be a start... Yeah, don't
> > have a better idea here. :-\
> 
> I tried reporoducing (applied the original patch I posted back to -rc3) and
> the same issue stubbornly refused to show up again.
> 
> But I did hit something with the same signature (overflow bit set in
> bank 1) while running my futex test (which has two processes mapping
> the poison page).  This time I *do* understand what happened.  The test
> failed when the two processes were running on the two hyperhtreads of
> the same core. Seeing overflow in this case is understandable because
> bank 1 MSRs on my test machine are shared between the HT threads. When
> I run the test again using taskset(1) to only allowing running on
> thread 0 of each core, it keeps going for hunderds of iterations.
> 
> I'm not sure I can stitch together how this overflow also happened for
> my single process test. Maybe a migration from one HT thread to the
> other at an awkward moment?

Sounds plausible.

And the much more important question is, what is the code supposed to
do when that overflow *actually* happens in real life? Because IINM,
an overflow condition on the same page would mean killing the task to
contain the error and not killing the machine...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette