Message-ID: <CAKv+Gu_8ibO4D01DZv6KjL2GnvKuVBVnt=doxkN0w=4utJ7NvQ@mail.gmail.com>
Date: Mon, 17 Jun 2019 13:33:19 +0200
From: Ard Biesheuvel <ard.biesheuvel@...aro.org>
To: Kees Cook <keescook@...omium.org>
Cc: Will Deacon <will.deacon@....com>,
Jayachandran Chandrasekharan Nair <jnair@...vell.com>,
"catalin.marinas@....com" <catalin.marinas@....com>,
Jan Glauber <jglauber@...vell.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC] Disable lockref on arm64
On Sun, 16 Jun 2019 at 23:31, Kees Cook <keescook@...omium.org> wrote:
>
> On Sat, Jun 15, 2019 at 04:18:21PM +0200, Ard Biesheuvel wrote:
> > Yes, I am using the same saturation point as x86. In this example, I
> > am not entirely sure I understand why it matters, though: the atomics
> > guarantee that the write by CPU2 fails if CPU1 changed the value in
> > the mean time, regardless of which value it wrote.
> >
> > I think the concern is more related to the likelihood of another CPU
> > doing something nasty between the moment that the refcount overflows
> > and the moment that the handler pins it at INT_MIN/2, e.g.,
> >
> > > CPU 1 CPU 2
> > > inc()
> > > load INT_MAX
> > > about to overflow?
> > > yes
> > >
> > > set to 0
> > > <insert exploit here>
> > > set to INT_MIN/2
>
> Ah, gotcha, but the "set to 0" is really "set to INT_MAX+1" (not zero)
> if you're using the same saturation.
>
Of course. So there is no issue here: whatever manipulations are
racing with the overflow handler can never cause the counter to
unsaturate.
And actually, moving the checks before the stores is not as trivial as
I thought. E.g., for the LSE refcount_add case, we have
" ldadd %w[i], w30, %[cval]\n" \
" adds %w[i], %w[i], w30\n" \
REFCOUNT_PRE_CHECK_ ## pre (w30)) \
REFCOUNT_POST_CHECK_ ## post \
and changing this into load/test/store defeats the purpose of using
the LSE atomics in the first place.
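
For comparison, here is a sketch (my own illustration in C11 atomics,
not the kernel source) of what the load/test/store shape looks like:
each attempt is a load, a check, and a compare-and-swap, which on
arm64 without LSE compiles down to an LL/SC retry loop, exactly the
contended-retry behaviour the single ldadd instruction avoids.

```c
#include <stdatomic.h>

/* Load/test/store variant of refcount_add: check the value before
 * committing the store, retrying via CAS on contention. */
static int refcount_add_cas(atomic_int *r, int i)
{
	int old = atomic_load_explicit(r, memory_order_relaxed);

	do {
		if (old < 0)	/* already saturated: leave it alone */
			return old;
	} while (!atomic_compare_exchange_weak_explicit(
			r, &old, old + i,
			memory_order_relaxed, memory_order_relaxed));

	return old + i;
}
```

Under contention every failed CAS forces another round trip through
the loop, which is precisely the cost profile LSE's far atomics were
designed to eliminate.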
On my single core TX2, the comparative performance is as follows:

Baseline: REFCOUNT_TIMING test using REFCOUNT_FULL (LSE cmpxchg)

   191057942484      cycles         # 2.207 GHz
   148447589402      instructions   # 0.78 insn per cycle

   86.568269904 seconds time elapsed

Upper bound: ATOMIC_TIMING

   116252672661      cycles         # 2.207 GHz
    28089216452      instructions   # 0.24 insn per cycle

   52.689793525 seconds time elapsed

REFCOUNT_TIMING test using LSE atomics

   127060259162      cycles         # 2.207 GHz
              0      instructions   # 0.00 insn per cycle

   57.243690077 seconds time elapsed