lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240724113340.GA27474@willie-the-truck>
Date: Wed, 24 Jul 2024 12:33:41 +0100
From: Will Deacon <will@...nel.org>
To: Ard Biesheuvel <ardb@...nel.org>
Cc: Asahi Lina <lina@...hilina.net>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, asahi@...ts.linux.dev,
	linux-arm-kernel@...ts.infradead.org,
	Catalin Marinas <catalin.marinas@....com>, ryan.roberts@....com,
	mark.rutland@....com
Subject: Re: LPA2 on non-LPA2 hardware broken with 16K pages

On Tue, Jul 23, 2024 at 06:28:16PM +0200, Ard Biesheuvel wrote:
> On Tue, 23 Jul 2024 at 18:05, Will Deacon <will@...nel.org> wrote:
> >
> > On Tue, Jul 23, 2024 at 05:02:15PM +0200, Ard Biesheuvel wrote:
> > > On Tue, 23 Jul 2024 at 16:52, Will Deacon <will@...nel.org> wrote:
> > > > On Fri, Jul 19, 2024 at 11:02:29AM -0700, Ard Biesheuvel wrote:
> ...
> > > > >
> > > > > We might add
> > > > >
> > > > > if (pgtable_l4_enabled())
> > > > >     pgdp = &pgd;
> > > > >
> > > > > here to preserve the existing 'lockless' behavior when PUDs are not
> > > > > folded.
> > > >
> > > > The code still needs to be 'lockless' for the 5-level case, so I don't
> > > > think this is necessary.
> > >
> > > The 5-level case is never handled here.
> >
> > Urgh, yes, sorry. I've done a fantasticly bad job of explaining myself.
> >
> > > There is the 3-level case, where the runtime PUD folding needs the
> > > actual address in order to recalculate the descriptor address using
> > > the correct shift. In this case, we don't dereference the pointer
> > > anyway so the 'lockless' thing doesn't matter (afaict)
> > >
> > > In the 4-level case, we want to preserve the original behavior, where
> > > pgd is not reloaded from pgdp. Setting pgdp to &pgd achieves that.
> >
> > Right. What I'm trying to get at is the case where we have folding. For
> > example, with my patch applied, if we have 3 levels then the lockless
> > GUP walk looks like:
> >
> >
> > pgd_t pgd = READ_ONCE(*pgdp);
> >
> > p4dp = p4d_offset_lockless(pgdp, pgd, addr);
> >         => Returns pgdp
> > p4d_t p4d = READ_ONCE(*p4dp);
> >
> > pudp = pud_offset_lockless(p4dp, p4d, addr);
> >         => Returns &p4d, which is again the pgdp
> > pud_t pud = READ_ONCE(*pudp);
> >
> >
> > So here we're reloading the same pointer multiple times and my argument
> > is that if we need to add logic to avoid this for the
> > pgtable_l4_enabled() case, then we have bigger problems.
> >
> 
> The 3-level case is not relevant here. My suggestion only affects the
> 4-level case:

That's exactly what I'm uneasy about :/

> if (pgtable_l4_enabled())
>    pgdp = &pgd;
> 
> which prevents us from evaluating *pgdp twice, which seems to me to be
> the reason these routines exist in the first place. Given that the
> 3-level runtime-folded case is the one we are trying to fix here, I'd
> argue that keeping the 4-level case the same as before is important.

I think consistency between 4-level and 3-level is far more important.
Adding code to avoid reloading the entry for one specific case (the
pgtable_l4_enabled() case), whilst requiring other cases (e.g. the
3-level compile-time folded case) to reload from the pointer is
inconsistent. Either they both need it or neither of them need it, no?

> > > > Yes, we'll load the same entry multiple times,
> > > > but it should be fine because they're in the context of a different
> > > > (albeit folded) level.
> > > >
> > >
> > > I don't understand what you are saying here. Why is that fine?
> >
> > I think it's fine because (a) the CPU guarantees same address
> > read-after-read ordering and (b) We only evaluate the most recently read
> > value. It would be a problem if we mixed data from different reads but,
> > because the use is confined to that 'level', we don't end up doing that.
> >
> > Dunno, am I making any sense?
> >
> 
> So what is the point of p?d_offset_lockless()? Is it a performance
> optimization that we don't care about on arm64? Or does this reasoning
> still only apply to the folded case?

Well, naturally it's all undocumented. S390 added the macros, but it
looks like that was because they hit a similar problem to the one
reported by Lina (see d3f7b1bb2040 ("mm/gup: fix gup_fast with dynamic
page table folding")) and, really, that change is just moving the logic
out of the fast GUP code and into some new macros.

If you think about trying to do fast GUP using the regular pXd_offset()
macros with 5 levels, the problem is a little more obvious. For example,
it would look something like:

	// gup_fast_pgd_range()
	pgd_t pgd = READ_ONCE(*pgdp);

	if (pgd_none(pgd))
		return;

	...

	// gup_fast_p4d_range()
	p4dp = p4d_offset(pgdp, addr);
		=> READ_ONCE(*pgdp);

A concurrent writer could e.g. clear *pgdp between the two reads and
we'd end up with junk in p4dp because of a ToCToU-type bug.

But I don't think that applies to the case we're discussing, because the
reload of the entry happens in the following READ_ONCE() rather than
in the _offset macro:

	p4d_t p4d = READ_ONCE(*p4dp);

and, as this is in the context of a new level, everything is then
rechecked so that a concurrent modification won't cause us to crash.

Will

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ