lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 22 Apr 2022 12:57:19 +1000
From:   Nicholas Piggin <npiggin@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "ast@...nel.org" <ast@...nel.org>, "bp@...en8.de" <bp@...en8.de>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "dborkman@...hat.com" <dborkman@...hat.com>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "hch@...radead.org" <hch@...radead.org>,
        "hpa@...or.com" <hpa@...or.com>,
        "imbrenda@...ux.ibm.com" <imbrenda@...ux.ibm.com>,
        Kernel Team <Kernel-team@...com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "mbenes@...e.cz" <mbenes@...e.cz>,
        "mcgrof@...nel.org" <mcgrof@...nel.org>,
        "pmladek@...e.com" <pmladek@...e.com>,
        "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
        Mike Rapoport <rppt@...nel.org>,
        "song@...nel.org" <song@...nel.org>,
        Song Liu <songliubraving@...com>
Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP

Excerpts from Linus Torvalds's message of April 22, 2022 12:31 pm:
> On Thu, Apr 21, 2022 at 6:51 PM Nicholas Piggin <npiggin@...il.com> wrote:
>>
>> > See
>> >
>> >     https://lore.kernel.org/all/20220415164413.2727220-3-song@kernel.org/
>> >
>> > for [PATCH 2/4] in the series for this particular issue.
>>
>> I was being facetious. The problem is you can't do ^ because x86 is
>> buggy.
> 
> No, we probably *can* do that PATCH 2/4. I suspect x86 really isn't
> that buggy. The bugs are elsewhere (including other vmalloc_huge()
> uses).

I thought Rick seemed to be concerned about other issues but if
not then great, I would happily admit to being mistaken.

> 
> Really. Why can't you just admit that the major bug was in the
> hugepage code itself?

Uh, I did. But not the concept. You admit you were going stupid
and freaking about about vmalloc_to_page which is really not
some insurmountable problem.

> 
> You claim:
> 
>> Because it can be transparent. The bug was (stupidly) using compound
>> pages when it should have just used split higher order pages.
> 
> but we're in -rc3 for 5.18, and you seem to be entirely ignoring the
> fact that that stupid bug has been around for a *YEAR* now.

I'm not ignoring that at all. Again I completely agree that path is
unlikely to have been tested (or at least not reported) on
powerpc or s390.

> Guess what? It was reported within *days* of the code having  been
> enabled on x86.

Is that supposed to be a gotcha? If I was cc'ed on it I might have
fixed it a month ago, but why would it be unusual for major new
coverage of such a big feature to expose a bug?

> But for about a year, youv'e been convinced that powerpc is fine,
> because nobody ever reported it.
> 
> And you *still* try to make this about how it's some "x86 bug",
> despite that bug not having been x86-specific AT ALL.

I think we're talking past one another. I never said that bug
was x86 specific, it was only just brought to my attention now
and I (presume that I) fixed it within 10 minutes of looking at
it.

That's not what I understood to be the only x86 problem though,
but again as I said I'll admit I misunderstood that if it's not
the case.

[snip]

> For example, your "two-liner fix" is not at all obvious.

It is pretty obvious. Something wanted to use a tail page and
ran afoul of the compound page thing. Splitting the page just
gives us an array of conguous small pages.

> That broken code case used to have a comment that remap_vmalloc_page()
> required compound pages, and you just removed that whole thing as if
> it didn't matter, and split the page.

The new comment is a superset, I agree the old comment is
under-done. remap_vmalloc_page() wanted to get_page the sub
pages and without compound or split huge page they have a ref
count of 0 so that blows up.

split page just makes the pages behave exactly as they would
if allocated individually (i.e., the small page case).

Yes a stupid thinko on my part when I added that, but that's
all it is, it's an obvious thing.

> (I also think the comment meant 'vmap_pages_range()', but whatever).

remap_vmalloc_range I think. But clearly the same issue applies
to any caller which may use the struct pages for stuff.

> And the thing is, I'm not entirely convinced that comment was wrong
> and could just be ignored. The freeing code in __vunmap() will do
> 
>                 int i, step = 1U << page_order;
> 
>                 for (i = 0; i < area->nr_pages; i += step) {
>                         struct page *page = area->pages[i];
> 
>                         BUG_ON(!page);
>                         mod_memcg_page_state(page, MEMCG_VMALLOC, -step);
>                         __free_pages(page, page_order);
> 
> which now looks VERY VERY wrong.
> 
> You've split the pages, they may be used as individual pages (possibly
> by other things), and then you now at freeing time treat them as a
> single compound page after all..

Oh yeah I'll take a look at the freeing side too, as I said
totally untested but that's obviously the issue.

> 
> So your "trivial two-liner" that tried to fix a bug that has been
> there for a year now, itself seems quite questionable.
> 
> Maybe it works, maybe it doesn't. My bet is "it doesn't".
> 
> And guess what? I bet it worked just fine in your testing on powerpc,
> because you probably didn't actually have any real huge-page vmalloc
> cases except for those filesystem big-hash cases that never get
> free'd.
> 
> So that "this code was completely buggy for a year on powerpc" never
> seemed to teach you anything about the code.
> 
> And again - none of this is at all x86-specific. NOT AT ALL.
> 
> So how about you admit you were wrong to begin with.
> 
> That hugepage code needs more care before we re-enable it.
> 
> Your two-liner wasn't so obvious after all, was it?
> 
> I really think we're much safer saying "hugepage mappings only matter
> for a couple of things, and those things will *not* do sub-page games,
> so it's simple and safe".
> 
> .. and that requires that opt-in model.
> 
> Because your "it's transparent" argument has never ever actually been
> true, now has it?

Of course it has. You admit you didn't understand the design and
freaked out about vmalloc_to_page when drivers playing games with
struct page is a non issue.

I'm not a moron, clearly if it wasn't intended to be transparent to
callers it could not have just been enabled unconditionally. What's
so difficult to you about the concept that the arch code is really
the only thing that has to know about its page table arrangement?

You're now fixating on this bug as though that invalidates the whole
concept of huge vmalloc being transparent to callers. That's being
disingenous though, it's just a bug. That's all it is. You merge
fixes for hundreds of bugs everywhere in the tree every day. You're
using it as a crutch because you can't admit you were wrong about it
being transparent to drivers and thinking it has to be opt-in.

Anyway blah whatever. I admit to everything, I did it, I'm bad I did
a huge booboo a year ago, huge vmalloc is stupid, x86 is wonderful,
everything else is garbage, Linus is great. Okay?

I'll polish up the patch (*and* test it with a specifically written
test case just for your), and then we can see if it solves that drm
bug and we can go from there.

Thanks,
Nick

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ