linux-kernel - Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <80f25292-585c-7729-2a23-7c46b3309a1a@shipmail.org>
Date:   Thu, 10 Oct 2019 00:30:49 +0200
From:   Thomas Hellström (VMware) 
        <thomas_os@...pmail.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Thomas Hellstrom <thellstrom@...are.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Matthew Wilcox <willy@...radead.org>,
        Will Deacon <will.deacon@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Rik van Riel <riel@...riel.com>,
        Minchan Kim <minchan@...nel.org>,
        Michal Hocko <mhocko@...e.com>,
        Huang Ying <ying.huang@...el.com>,
        Jérôme Glisse <jglisse@...hat.com>
Subject: Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a
 pmd_entry is present

On 10/9/19 10:20 PM, Linus Torvalds wrote:
> On Wed, Oct 9, 2019 at 1:06 PM Thomas Hellström (VMware)
> <thomas_os@...pmail.org> wrote:
>> On 10/9/19 9:20 PM, Linus Torvalds wrote:
>>> Don't you get it? There *is* no PTE level if you didn't split.
>> Hmm, This paragraph makes me think we have very different perceptions about what I'm trying to achieve.
> It's not about what you're trying to achieve.
>
> It's about the actual code.
>
> You cannot do that
>
>> -               split_huge_pmd(walk->vma, pmd, addr);
>> +               if (!ops->pmd_entry)
>> +                       split_huge_pmd(walk->vma, pmd, addr);
> it's insane.
>
> You *have* to call split_huge_pmd() if you're doing to call the
> pte_entry() function.
>
> I don't understand why you are arguing. This is not about "feelings"
> and "intentions" or about "trying to achieve".
>
> This is about cold hard "you can't do that", and this is now the third
> time I tell you _why_ you can't do that: you can't walk the last level
> if you don't _have_ a last level. You have to split the pmd to do so.
It's not so much arguing but rather trying to understand your concerns 
and your perception of what the final code should look like.
>
> End of story.

So is it that you want pte_entry() to be strictly called for *each* 
virtual address, even if we have a pmd_entry()?
In that case I completely follow your arguments, meaning we skip this 
patch completely?

My take on the change was that pmd_entry() returning 0 would mean we 
could actually skip the pte level completely and nothing would otherwise 
pass down to the next level for which split_huge_pmd() wasn't a NOP, 
similar to how pud_entry does things. FWIW, see

https://lore.kernel.org/lkml/20191004123732.xpr3vroee5mhg2zt@box.shutemov.name/

and we could in the long run transform the pte walk in many pmd_entry 
callbacks into pte_entry callbacks.


>
>> I wanted the pte level to *only* get called for *pre-existing* pte entries.
> Again, I told you what the solution was.
>
> But the fact is, it's not what your other code even wants or does.
>
> Seriously. You have two cases you care about in your callbacks
>
>   - an actual hugepmd. This is an ERROR for you and you do a huge
> WARN_ON() for it to let people know.
No, it's typically a NOP, since the hugepmd should be read-only. 
Write-enabled huge pages are split in fault().
>
>   - regular pages. This is what your other code actually handles.
>
> So for the hugepomd case, you have two choices:
>
>   - handle it by splitting and deal with the regular pages: "return 0;"
Well, this is not what we want to do, and the reason we have the patch 
in the first place.
>
>   - actually error out: "return -EINVAL".

/Thomas