linux-kernel - Re: [mm 4.15-rc8] Random oopses under memory pressure.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFy43ypm0QvA5SqNR4O0ZJETbkR3NDR=dnSdvejc_nmSJQ@mail.gmail.com>
Date:   Thu, 18 Jan 2018 09:26:25 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Andrea Arcangeli <aarcange@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Tony Luck <tony.luck@...el.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...nel.org>,
        "hillf.zj" <hillf.zj@...baba-inc.com>,
        Hugh Dickins <hughd@...gle.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Rik van Riel <riel@...hat.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        "the arch/x86 maintainers" <x86@...nel.org>
Subject: Re: [mm 4.15-rc8] Random oopses under memory pressure.

On Thu, Jan 18, 2018 at 8:56 AM, Kirill A. Shutemov
<kirill@...temov.name> wrote:
>
> I can't say I fully grasp how 'diff' got this value and how it leads to both
> checks being false.

I think the problem is that page difference when they are in different sections.

When you do

     pte_page(*pvmw->pte) - pvmw->page

then the compiler takes the pointer difference, and then divides by
the size of "struct page" to get an index.

But - and this is important - it does so knowing that the division it
does will have no modulus: the two 'struct page *' pointers are really
in the same array, and they really are 'n*sizeof(struct page)' apart
for some 'n'.

That means that the compiler can optimize the division. In fact, for
this case, gcc will generate

        subl    %ebx, %eax
        sarl    $3, %eax
        imull   $-858993459, %eax, %eax

because 'struct page' is 40 bytes in size, and that magic sequence
happens to divide by 40 (first divide by 8, then that magical "imull"
will divide by 5 *IFF* the thing is evenly divisible by 5 (and not too
big - but the shift guarantees that).

Basically, it's a magic trick, because real divides are very
expensive, but you can fake them more quickly if you can limit the
input domain.

But what does it mean if the two "struct page *" are not in the same
array, and the two arrays were allocated not aligned exactly 40 bytes
away, but some random number of pages away?

You get *COMPLETE*GARBAGE* when you do the above optimized divide.
Suddenly the divide had a modulus (because the base of the two arrays
weren't 40-byte aligned), and the "trick" doesn't work.

So that's why you can't do pointer diffs between two arrays. Not
because you can't subtract the two pointers, but because the
*division* part of the C pointer diff rules leads to issues.

                Linus