netdev - Re: [BUG] from x86: Support kmap

Message-ID: <CAHk-=wh2895wXEXYtb70CTgW+UR7jfh6VFhJB_bOrF0L7UKoEg@mail.gmail.com>
Date:   Wed, 6 Jan 2021 17:03:48 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Steven Rostedt <rostedt@...dmis.org>,
        Willem de Bruijn <willemb@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        David Miller <davem@...emloft.net>,
        Jonathan Lemon <jonathan.lemon@...il.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Christoph Hellwig <hch@....de>,
        Matthew Wilcox <willy@...radead.org>,
        Daniel Vetter <daniel@...ll.ch>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux-MM <linux-mm@...ck.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Netdev <netdev@...r.kernel.org>
Subject: Re: [BUG] from x86: Support kmap_local() forced debugging

On Wed, Jan 6, 2021 at 3:01 PM Steven Rostedt <rostedt@...dmis.org> wrote:
>
> I triggered the following crash on x86_32 by simply doing a:
>
> (ssh'ing into the box)
>
>   # head -100 /tmp/output-file
>
> Where the /tmp/output-file was the output of a trace-cmd report.
> Even after rebooting and not running the tracing code, simply doing the
> head command still crashed.

The code decodes to

   0:   3b 5d e8                cmp    -0x18(%ebp),%ebx
   3:   0f 47 5d e8             cmova  -0x18(%ebp),%ebx
   7:   c7 45 e0 00 00 00 00    movl   $0x0,-0x20(%ebp)
   e:   8b 7d e0                mov    -0x20(%ebp),%edi
  11:   39 7d e8                cmp    %edi,-0x18(%ebp)
  14:   76 3a                   jbe    0x50
  16:   8b 45 d4                mov    -0x2c(%ebp),%eax
  19:   e8 a4 e4 ff ff          call   0xffffe4c2
  1e:   8b 55 e4                mov    -0x1c(%ebp),%edx
  21:   03 55 e0                add    -0x20(%ebp),%edx
  24:   89 d9                   mov    %ebx,%ecx
  26:   01 c6                   add    %eax,%esi
  28:   89 d7                   mov    %edx,%edi
  2a:*  f3 a4                   rep movsb %ds:(%esi),%es:(%edi)    <-- trapping instruction
  2c:   e8 c9 e4 ff ff          call   0xffffe4fa
  31:   01 5d e0                add    %ebx,-0x20(%ebp)
  34:   8b 5d e8                mov    -0x18(%ebp),%ebx
  37:   b8 00 10 00 00          mov    $0x1000,%eax
  3c:   2b 5d e0                sub    -0x20(%ebp),%ebx

and while it would be good to see the output of
scripts/decode_stacktrace.sh, I strongly suspect that the above is

                                vaddr = kmap_atomic(p);
                                memcpy(to + copied, vaddr + p_off, p_len);
                                kunmap_atomic(vaddr);

(although I wonder how/why the heck you've enabled
CC_OPTIMIZE_FOR_SIZE=y, which is what causes "memcpy()" to be done as
that "rep movsb". I thought we disabled it because it's so bad on most
cpus).

So that first "call" instruction is the kmap_atomic(), the "rep movs"
is the memcpy(), and the "call" instruction immediately after is the
kunmap_atomic().

Anyway, you can see vaddr in register state:

        EAX: fff57000

so we've kmapped that one page at fff57000, but we're accessing past
it into the next page:

> BUG: unable to handle page fault for address: fff58000

with the current source address being (ESI: fff58000) and we still
have 248 bytes to go (ECX: 000000f8) even though we've already
overflowed into the next page.

You can see the original count still (EBX: 000005a8), so it really
looks like that skb_frag_foreach_page() logic

                        skb_frag_foreach_page(f,
                                              skb_frag_off(f) + offset - start,
                                              copy, p, p_off, p_len, copied) {
                                vaddr = kmap_atomic(p);
                                memcpy(to + copied, vaddr + p_off, p_len);
                                kunmap_atomic(vaddr);
                        }

must be wrong, and doesn't handle the "each page" part properly. It
must have started in the middle of the page, and p_len (that 0x5a8)
was wrong.
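
The arithmetic behind that reading of the register dump can be spelled out. The following is only a user-space sketch: the constants are the register values quoted above, while the macro and function names are made up for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Values taken from the register dump quoted above. */
#define VADDR     0xfff57000u  /* EAX: start of the kmapped page        */
#define FAULT_SRC 0xfff58000u  /* ESI: source pointer at the fault      */
#define REMAINING 0x000000f8u  /* ECX: bytes "rep movsb" still had left */
#define P_LEN     0x000005a8u  /* EBX: original copy length             */

/* Bytes already copied when the fault hit. */
static uint32_t bytes_copied(void)
{
    assert(P_LEN >= REMAINING);
    return P_LEN - REMAINING;
}

/* Offset into the mapped page at which the copy must have started. */
static uint32_t start_offset(void)
{
    return (FAULT_SRC - VADDR) - bytes_copied();
}

/* Offset, relative to the mapped page, one past the last byte to be read. */
static uint32_t end_offset(void)
{
    return start_offset() + P_LEN;
}
```

Under these assumptions the copy started 0xb50 bytes into the page, and a 0x5a8 byte copy from there ends past PAGE_SIZE, which is why the source pointer ran into fff58000.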

IOW, it really looks like p_off + p_len had the value 0x10f8, which is
larger than one page. And looking at the code, in
skb_frag_foreach_page(), I see:

             p_off = (f_off) & (PAGE_SIZE - 1),                         \
             p_len = skb_frag_must_loop(p) ?                            \
             min_t(u32, f_len, PAGE_SIZE - p_off) : f_len,              \

where that "min_t(u32, f_len, PAGE_SIZE - p_off)" looks correct, but
then presumably skb_frag_must_loop() must be wrong.
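
To make the effect of that conditional concrete, here is a small user-space model of the p_off / p_len computation. It is a sketch, not the kernel macro: first_chunk_len() and chunk_stays_in_page() are hypothetical helpers, and the must_loop flag stands in for the result of skb_frag_must_loop().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/*
 * Length of the first chunk, mirroring the p_len expression:
 * clamp to the end of the page only when must_loop is true.
 */
static uint32_t first_chunk_len(uint32_t f_off, uint32_t f_len, bool must_loop)
{
    uint32_t p_off = f_off & (PAGE_SIZE - 1);
    uint32_t room = PAGE_SIZE - p_off;  /* bytes left in this page */

    assert(p_off < PAGE_SIZE);
    if (!must_loop)
        return f_len;                   /* no per-page split at all */
    return f_len < room ? f_len : room; /* the min_t(u32, ...) case */
}

/* True if a copy of len bytes starting at f_off stays inside one page. */
static bool chunk_stays_in_page(uint32_t f_off, uint32_t len)
{
    return (f_off & (PAGE_SIZE - 1)) + len <= PAGE_SIZE;
}
```

With an offset of 0xb50 and a length of 0x5a8 (values consistent with the register dump), the clamped chunk is 0x4b0 bytes and stays inside the page, while the unclamped chunk is the full 0x5a8 bytes and crosses into the next page.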

Oh, and when I look at that, I see

    static inline bool skb_frag_must_loop(struct page *p)
    {
    #if defined(CONFIG_HIGHMEM)
            if (PageHighMem(p))
                    return true;
    #endif
            return false;
    }

and that is no longer true. With the kmap debugging, even non-highmem
pages need that "do one page at a time" code, because even non-highmem
pages get remapped by kmap().
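
The shape of the problem can be modelled in user space as below. This is only an illustration of the reasoning, not the attached patch and not kernel code: struct model_page, must_loop_old(), must_loop_model() and the kmap_always_remaps flag are all hypothetical names, with the flag standing in for "kmap debugging is enabled".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only tracks highmem status. */
struct model_page {
    bool highmem;
};

/* Models the existing predicate: only highmem pages need per-page copies. */
static bool must_loop_old(const struct model_page *p)
{
    assert(p != NULL);
    return p->highmem;
}

/*
 * Models a widened predicate: if kmap remaps every page (the debugging
 * case), every page needs the one-page-at-a-time treatment.
 */
static bool must_loop_model(const struct model_page *p, bool kmap_always_remaps)
{
    assert(p != NULL);
    return kmap_always_remaps || p->highmem;
}
```

For a lowmem page the old predicate says "no loop needed" regardless of configuration, whereas the widened model says "loop" as soon as kmap remaps everything, which is the case the crash above falls into.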

IOW, I think the patch to fix this might be something like the attached.

I wonder whether there is other code that "knows" that kmap() only
affects PageHighMem() pages, which is no longer true.

Looking at some other code, skb_gro_reset_offset() looks suspiciously
like it also thinks highmem pages are special.

Adding the networking people involved in this area to the cc too.

               Linus

Download attachment "patch" of type "application/octet-stream" (544 bytes)