Message-ID: <20210425014816.GB5251@xsang-OptiPlex-9020>
Date:   Sun, 25 Apr 2021 09:48:16 +0800
From:   Oliver Sang <oliver.sang@...el.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        Harish Sriram <harish@...ux.ibm.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        kernel test robot <lkp@...el.com>
Subject: Re: [mm/vunmap] e47110e905: WARNING:at_mm/vmalloc.c:#__vunmap
Hi Linus,
On Fri, Apr 23, 2021 at 10:18:18AM -0700, Linus Torvalds wrote:
> On Thu, Apr 22, 2021 at 11:15 PM kernel test robot
> <oliver.sang@...el.com> wrote:
> >
> > commit: e47110e90584a22e9980510b00d0dfad3a83354e ("mm/vunmap: add cond_resched() in vunmap_pmd_range")
> 
> Funky. That commit doesn't seem to have anything to do with the oops.
> 
> The oops is odd too:
> 
> > [  198.731223] WARNING: CPU: 0 PID: 1948 at mm/vmalloc.c:2247 __vunmap (kbuild/src/consumer/mm/vmalloc.c:2247 (discriminator 1))
> 
> That's the warning for an unaligned vunmap():
> 
>   2247          if (WARN(!PAGE_ALIGNED(addr), "Trying to vfree() bad address (%p)\n",
>   2248                          addr))
>   2249                  return;
> 
> > [  198.744933] Call Trace:
> > [  198.745229] free_module (kbuild/src/consumer/kernel/module.c:2251)
> 
>   2248          /* This may be empty, but that's OK */
>   2249          module_arch_freeing_init(mod);
>   2250          module_memfree(mod->init_layout.base);
>   2251          kfree(mod->args);
> 
> That's the "module_memfree()" - the return address points to the
> return point, which is the next line.
> 
> And as far as I can tell, the only thing that assigns anything but
> NULL to that init_layout.base is
> 
>                 ptr = module_alloc(mod->init_layout.size);
> 
> which uses __vmalloc_node_range() for the allocation.
> 
> So absolutely nothing in this report makes sense to me. I suspect it's
> some odd memory corruption.
> 
> Oliver - how reliable is that bisection?
We will check further whether there is any issue in our test environment.

In the bot's automated tests we saw 12 instances of this issue in 74 runs of
the commit, but none in 100 runs of its parent.

f3f99d63a8156c7a e47110e90584a22e9980510b00d
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
          1:100         -1%            :74    dmesg.BUG:kernel_reboot-without-warning_in_test_stage
          2:100          0%           2:74    dmesg.BUG:unable_to_handle_page_fault_for_address
           :100         12%          12:74    dmesg.Kernel_panic-not_syncing:Fatal_exception
          2:100          0%           2:74    dmesg.Oops:#[##]
          1:100         -1%            :74    dmesg.RIP:__is_module_percpu_address
           :100         12%          12:74    dmesg.RIP:__vunmap  <-----
           :100         12%          12:74    dmesg.RIP:kfree
           :100          1%           1:74    dmesg.RIP:kobject_add_internal
          2:100         -1%           1:74    dmesg.RIP:print_modules
          1:100         -1%            :74    dmesg.RIP:skip_spaces
          1:100         -1%            :74    dmesg.RIP:usercopy_abort
           :100          1%           1:74    dmesg.WARNING:at_lib/kobject.c:#kobject_add_internal
           :100         12%          12:74    dmesg.WARNING:at_mm/vmalloc.c:#__vunmap
          3:100         10%          13:74    dmesg.boot_failures
          1:100         -1%            :74    dmesg.canonical_address#:#[##]
          2:100         -2%            :74    dmesg.invalid_opcode:#[##]
          2:100         -2%            :74    dmesg.kernel_BUG_at_mm/usercopy.c
           :100         11%          11:74    dmesg.stack_segment:#[##]
> 
> Does anybody else see what might be up?
> 
>             Linus