Message-ID: <20171030112752.c4n4m4vhh2barjew@node.shutemov.name>
Date:   Mon, 30 Oct 2017 14:27:52 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Fengguang Wu <fengguang.wu@...el.com>
Cc:     Linux Memory Management List <linux-mm@...ck.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vineet Gupta <Vineet.Gupta1@...opsys.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Geliang Tang <geliangtang@....com>
Subject: Re: [pgtable_trans_huge_withdraw] BUG: unable to handle kernel NULL
 pointer dereference at 0000000000000020

On Mon, Oct 30, 2017 at 10:28:42AM +0100, Fengguang Wu wrote:
> Hi Kirill,
> 
> On Mon, Oct 30, 2017 at 12:19:40PM +0300, Kirill A. Shutemov wrote:
> > On Mon, Oct 30, 2017 at 12:37:01AM +0100, Fengguang Wu wrote:
> > > CC MM people.
> > > 
> > > On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
> > > > Hi Linus,
> > > >
> > > > Up to now we see the below boot error/warnings when testing v4.14-rc6.
> > > >
> > > > They hit the RC release mainly due to various imperfections in 0day's
> > > > auto bisection. So I manually list them here and CC the likely easy to
> > > > debug ones to the corresponding maintainers in the followup emails.
> > > >
> > > > boot_successes: 4700
> > > > boot_failures: 247
> > > >
> > > > BUG:kernel_hang_in_test_stage: 152
> > > > BUG:kernel_reboot-without-warning_in_test_stage: 10
> > > > BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c: 1
> > > > BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c: 3
> > > > BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c: 21
> > > > BUG:soft_lockup-CPU##stuck_for#s: 1
> > > > BUG:unable_to_handle_kernel: 13
> > > 
> > > Here is the call trace:
> > > 
> > > [  956.669197] [  956.670421] stress-ng: fail:  [27945] stress-ng-numa:
> > > get_mempolicy: errno=22 (Invalid argument)
> > 
> > Can you also share how you run stress-ng? Is it reproducible?
> 
> The command line is
> 
>        stress-ng --class cpu --sequential $(nproc) --timeout 1 --times --verify --metrics-brief
> 
> The test box is
> 
>        model: Broadwell-EP
>        nr_cpu: 88
>        memory: 128G

By chance, do you have an emulated nvdimm there? I suspect DAX stuff.
Do you have the full dmesg around?

-- 
 Kirill A. Shutemov
