Message-ID: <20171030110511.scfrdtlnf5lbdhu5@pd.tnic>
Date:   Mon, 30 Oct 2017 12:05:27 +0100
From:   Borislav Petkov <bp@...e.de>
To:     Fengguang Wu <fengguang.wu@...el.com>,
        Tyler Baicar <tbaicar@...eaurora.org>
Cc:     Huang Ying <ying.huang@...el.com>,
        Chen Gong <gong.chen@...ux.intel.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Will Deacon <will.deacon@....com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>
Subject: Re: [ghes_copy_tofrom_phys] BUG: sleeping function called from
 invalid context at mm/page_alloc.c:4150

On Mon, Oct 30, 2017 at 12:18:35AM +0100, Fengguang Wu wrote:
> CC related developers for the BUG in v4.14-rc6.
> 
> On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
> > Hi Linus,
> > 
> > Up to now we see the below boot error/warnings when testing v4.14-rc6.
> > 
> > They hit the RC release mainly due to various imperfections in 0day's
> > auto bisection. So I manually list them here and CC the likely easy to
> > debug ones to the corresponding maintainers in the followup emails.
> > 
> > boot_successes: 4700
> > boot_failures: 247
> > 
> > BUG:kernel_hang_in_test_stage: 152
> > BUG:kernel_reboot-without-warning_in_test_stage: 10
> > BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c: 1
> > BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c: 3
> > BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c: 21
> 
> Here is the dmesg fragment:
> 
> [   47.597981] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x26d34d96462, max_idle_ns: 440795289520 ns
> [   48.626601] clocksource: Switched to clocksource tsc
> [   49.273620] ERST: Error Record Serialization Table (ERST) support is initialized.
> [   49.290288] pstore: using zlib compression
> [   49.299588] pstore: Registered erst as persistent store backend
> [   49.311408] BUG: sleeping function called from invalid context at mm/page_alloc.c:4150
> [   49.312031] in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: swapper/0
> [   49.312031] CPU: 37 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc6 #1
> [   49.312031] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [   49.312031] Call Trace:
> [   49.312031]  dump_stack+0x63/0x86
> [   49.312031]  ___might_sleep+0xf1/0x110
> [   49.312031]  __might_sleep+0x4a/0x80
> [   49.312031]  __alloc_pages_nodemask+0x14e/0x270
> [   49.312031]  alloc_page_interleave+0x17/0x80
> [   49.312031]  alloc_pages_current+0xc8/0xe0
> [   49.312031]  __get_free_pages+0xe/0x40
> [   49.312031]  pte_alloc_one_kernel+0x15/0x20
> [   49.312031]  __pte_alloc_kernel+0x1d/0x100
> [   49.312031]  ioremap_page_range+0x330/0x3a0
> [   49.312031]  ghes_copy_tofrom_phys+0x182/0x2b0
> [   49.312031]  ghes_read_estatus+0x76/0x140
> [   49.312031]  ghes_proc+0x1c/0x130
> [   49.312031]  ghes_probe+0x157/0x430
> [   49.312031]  platform_drv_probe+0x3b/0xa0
> [   49.312031]  driver_probe_device+0x29c/0x450
> [   49.312031]  __driver_attach+0xdf/0xf0
> [   49.312031]  ? driver_probe_device+0x450/0x450
> [   49.312031]  bus_for_each_dev+0x60/0xa0
> [   49.312031]  driver_attach+0x1e/0x20
> [   49.312031]  bus_add_driver+0x170/0x260
> [   49.312031]  ? set_debug_rodata+0x17/0x17
> [   49.312031]  driver_register+0x60/0xe0
> [   49.312031]  __platform_driver_register+0x36/0x40
> [   49.312031]  ghes_init+0x10f/0x199
> [   49.312031]  ? bert_init+0x215/0x215
> [   49.312031]  do_one_initcall+0x43/0x170
> [   49.312031]  ? set_debug_rodata+0x17/0x17
> [   49.312031]  kernel_init_freeable+0x198/0x220
> [   49.312031]  ? rest_init+0xd0/0xd0
> [   49.312031]  kernel_init+0xe/0x101
> [   49.312031]  ret_from_fork+0x25/0x30
> [   49.670116] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
> [   49.691436] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [   49.729954] 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> [   49.767235] Non-volatile memory driver v1.3
> [   49.778363] Linux agpgart interface v0.103

Looks like Tyler broke it:

77b246b32b2c ("acpi: apei: check for pending errors when probing GHES entries")

and it went into 4.13 and -stable.

Tyler, why is it so important to do the polling immediately upon
registration? Can't we wait until the polling does it?
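
For anyone following along, here is a minimal userspace sketch of what the
splat above is complaining about. It is not the GHES code: every model_*
name is hypothetical, and the assumption (taken from the trace, where
in_atomic() and irqs_disabled() are both 1) is that a page-table page gets
allocated while we are in atomic context with interrupts off. It only
shows the shape of the check and why doing the allocation before entering
the atomic section keeps it quiet.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for kernel state; not kernel code. */
static int preempt_depth;   /* non-zero models in_atomic() */
static bool irqs_off;       /* models irqs_disabled() */
static int warnings;        /* number of times the check fired */

/* Models the might_sleep() debug check: complain if sleeping is not allowed. */
static void model_might_sleep(const char *where)
{
	if (preempt_depth > 0 || irqs_off) {
		warnings++;
		printf("BUG: sleeping function called from invalid context at %s\n",
		       where);
		printf("in_atomic(): %d, irqs_disabled(): %d\n",
		       preempt_depth > 0, (int)irqs_off);
	}
}

/* Models taking and releasing a spinlock with interrupts disabled. */
static void model_lock_irqsave(void)
{
	preempt_depth++;
	irqs_off = true;
}

static void model_unlock_irqrestore(void)
{
	irqs_off = false;
	preempt_depth--;
}

/* Models an allocation that is allowed to sleep, so it runs the check. */
static void model_alloc_pte(void)
{
	model_might_sleep("model_alloc_pte");
}

/* Shape of the reported path: the allocation happens inside the atomic section. */
static void model_copy_under_lock(void)
{
	model_lock_irqsave();
	model_alloc_pte();          /* the check fires here */
	model_unlock_irqrestore();
}

/* Alternative ordering: allocate first, then enter the atomic section. */
static void model_copy_premapped(void)
{
	model_alloc_pte();          /* sleeping is still allowed here */
	model_lock_irqsave();
	/* the copy itself would happen here, with no allocation */
	model_unlock_irqrestore();
}
```

Calling model_copy_premapped() leaves the warning counter untouched, while
model_copy_under_lock() bumps it once and prints the two model lines.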

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)