linux-kernel - BUG_ON() in workingset_node_shadows

Date:   Mon, 3 Oct 2016 21:00:55 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Antonio SJ Musumeci <trapexit@...wn.link>,
        Miklos Szeredi <miklos@...redi.hu>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        stable <stable@...r.kernel.org>
Subject: BUG_ON() in workingset_node_shadows_dec() triggers

I'm really sorry I applied that last series from Andrew just before
doing the 4.8 release, because it causes problems, and now it is in
4.8 (and that buggy crap is marked for stable too).

In particular, I just got this

    kernel BUG at ./include/linux/swap.h:276

and the end result was a dead kernel.

The bug that commit 22f2ac51b6d64 ("mm: workingset: fix crash in
shadow node shrinker caused by replace_page_cache_page()") purports to
have fixed has apparently been there since 3.15, but the fix is
clearly worse than the bug it tried to fix, since that original bug
has never killed my machine!

I should have reacted to the damn added BUG_ON() lines. I suspect I
will have to finally just remove the idiotic BUG_ON() concept once and
for all, because there is NO F*CKING EXCUSE to knowingly kill the
kernel.

Why the hell was that not a *warning*?

Yes, I'm grumpy. This went in very late in the release candidates, and
I had higher expectations of things coming in through Andrew. Adding
random BUG_ON()'s to code that clearly hasn't had sufficient testing
is *not* acceptable, and it's definitely not acceptable to send that
to me after rc8 unless it has gotten a *lot* of testing, which it
clearly must not have had. Adding stable to the cc too to warn about
this.

The full report is

  kernel BUG at ./include/linux/swap.h:276!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6
   soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis
industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss
nfs_acl lockd grace sunrpc dm_crypt
  CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60ba944 #1
  Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
  task: ffff8faa93ecd940 task.stack: ffff8faa7f478000
  RIP: page_cache_tree_insert+0xf1/0x100
  RSP: 0018:ffff8faa7f47bab0  EFLAGS: 00010046
  RAX: 0000000000000001 RBX: ffff8faadfaf8c18 RCX: ffff8fa8737b5488
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8fa8737b4b48
  RBP: ffff8faa7f47bae8 R08: 0000000000000012 R09: ffff8fa8737b54b0
  R10: 0000000000000040 R11: ffff8fa8737b54b0 R12: ffffea000b1ad580
  R13: 0000000000000000 R14: ffff8faa7f47bb48 R15: ffffea000b1ad580
  FS:  00007ffba3a61780(0000) GS:ffff8faaf6c00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007ffba31a5430 CR3: 00000002c6d40000 CR4: 00000000003406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
    __add_to_page_cache_locked+0x12e/0x270
    add_to_page_cache_lru+0x4e/0xe0
    mpage_readpages+0x112/0x1d0
    blkdev_readpages+0x1d/0x20
    __do_page_cache_readahead+0x1ad/0x290
    force_page_cache_readahead+0xaa/0x100
    page_cache_sync_readahead+0x3f/0x50
    generic_file_read_iter+0x5af/0x740
    blkdev_read_iter+0x35/0x40
    __vfs_read+0xe1/0x130
    vfs_read+0x96/0x130
    SyS_read+0x55/0xc0
    entry_SYSCALL_64_fastpath+0x13/0x8f
  Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00
  RIP  page_cache_tree_insert+0xf1/0x100

and I hope somebody can see what is going wrong in there. The reason
the machine *dies* from that thing is that we then immediately hit a

  BUG: unable to handle kernel paging request at ffffffffb70bdaa8
  IP: blk_flush_plug_list+0x8b/0x250
  Call Trace:
    schedule+0x61/0x80
    do_exit+0x8c8/0xae0
    rewind_stack_do_exit+0x17/0x20

and then a

  Fixing recursive fault but reboot is needed!

and the machine will never recover.

People who add random assert statements that kill machines should damn
well not be let near the VM layer.

Johannes? Please make this your first priority. And in the meantime I
will make that VM_BUG_ON() be a VM_WARN_ON_ONCE().
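
To make the difference between the two assertion styles concrete, here
is a minimal userspace sketch in C. It is not the kernel's code:
SKETCH_WARN_ON_ONCE, struct sketch_node and sketch_shadows_dec are
hypothetical names invented for illustration, and the choice to skip the
decrement instead of underflowing is an assumption of this sketch, not
something stated in this mail.

```c
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings printed so far (only here so the sketch can be checked). */
static int sketch_warnings;

/*
 * Hypothetical stand-in for a warn-once assertion: report a violated
 * condition the first time it is seen at this call site, then keep
 * running. A BUG_ON-style assertion would stop execution here instead.
 */
#define SKETCH_WARN_ON_ONCE(cond)                                   \
    do {                                                            \
        static bool warned; /* one flag per call site */            \
        if ((cond) && !warned) {                                    \
            warned = true;                                          \
            sketch_warnings++;                                      \
            fprintf(stderr, "WARNING: %s at %s:%d\n",               \
                    #cond, __FILE__, __LINE__);                     \
        }                                                           \
    } while (0)

/* Hypothetical node that tracks a count of shadow entries. */
struct sketch_node {
    unsigned int shadows;
};

/* Decrement the shadow count; warn once and carry on if it is already zero. */
static void sketch_shadows_dec(struct sketch_node *node)
{
    SKETCH_WARN_ON_ONCE(node->shadows == 0);
    if (node->shadows == 0)
        return; /* assumption of this sketch: refuse to underflow */
    node->shadows--;
}
```

Calling sketch_shadows_dec() three times on a node that starts with one
shadow entry leaves the count at 0 and prints a single warning. With an
abort-style assertion in the same place, the second call would end the
program, which is the userspace analogue of the dead machine above.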

And dammit, if anybody else feels that they have done "debugging
messages with BUG_ON()", I would suggest you

 (a) rethink your approach to programming

 (b) send me patches to remove the crap entirely, or make them real
*DEBUGGING* messages, not "kill the whole machine" messages.

I've ranted against people using BUG_ON() for debugging in the past.
Why the f*ck does this still happen? And Andrew - please stop taking
those kinds of patches! Lookie here:

    https://lwn.net/Articles/13183/

so excuse me for being upset that people still do this shit almost 15
years later.

                 Linus