linux-kernel - Re: [3.10] Oopses in kmem_cache_allocate() via prepare

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEcaRjip29P8TdMHF8zJ3zzsQj7o_B9chEz7Q6xH0Bqtw3A-jg@mail.gmail.com>
Date:	Tue, 20 Aug 2013 00:17:52 -0700
From:	Ian Applegate <ia@...udflare.com>
To:	Al Viro <viro@...iv.linux.org.uk>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Christoph Lameter <cl@...two.org>,
	Simon Kirby <sim@...tway.ca>,
	Pekka Enberg <penberg@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Chris Mason <chris.mason@...ionio.com>
Subject: Re: [3.10] Oopses in kmem_cache_allocate() via prepare_creds()

We are also seeing this or a similar issue. On a fairly widespread
deployment of 3.10.1 & 3.10.6 this occurred fairly consistently on the
order of 36 days (combined MTBF.)

[28974.739774] ------------[ cut here ]------------
[28974.744980] kernel BUG at mm/slub.c:3352!
[28974.749502] invalid opcode: 0000 [#1] SMP
[28974.754143] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler
dm_mod md_mod nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter
ip6table_raw ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
iptable_filter xt_tcpudp xt_CT nf_conntrack xt_multiport iptable_raw
ip_tables x_tables rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 fuse
nfsv3 nfs_acl nfs fscache lockd sunrpc bonding tcp_cubic sg sfc(O) mtd
mdio igb dca i2c_algo_bit i2c_core ptp pps_core coretemp kvm_intel kvm
crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper
aes_x86_64 acpi_cpufreq evdev sd_mod crc_t10dif snd_pcm tpm_tis
microcode tpm snd_timer tpm_bios snd soundcore snd_page_alloc pcspkr
ahci libahci uhci_hcd ehci_pci ehci_hcd lpc_ich libata mfd_core
usbcore usb_common hpsa scsi_mod mperf button processor thermal_sys
[28974.835407] CPU: 17 PID: 21400 Comm: nginx-fl Tainted: G
O 3.10.1-cloudflare-trace+ #2
[28974.845307] Hardware name: HP ProLiant DL180 G6  , BIOS O20 01/24/2011
[28974.852653] task: ffff8805bea7b390 ti: ffff8809a5f0e000 task.ti:
ffff8809a5f0e000
[28974.861095] RIP: 0010:[<ffffffff810ebcb2>]  [<ffffffff810ebcb2>]
kfree+0x59/0xe7
[28974.869453] RSP: 0018:ffff8809a5f0fe30  EFLAGS: 00010246
[28974.875434] RAX: 0000000000000000 RBX: ffff880b21e79d40 RCX: 0000000000000028
[28974.883458] RDX: 006ffc0000080068 RSI: 000000000001183c RDI: ffffea002c879e40
[28974.891483] RBP: ffffea002c879e40 R08: 0000000000017100 R09: ffffea0009ea0480
[28974.899507] R10: ffff880ae9746250 R11: 0000000000000000 R12: ffffffff81119456
[28974.907533] R13: 000000000000000a R14: ffff8809a5f0ff48 R15: 0000000000013b80
[28974.915558] FS:  00007fb36c115710(0000) GS:ffff880627d60000(0000)
knlGS:0000000000000000
[28974.924681] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[28974.931149] CR2: ffffffffff600400 CR3: 0000000761b93000 CR4: 00000000000007e0
[28974.939173] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[28974.947190] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[28974.955205] Stack:
[28974.957486]  ffff880a3e72a700 000000000001183c 0000000000000000
ffffffff81119456
[28974.965873]  ffff8809a5f0fe48 0000000000000000 ffff880b21e79d40
000000000000c350
[28974.974253]  ffff8809a5f0fec0 ffff8802b4781bc0 0000000000000200
ffffffff811d96eb
[28974.982646] Call Trace:
[28974.985419]  [<ffffffff81119456>] ? do_readv_writev+0xfc/0x135
[28974.991987]  [<ffffffff811d96eb>] ? ep_poll+0x215/0x286
[28974.997875]  [<ffffffff8112ccb4>] ? fget_light+0x2e/0x7c
[28975.003861]  [<ffffffff811182fd>] ? fdget+0x16/0x1c
[28975.009359]  [<ffffffff8111956c>] ? SyS_readv+0x5a/0xb0
[28975.015245]  [<ffffffff811da1ae>] ? SyS_epoll_wait+0x86/0xc1
[28975.021622]  [<ffffffff814e4d92>] ? system_call_fastpath+0x16/0x1b
[28975.028578] Code: 48 83 fb 10 0f 86 aa 00 00 00 48 89 df e8 8c de
ff ff 48 89 c7 48 89 c5 e8 b5 d7 ff ff 85 c0 75 22 48 f7 45 00 00 c0
00 00 75 02 <0f> 0b 48 89 ef e8 a8 d7 ff ff 5b 48 89 ef 89 c6 5d 41 5c
e9 9a
[28975.050651] RIP  [<ffffffff810ebcb2>] kfree+0x59/0xe7
[28975.056354]  RSP <ffff8809a5f0fe30>
[28975.062479] ---[ end trace 29f90372a2c3b0ac ]---

The machine hobbles along until all processes crash at this point.

[14064.855658] =============================================================================
[14064.864884] BUG kmalloc-192 (Tainted: G        W   ): Poison overwritten
[14064.872424] -----------------------------------------------------------------------------
[14064.872424]
[14064.883322] Disabling lock debugging due to kernel taint
[14064.889304] INFO: 0xffff880930a54824-0xffff880930a54824. First byte
0x6c instead of 0x6b
[14064.898433] INFO: Allocated in alloc_pipe_info+0x17/0x94 age=80
cpu=8 pid=28897
[14064.906684]  set_track+0x5c/0xd7
[14064.910332]  alloc_debug_processing+0x76/0xfd
[14064.915250]  __slab_alloc+0x3e0/0x417
[14064.919386]  alloc_inode+0x26/0x6c
[14064.923232]  alloc_pipe_info+0x17/0x94
[14064.927467]  kmem_cache_alloc_trace+0xbe/0x14e
[14064.932486]  alloc_pipe_info+0x17/0x94
[14064.936721]  alloc_pipe_info+0x17/0x94
[14064.940956]  create_pipe_files+0x3c/0x1e5
[14064.945481]  __do_pipe_flags+0x23/0xa7
[14064.949718]  SyS_pipe2+0x18/0x86
[14064.953374]  system_call_fastpath+0x16/0x1b
[14064.958098] INFO: Freed in pipe_release+0xc4/0xcd age=76 cpu=23 pid=28897
[14064.965740]  set_track+0x5c/0xd7
[14064.969389]  free_debug_processing+0x11d/0x1a9
[14064.974391]  __slab_free+0x32/0x30a
[14064.978334]  free_pages_prepare+0x104/0x128
[14064.983053]  pipe_release+0xc4/0xcd
[14064.986991]  __fput+0xe1/0x1e4
[14064.990443]  task_work_run+0x7b/0x8f
[14064.994482]  do_notify_resume+0x59/0x68
[14064.998816]  int_signal+0x12/0x17
[14065.002561] INFO: Slab 0xffffea0024c29500 objects=31 used=28
fp=0xffff880930a570c0 flags=0x6ffc0000004080
[14065.013341] INFO: Object 0xffff880930a54820 @offset=2080
fp=0xffff880930a55040
[14065.023177] Bytes b4 ffff880930a54810: 4d fa 14 00 01 00 00 00 5a
5a 5a 5a 5a 5a 5a 5a  M.......ZZZZZZZZ
[14065.033864] Object ffff880930a54820: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkklkkkkkkkkkkk
[14065.044354] Object ffff880930a54830: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[14065.054845] Object ffff880930a54840: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[14065.065335] Object ffff880930a54850: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[14065.075827] Object ffff880930a54860: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[14065.086321] Object ffff880930a54870: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[14065.096806] Object ffff880930a54880: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[14065.107298] Object ffff880930a54890: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[14065.117788] Object ffff880930a548a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[14065.128278] Object ffff880930a548b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[14065.138770] Object ffff880930a548c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[14065.149262] Object ffff880930a548d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
[14065.159751] Redzone ffff880930a548e0: bb bb bb bb bb bb bb bb
                   ........
[14065.169559] Padding ffff880930a54a20: 5a 5a 5a 5a 5a 5a 5a 5a
                   ZZZZZZZZ
[14065.179372] CPU: 10 PID: 7969 Comm: nginx-cache Tainted: G    B   W
   3.10.6-cloudflare-debug+ #5
[14065.189469] Hardware name: HP ProLiant DL180 G6  , BIOS O20 01/24/2011
[14065.196807]  0000000000000000 ffffffff810ee8d2 ffff880930a54820
ffffffff81827fe7
[14065.205185]  ffffea0024c29500 ffff880627804900 ffff880930a54820
ffffea0024c29500
[14065.213576]  ffffffff81223ebb ffff880930a54820 ffffea0024c29500
ffffffff810ee9fe
[14065.221965] Call Trace:
[14065.224743]  [<ffffffff810ee8d2>] ? check_bytes_and_report+0xa7/0xf4
[14065.231902]  [<ffffffff81223ebb>] ? kmem_alloc+0x43/0x6d
[14065.237890]  [<ffffffff810ee9fe>] ? check_object+0xdf/0x19e
[14065.244170]  [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d
[14065.250154]  [<ffffffff814e834d>] ? alloc_debug_processing+0x5c/0xfd
[14065.257310]  [<ffffffff814e87ce>] ? __slab_alloc+0x3e0/0x417
[14065.263687]  [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d
[14065.269671]  [<ffffffff810efe41>] ? __kmalloc+0x15b/0x191
[14065.275755]  [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d
[14065.281733]  [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d
[14065.287709]  [<ffffffff8124ce2f>] ? xfs_idata_realloc+0xd5/0xf7
[14065.294378]  [<ffffffff810efde0>] ? __kmalloc+0xfa/0x191
[14065.300366]  [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d
[14065.306354]  [<ffffffff81223e9d>] ? kmem_alloc+0x25/0x6d
[14065.312340]  [<ffffffff8125c4ab>] ? xfs_log_commit_cil+0xe8/0x3ce
[14065.319202]  [<ffffffff812589a1>] ? xfs_trans_commit+0x5c/0x1c5
[14065.325873]  [<ffffffff81222ed4>] ? xfs_create+0x404/0x527
[14065.332055]  [<ffffffff8121d07d>] ? xfs_vn_mknod+0xb4/0x15e
[14065.338334]  [<ffffffff8112d010>] ? vfs_create+0x5f/0x84
[14065.344318]  [<ffffffff8112e0f6>] ? do_last+0x54b/0x9b7
[14065.350206]  [<ffffffff8112e624>] ? path_openat+0xc2/0x327
[14065.356386]  [<ffffffff81123b89>] ? do_readv_writev+0x119/0x135
[14065.363053]  [<ffffffff8112eb2c>] ? do_filp_open+0x2a/0x6e
[14065.369232]  [<ffffffff8112bda0>] ? getname_flags.part.32+0x22/0x10a
[14065.376386]  [<ffffffff8113791b>] ? __alloc_fd+0xcc/0xdb
[14065.382373]  [<ffffffff81122803>] ? do_sys_open+0x5c/0xe0
[14065.388457]  [<ffffffff814f43d2>] ? system_call_fastpath+0x16/0x1b
[14065.395413] FIX kmalloc-192: Restoring
0xffff880930a54824-0xffff880930a54824=0x6b
[14065.395413]
[14065.405533] FIX kmalloc-192: Marking all objects used

On Mon, Aug 19, 2013 at 9:06 PM, Al Viro <viro@...iv.linux.org.uk> wrote:
> On Mon, Aug 19, 2013 at 02:16:36PM -0700, Linus Torvalds wrote:
>> On Mon, Aug 19, 2013 at 1:29 PM, Christoph Lameter <cl@...two.org> wrote:
>> > On Mon, 19 Aug 2013, Simon Kirby wrote:
>> >
>> >>    [... ]  The
>> >> alloc/free traces are always the same -- always alloc_pipe_info and
>> >> free_pipe_info. This is seen on 3.10 and (now) 3.11-rc4:
>> >>
>> >> Object ffff880090f19e78: 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkklkkkkkkkkkkk
>> >
>> > This looks like an increment after free in the second 32 bit value of the
>> > structure. First 32 bit value's poison is unchanged.
>>
>> Ugh. If that is "struct pipe_inode_info" and I read it right, that's
>> the "wait_lock" spinlock that is part of the mutex.
>>
>> Doing a "spin_lock()" could indeed cause an increment operation. But
>> it still sounds like a very odd case. And even for some wild pointer
>> I'd then expect the spin_unlock to also happen, and to then increment
>> the next byte (or word) too. More importantly, for a mutex, I'd expect
>> the *other* fields to be corrupted too (the "waiter" field etc). That
>> is, unless we're still spinning waiting for the mutex, but with that
>> value we shouldn't, as far as I can see.
>
> Point...  I would probably start with CONFIG_DEBUG_MUTEXES and see if
> it screams about mutex magic, etc. having been buggered.  FWIW, pipe is
> neither a FIFO nor an internal per-task one - it's a usual pipe(2) one,
> which should've excluded most of the weird codepaths...
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/