Message-ID: <CABWYdi27baYc3ShHcZExmmXVmxOQXo9sGO+iFhfZLq78k8iaAg@mail.gmail.com>
Date:   Tue, 2 Feb 2021 19:09:44 -0800
From:   Ivan Babrou <ivan@...udflare.com>
To:     kernel-team <kernel-team@...udflare.com>
Cc:     Ignat Korchagin <ignat@...udflare.com>,
        Hailong liu <liu.hailong6@....com.cn>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Alexander Potapenko <glider@...gle.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Miroslav Benes <mbenes@...e.cz>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Julien Thierry <jthierry@...hat.com>,
        Jiri Slaby <jirislaby@...nel.org>, kasan-dev@...glegroups.com,
        linux-mm@...ck.org, linux-kernel <linux-kernel@...r.kernel.org>,
        Alasdair Kergon <agk@...hat.com>,
        Mike Snitzer <snitzer@...hat.com>, dm-devel@...hat.com,
        "Steven Rostedt (VMware)" <rostedt@...dmis.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        Andrii Nakryiko <andriin@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...omium.org>,
        Robert Richter <rric@...nel.org>,
        "Joel Fernandes (Google)" <joel@...lfernandes.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        bpf@...r.kernel.org
Subject: Re: BUG: KASAN: stack-out-of-bounds in unwind_next_frame+0x1df5/0x2650

On Thu, Jan 28, 2021 at 7:35 PM Ivan Babrou <ivan@...udflare.com> wrote:
>
> Hello,
>
> We've noticed the following regression in Linux 5.10 branch:
>
> [  128.367231][    C0]
> ==================================================================
> [  128.368523][    C0] BUG: KASAN: stack-out-of-bounds in
> unwind_next_frame (arch/x86/kernel/unwind_orc.c:371
> arch/x86/kernel/unwind_orc.c:544)
> [  128.369744][    C0] Read of size 8 at addr ffff88802fceede0 by task
> kworker/u2:2/591
> [  128.370916][    C0]
> [  128.371269][    C0] CPU: 0 PID: 591 Comm: kworker/u2:2 Not tainted
> 5.10.11-cloudflare-kasan-2021.1.15 #1
> [  128.372626][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [  128.374346][    C0] Workqueue: writeback wb_workfn (flush-254:0)
> [  128.375275][    C0] Call Trace:
> [  128.375763][    C0]  <IRQ>
> [  128.376221][    C0]  dump_stack+0x7d/0xa3
> [  128.376843][    C0]  print_address_description.constprop.0+0x1c/0x210
> [  128.377827][    C0]  ? _raw_spin_lock_irqsave
> (arch/x86/include/asm/atomic.h:202
> include/asm-generic/atomic-instrumented.h:707
> include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:195
> include/linux/spinlock_api_smp.h:119 kernel/locking/spinlock.c:159)
> [  128.378624][    C0]  ? _raw_write_unlock_bh (kernel/locking/spinlock.c:158)
> [  128.379389][    C0]  ? unwind_next_frame (arch/x86/kernel/unwind_orc.c:444)
> [  128.380177][    C0]  ? unwind_next_frame
> (arch/x86/kernel/unwind_orc.c:371 arch/x86/kernel/unwind_orc.c:544)
> [  128.380954][    C0]  ? unwind_next_frame
> (arch/x86/kernel/unwind_orc.c:371 arch/x86/kernel/unwind_orc.c:544)
> [  128.381736][    C0]  kasan_report.cold+0x1f/0x37
> [  128.382438][    C0]  ? unwind_next_frame
> (arch/x86/kernel/unwind_orc.c:371 arch/x86/kernel/unwind_orc.c:544)
> [  128.383192][    C0]  unwind_next_frame+0x1df5/0x2650
> [  128.383954][    C0]  ? asm_common_interrupt
> (arch/x86/include/asm/idtentry.h:622)
> [  128.384726][    C0]  ? get_stack_info_noinstr
> (arch/x86/kernel/dumpstack_64.c:157)
> [  128.385530][    C0]  ? glue_xts_req_128bit+0x110/0x6f0 glue_helper
> [  128.386509][    C0]  ? deref_stack_reg (arch/x86/kernel/unwind_orc.c:418)
> [  128.387267][    C0]  ? is_module_text_address (kernel/module.c:4566
> kernel/module.c:4550)
> [  128.388077][    C0]  ? glue_xts_req_128bit+0x110/0x6f0 glue_helper
> [  128.389048][    C0]  ? kernel_text_address.part.0 (kernel/extable.c:145)
> [  128.389901][    C0]  ? glue_xts_req_128bit+0x110/0x6f0 glue_helper
> [  128.390865][    C0]  ? stack_trace_save (kernel/stacktrace.c:82)
> [  128.391550][    C0]  arch_stack_walk+0x8d/0xf0
> [  128.392216][    C0]  ? kfree (mm/slub.c:3142 mm/slub.c:4124)
> [  128.392807][    C0]  stack_trace_save+0x96/0xd0
> [  128.393535][    C0]  ? create_prof_cpu_mask (kernel/stacktrace.c:113)
> [  128.394320][    C0]  ? blk_update_request (block/blk-core.c:264
> block/blk-core.c:1468)
> [  128.395113][    C0]  ? asm_call_irq_on_stack (arch/x86/entry/entry_64.S:796)
> [  128.395887][    C0]  ? do_softirq_own_stack
> (arch/x86/include/asm/irq_stack.h:27
> arch/x86/include/asm/irq_stack.h:77 arch/x86/kernel/irq_64.c:77)
> [  128.396678][    C0]  ? irq_exit_rcu (kernel/softirq.c:393
> kernel/softirq.c:423 kernel/softirq.c:435)
> [  128.397349][    C0]  ? common_interrupt (arch/x86/kernel/irq.c:239)
> [  128.398086][    C0]  ? asm_common_interrupt
> (arch/x86/include/asm/idtentry.h:622)
> [  128.398886][    C0]  ? get_page_from_freelist (mm/page_alloc.c:3480
> mm/page_alloc.c:3904)
> [  128.399759][    C0]  kasan_save_stack+0x20/0x50
> [  128.400453][    C0]  ? kasan_save_stack (mm/kasan/common.c:48)
> [  128.401175][    C0]  ? kasan_set_track (mm/kasan/common.c:56)
> [  128.401881][    C0]  ? kasan_set_free_info (mm/kasan/generic.c:360)
> [  128.402646][    C0]  ? __kasan_slab_free (mm/kasan/common.c:283
> mm/kasan/common.c:424)
> [  128.403375][    C0]  ? slab_free_freelist_hook (mm/slub.c:1577)
> [  128.404199][    C0]  ? kfree (mm/slub.c:3142 mm/slub.c:4124)
> [  128.404835][    C0]  ? nvme_pci_complete_rq+0x105/0x350 nvme
> [  128.405765][    C0]  ? blk_done_softirq (include/linux/list.h:282
> block/blk-mq.c:581)
> [  128.406552][    C0]  ? __do_softirq
> (arch/x86/include/asm/jump_label.h:25 include/linux/jump_label.h:200
> include/trace/events/irq.h:142 kernel/softirq.c:299)
> [  128.407272][    C0]  ? asm_call_irq_on_stack (arch/x86/entry/entry_64.S:796)
> [  128.408087][    C0]  ? do_softirq_own_stack
> (arch/x86/include/asm/irq_stack.h:27
> arch/x86/include/asm/irq_stack.h:77 arch/x86/kernel/irq_64.c:77)
> [  128.408878][    C0]  ? irq_exit_rcu (kernel/softirq.c:393
> kernel/softirq.c:423 kernel/softirq.c:435)
> [  128.409602][    C0]  ? common_interrupt (arch/x86/kernel/irq.c:239)
> [  128.410366][    C0]  ? asm_common_interrupt
> (arch/x86/include/asm/idtentry.h:622)
> [  128.411184][    C0]  ? skcipher_walk_next (crypto/skcipher.c:322
> crypto/skcipher.c:384)
> [  128.412009][    C0]  ? skcipher_walk_virt (crypto/skcipher.c:487)
> [  128.412811][    C0]  ? glue_xts_req_128bit+0x110/0x6f0 glue_helper
> [  128.413792][    C0]  ? asm_common_interrupt
> (arch/x86/include/asm/idtentry.h:622)
> [  128.414562][    C0]  ? kcryptd_crypt_write_convert+0x3a2/0xa10 dm_crypt
> [  128.415591][    C0]  ? crypt_map+0x5c1/0xc70 dm_crypt
> [  128.416389][    C0]  ? __map_bio.isra.0+0x109/0x450 dm_mod
> [  128.417275][    C0]  ? __split_and_process_non_flush+0x728/0xd10 dm_mod
> [  128.418293][    C0]  ? dm_submit_bio+0x4f1/0xec0 dm_mod
> [  128.419068][    C0]  ? submit_bio_noacct (block/blk-core.c:934
> block/blk-core.c:982 block/blk-core.c:1061)
> [  128.419806][    C0]  ? submit_bio (block/blk-core.c:1079)
> [  128.420458][    C0]  ? _raw_spin_lock_irqsave
> (arch/x86/include/asm/atomic.h:202
> include/asm-generic/atomic-instrumented.h:707
> include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:195
> include/linux/spinlock_api_smp.h:119 kernel/locking/spinlock.c:159)
> [  128.421244][    C0]  ? _raw_write_unlock_bh (kernel/locking/spinlock.c:158)
> [  128.422015][    C0]  ? ret_from_fork (arch/x86/entry/entry_64.S:302)
> [  128.422696][    C0]  ? kmem_cache_free (mm/slub.c:3142 mm/slub.c:3158)
> [  128.423427][    C0]  ? memset (mm/kasan/common.c:84)
> [  128.424000][    C0]  ? dma_pool_free (mm/dmapool.c:405)
> [  128.424698][    C0]  ? slab_free_freelist_hook (mm/slub.c:1577)
> [  128.425518][    C0]  ? dma_pool_create (mm/dmapool.c:405)
> [  128.426234][    C0]  ? kmem_cache_free (mm/slub.c:3142 mm/slub.c:3158)
> [  128.426923][    C0]  ? raise_softirq_irqoff
> (arch/x86/include/asm/preempt.h:26 kernel/softirq.c:469)
> [  128.427691][    C0]  kasan_set_track+0x1c/0x30
> [  128.428366][    C0]  kasan_set_free_info+0x1b/0x30
> [  128.429113][    C0]  __kasan_slab_free+0x110/0x150
> [  128.429838][    C0]  slab_free_freelist_hook+0x66/0x120
> [  128.430628][    C0]  kfree+0xbf/0x4d0
> [  128.431192][    C0]  ? nvme_pci_complete_rq+0x105/0x350 nvme
> [  128.432107][    C0]  ? nvme_unmap_data+0x349/0x440 nvme
> [  128.432882][    C0]  nvme_pci_complete_rq+0x105/0x350 nvme
> [  128.433750][    C0]  blk_done_softirq+0x2ff/0x590
> [  128.434441][    C0]  ? blk_mq_stop_hw_queue (block/blk-mq.c:573)
> [  128.435161][    C0]  ? _raw_spin_lock_bh (kernel/locking/spinlock.c:150)
> [  128.435894][    C0]  ? _raw_spin_lock_bh (kernel/locking/spinlock.c:150)
> [  128.436582][    C0]  __do_softirq+0x1a0/0x667
> [  128.437218][    C0]  asm_call_irq_on_stack+0x12/0x20
> [  128.437975][    C0]  </IRQ>
> [  128.438397][    C0]  do_softirq_own_stack+0x37/0x40
> [  128.439120][    C0]  irq_exit_rcu+0x110/0x1b0
> [  128.439807][    C0]  common_interrupt+0x74/0x120
> [  128.440545][    C0]  asm_common_interrupt+0x1e/0x40
> [  128.441287][    C0] RIP: 0010:skcipher_walk_next
> (crypto/skcipher.c:322 crypto/skcipher.c:384)
> [  128.442126][    C0] Code: 85 dd 10 00 00 49 8d 7c 24 08 49 89 14 24
> 48 b9 00 00 00 00 00 fc ff df 41 81 e5 ff 0f 00 00 48 89 fe 48 c1 ee
> 03 80 3c 0e 00 <0f> 85 80 10 00 00 48 89 c6 4d 89 6c 24 08 48 bc
> All code
> ========
>    0: 85 dd                test   %ebx,%ebp
>    2: 10 00                adc    %al,(%rax)
>    4: 00 49 8d              add    %cl,-0x73(%rcx)
>    7: 7c 24                jl     0x2d
>    9: 08 49 89              or     %cl,-0x77(%rcx)
>    c: 14 24                adc    $0x24,%al
>    e: 48 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%rcx
>   15: fc ff df
>   18: 41 81 e5 ff 0f 00 00 and    $0xfff,%r13d
>   1f: 48 89 fe              mov    %rdi,%rsi
>   22: 48 c1 ee 03          shr    $0x3,%rsi
>   26: 80 3c 0e 00          cmpb   $0x0,(%rsi,%rcx,1)
>   2a:* 0f 85 80 10 00 00    jne    0x10b0 <-- trapping instruction
>   30: 48 89 c6              mov    %rax,%rsi
>   33: 4d 89 6c 24 08        mov    %r13,0x8(%r12)
>   38: 48                    rex.W
>   39: bc                    .byte 0xbc
>
> Code starting with the faulting instruction
> ===========================================
>    0: 0f 85 80 10 00 00    jne    0x1086
>    6: 48 89 c6              mov    %rax,%rsi
>    9: 4d 89 6c 24 08        mov    %r13,0x8(%r12)
>    e: 48                    rex.W
>    f: bc                    .byte 0xbc
> [  128.445089][    C0] RSP: 0018:ffff88802fceebf0 EFLAGS: 00000246
> [  128.445969][    C0] RAX: ffff888003b571b8 RBX: 0000000000000000
> RCX: dffffc0000000000
> [  128.447124][    C0] RDX: ffffea00017cd580 RSI: 1ffff11005f9dda8
> RDI: ffff88802fceed40
> [  128.448281][    C0] RBP: ffff88802fceec70 R08: ffff88802fceedc4
> R09: 00000000ffffffee
> [  128.449457][    C0] R10: 0000000000000000 R11: 1ffff11005f9ddaf
> R12: ffff88802fceed38
> [  128.450641][    C0] R13: 0000000000000000 R14: ffff888003b57138
> R15: ffff88802fceedc8
> [  128.451827][    C0]  ? arch_stack_walk (arch/x86/kernel/stacktrace.c:24)
> [  128.452482][    C0]  skcipher_walk_virt+0x4be/0x7e0
> [  128.453242][    C0]  glue_xts_req_128bit+0x110/0x6f0 glue_helper
> [  128.454175][    C0]  ? aesni_set_key+0x1e0/0x1e0 aesni_intel
> [  128.455042][    C0]  ? irq_exit_rcu (kernel/softirq.c:406
> kernel/softirq.c:425 kernel/softirq.c:435)
> [  128.455719][    C0]  ? glue_xts_crypt_128bit_one+0x280/0x280 glue_helper
> [  128.456753][    C0]  asm_common_interrupt+0x1e/0x40
> [  128.457530][    C0] RIP: b8fa2500:0xdffffc0000000000
> [  128.458305][    C0] Code: Unable to access opcode bytes at RIP
> 0xdffffbffffffffd6.
>
> Code starting with the faulting instruction
> ===========================================
> [  128.459443][    C0] RSP: 974be3f3:ffff88809c437290 EFLAGS: 00000004
> ORIG_RAX: 0000001000000010
> [  128.460755][    C0] RAX: 0000000000000000 RBX: ffff888003b571b8
> RCX: 0000000000000000
> [  128.461967][    C0] RDX: ffff888003b57240 RSI: ffff888003b57240
> RDI: ffffffe000000010
> [  128.463152][    C0] RBP: dffffc0000000200 R08: 0000000000000801
> R09: ffffea0001123480
> [  128.464345][    C0] R10: ffffed1000000200 R11: ffffffff00000000
> R12: ffff888000000000
> [  128.465522][    C0] R13: ffff888003b57138 R14: ffff88809c437290
> R15: ffffea00002c5b08
> [  128.466710][    C0]  ? get_page_from_freelist (mm/page_alloc.c:3913)
> [  128.467560][    C0]  ? worker_thread (include/linux/list.h:282
> kernel/workqueue.c:2419)
> [  128.468279][    C0]  ? kthread (kernel/kthread.c:292)
> [  128.468919][    C0]  ? ret_from_fork (arch/x86/entry/entry_64.S:302)
> [  128.469607][    C0]  ? __writeback_inodes_wb (fs/fs-writeback.c:1793)
> [  128.470418][    C0]  ? wb_writeback (fs/fs-writeback.c:1898)
> [  128.471145][    C0]  ? process_one_work
> (arch/x86/include/asm/jump_label.h:25 include/linux/jump_label.h:200
> include/trace/events/workqueue.h:108 kernel/workqueue.c:2277)
> [  128.471930][    C0]  ? worker_thread (include/linux/list.h:282
> kernel/workqueue.c:2419)
> [  128.472668][    C0]  ? ret_from_fork (arch/x86/entry/entry_64.S:302)
> [  128.473329][    C0]  ? __zone_watermark_ok (mm/page_alloc.c:3793)
> [  128.474065][    C0]  ? __kasan_kmalloc.constprop.0
> (mm/kasan/common.c:56 mm/kasan/common.c:461)
> [  128.474914][    C0]  ? crypt_convert+0x27e5/0x4530 dm_crypt
> [  128.475796][    C0]  ? mempool_alloc (mm/mempool.c:392)
> [  128.476493][    C0]  ? crypt_iv_tcw_ctr+0x4a0/0x4a0 dm_crypt
> [  128.477433][    C0]  ? bio_add_page (block/bio.c:943)
> [  128.478129][    C0]  ? __bio_try_merge_page (block/bio.c:935)
> [  128.478923][    C0]  ? bio_associate_blkg (block/blk-cgroup.c:1869)
> [  128.479693][    C0]  ? kcryptd_crypt_write_convert+0x581/0xa10 dm_crypt
> [  128.480721][    C0]  ? crypt_map+0x5c1/0xc70 dm_crypt
> [  128.481527][    C0]  ? bio_clone_blkg_association (block/blk-cgroup.c:1883)
> [  128.482426][    C0]  ? __map_bio.isra.0+0x109/0x450 dm_mod
> [  128.483310][    C0]  ? __split_and_process_non_flush+0x728/0xd10 dm_mod
> [  128.484354][    C0]  ? __send_empty_flush+0x4b0/0x4b0 dm_mod
> [  128.485223][    C0]  ? __part_start_io_acct (block/blk-core.c:1336)
> [  128.486009][    C0]  ? dm_submit_bio+0x4f1/0xec0 dm_mod
> [  128.486829][    C0]  ? __split_and_process_non_flush+0xd10/0xd10 dm_mod
> [  128.487915][    C0]  ? submit_bio_noacct (block/blk-core.c:934
> block/blk-core.c:982 block/blk-core.c:1061)
> [  128.488686][    C0]  ? _cond_resched (kernel/sched/core.c:6124)
> [  128.489388][    C0]  ? blk_queue_enter (block/blk-core.c:1044)
> [  128.490300][    C0]  ? iomap_readahead (fs/iomap/buffered-io.c:1438)
> [  128.491041][    C0]  ? write_one_page (mm/page-writeback.c:2171)
> [  128.491759][    C0]  ? submit_bio (block/blk-core.c:1079)
> [  128.492432][    C0]  ? submit_bio_noacct (block/blk-core.c:1079)
> [  128.493248][    C0]  ? _raw_spin_lock
> (arch/x86/include/asm/atomic.h:202
> include/asm-generic/atomic-instrumented.h:707
> include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:183
> include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
> [  128.493975][    C0]  ? iomap_submit_ioend (fs/iomap/buffered-io.c:1215)
> [  128.494761][    C0]  ? xfs_vm_writepages (fs/xfs/xfs_aops.c:578)
> [  128.495529][    C0]  ? xfs_dax_writepages (fs/xfs/xfs_aops.c:578)
> [  128.496278][    C0]  ? __blk_mq_do_dispatch_sched
> (block/blk-mq-sched.c:135 (discriminator 1))
> [  128.497120][    C0]  ? do_writepages (mm/page-writeback.c:2355)
> [  128.497831][    C0]  ? page_writeback_cpu_online (mm/page-writeback.c:2345)
> [  128.498681][    C0]  ? _raw_spin_lock
> (arch/x86/include/asm/atomic.h:202
> include/asm-generic/atomic-instrumented.h:707
> include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:183
> include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
> [  128.499405][    C0]  ? wake_up_bit (kernel/sched/wait_bit.c:15
> kernel/sched/wait_bit.c:149)
> [  128.500072][    C0]  ? __writeback_single_inode (fs/fs-writeback.c:1470)
> [  128.500908][    C0]  ? writeback_sb_inodes (fs/fs-writeback.c:1725)
> [  128.501703][    C0]  ? __writeback_single_inode (fs/fs-writeback.c:1634)
> [  128.502571][    C0]  ? finish_writeback_work.constprop.0
> (fs/fs-writeback.c:1242)
> [  128.503525][    C0]  ? __writeback_inodes_wb (fs/fs-writeback.c:1793)
> [  128.504336][    C0]  ? wb_writeback (fs/fs-writeback.c:1898)
> [  128.505031][    C0]  ? __writeback_inodes_wb (fs/fs-writeback.c:1846)
> [  128.505902][    C0]  ? cpumask_next (lib/cpumask.c:24)
> [  128.506570][    C0]  ? get_nr_dirty_inodes (fs/inode.c:94 fs/inode.c:102)
> [  128.507348][    C0]  ? wb_workfn (fs/fs-writeback.c:2054
> fs/fs-writeback.c:2082)
> [  128.508014][    C0]  ? dequeue_entity (kernel/sched/fair.c:4347)
> [  128.508744][    C0]  ? inode_wait_for_writeback (fs/fs-writeback.c:2065)
> [  128.509586][    C0]  ? put_prev_entity (kernel/sched/fair.c:4501)
> [  128.510300][    C0]  ? __switch_to
> (arch/x86/include/asm/bitops.h:55
> include/asm-generic/bitops/instrumented-atomic.h:29
> include/linux/thread_info.h:55 arch/x86/include/asm/fpu/internal.h:572
> arch/x86/kernel/process_64.c:598)
> [  128.510990][    C0]  ? __switch_to_asm (arch/x86/entry/entry_64.S:255)
> [  128.511695][    C0]  ? __schedule (kernel/sched/core.c:3782
> kernel/sched/core.c:4528)
> [  128.512373][    C0]  ? process_one_work
> (arch/x86/include/asm/jump_label.h:25 include/linux/jump_label.h:200
> include/trace/events/workqueue.h:108 kernel/workqueue.c:2277)
> [  128.513133][    C0]  ? worker_thread (include/linux/list.h:282
> kernel/workqueue.c:2419)
> [  128.513850][    C0]  ? rescuer_thread (kernel/workqueue.c:2361)
> [  128.514566][    C0]  ? kthread (kernel/kthread.c:292)
> [  128.515200][    C0]  ? __kthread_bind_mask (kernel/kthread.c:245)
> [  128.515960][    C0]  ? ret_from_fork (arch/x86/entry/entry_64.S:302)
> [  128.516641][    C0]
> [  128.516983][    C0] The buggy address belongs to the page:
> [  128.517838][    C0] page:000000007a390a2b refcount:0 mapcount:0
> mapping:0000000000000000 index:0x0 pfn:0x2fcee
> [  128.519428][    C0] flags: 0x1ffff800000000()
> [  128.520102][    C0] raw: 001ffff800000000 ffffea0000bf3b88
> ffffea0000bf3b88 0000000000000000
> [  128.521396][    C0] raw: 0000000000000000 0000000000000000
> 00000000ffffffff 0000000000000000
> [  128.522673][    C0] page dumped because: kasan: bad access detected
> [  128.523642][    C0]
> [  128.523984][    C0] addr ffff88802fceede0 is located in stack of
> task kworker/u2:2/591 at offset 216 in frame:
> [  128.525503][    C0]  glue_xts_req_128bit+0x0/0x6f0 glue_helper
> [  128.526390][    C0]
> [  128.526745][    C0] this frame has 5 objects:
> [  128.527405][    C0]  [48, 200) 'walk'
> [  128.527407][    C0]  [272, 304) 'b'
> [  128.527969][    C0]  [336, 400) 's'
> [  128.528509][    C0]  [432, 496) 'd'
> [  128.529047][    C0]  [528, 608) 'subreq'
> [  128.529607][    C0]
> [  128.530568][    C0] Memory state around the buggy address:
> [  128.531443][    C0]  ffff88802fceec80: 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00
> [  128.532708][    C0]  ffff88802fceed00: 00 f1 f1 f1 f1 f1 f1 00 00
> 00 00 00 00 00 00 00
> [  128.533911][    C0] >ffff88802fceed80: 00 00 00 00 00 00 00 00 00
> 00 f2 f2 f2 f2 f2 f2
> [  128.535106][    C0]                                                        ^
> [  128.536197][    C0]  ffff88802fceee00: f2 f2 f2 00 00 00 00 f2 f2
> f2 f2 00 00 00 00 00
> [  128.537404][    C0]  ffff88802fceee80: 00 00 00 f2 f2 f2 f2 00 00
> 00 00 00 00 00 00 f2
>
> There are other stacks that end in the same place without dm-crypt
> involvement, but they are much harder for us to reproduce, so let's
> stick with this one.
>
> After some bisecting from myself and Ignat, we were able to find the
> commit that fixes the issue, which is:
>
> * https://github.com/torvalds/linux/commit/ce8f86ee94fabcc98537ddccd7e82cfd360a4dc5?w=1
>
> mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint
>
> The trace point *trace_mm_page_alloc_zone_locked()* in __rmqueue() does
> not currently cover all branches.  Add the missing tracepoint and check
> the page before do that.
>
> We don't have CONFIG_CMA enabled, so it can be distilled to:
>
> $ git diff HEAD^..HEAD
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 14b9e83ff9da..b5961d530929 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2871,7 +2871,8 @@ __rmqueue(struct zone *zone, unsigned int order,
> int migratetype,
>                         goto retry;
>         }
>
> -       trace_mm_page_alloc_zone_locked(page, order, migratetype);
> +       if (page)
> +               trace_mm_page_alloc_zone_locked(page, order, migratetype);
>         return page;
>  }
>
> If I apply this patch on top of 5.10.11, the issue disappears.
>
> I can't say I understand the connection here.
>
> It's worth mentioning that the issue doesn't reproduce with
> UNWINDER_FRAME_POINTER rather than UNWINDER_ORC. This fact makes me
> think that ORC is to blame here somehow, but it's beyond my
> understanding.
>
> Here's how I replicate the issue in qemu running Debian Buster:
>
> # /tmp is tmpfs in our case
> $ qemu-img create -f qcow2 /tmp/nvme-$USER.img 10G
>
> $ sudo qemu-system-x86_64 -smp 1 -m 3G -enable-kvm -cpu host -kernel
> ~/vmlinuz -initrd ~/initrd.img -nographic -device e1000 -device
> nvme,drive=nvme0,serial=deadbeaf1,num_queues=8 -drive
> file=/tmp/nvme-$USER.img,if=none,id=nvme0 -append 'console=ttyS0
> kasan_multi_shot'
>
> Inside of the VM:
>
> root@...alhost:~# echo -e '[Match]\nName=enp*\n[Network]\nDHCP=yes' >
> /etc/systemd/network/00-dhcp.network
> root@...alhost:~# systemctl restart systemd-networkd
> root@...alhost:~# apt-get update
> root@...alhost:~# apt-get install -y --no-install-recommends cryptsetup
> root@...alhost:~# echo potato > keyfile
> root@...alhost:~# chmod 0400 keyfile
> root@...alhost:~# cryptsetup -q luksFormat /dev/nvme0n1 keyfile
> root@...alhost:~# cryptsetup open --type luks --key-file keyfile
> --disable-keyring /dev/nvme0n1 luks-nvme0n1
> root@...alhost:~# dmsetup table /dev/mapper/luks-nvme0n1 | sed 's/$/ 2
> no_read_workqueue no_write_workqueue/' | dmsetup reload
> /dev/mapper/luks-nvme0n1
> root@...alhost:~# dmsetup suspend /dev/mapper/luks-nvme0n1 && dmsetup
> resume /dev/mapper/luks-nvme0n1
> root@...alhost:~# mkfs.xfs -f /dev/mapper/luks-nvme0n1
> root@...alhost:~# mount /dev/mapper/luks-nvme0n1 /mnt
>
> The workload that triggers the KASAN complaint is the following:
>
> root@...alhost:~# while true; do rm -f /mnt/random.data.target && dd
> if=/dev/zero of=/mnt/random.data bs=10M count=400 status=progress &&
> mv /mnt/random.data /mnt/random.data.target; sleep 1; done
>
> It might take a few iterations to trigger.
>
> Note that dmcrypt setup in our case depends on Ignat's patches, which
> are included in 5.10.11 and 5.11-rc5, so during bisection between
> 5.11-rc3 and 5.11-rc4 they needed to be reapplied.
>
> I'm going to ask for a backport of the "fix" to stable, but it feels
> like there's a bigger issue here.
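
For anyone cross-checking the KASAN report quoted above, its numbers
can be verified by hand. The sketch below is plain Python, not kernel
code. It assumes generic KASAN on x86-64, where one shadow byte covers
8 bytes of memory and the shadow offset is the 0xdffffc0000000000
constant loaded into %rcx in the disassembly. It also assumes the usual
stack redzone markers (0xf1 for the left redzone, 0xf2 for a redzone
between objects). All names in it are made up for illustration.

```python
# Cross-check of the numbers in the KASAN report (illustrative, not kernel code).
KASAN_SHADOW_SCALE_SHIFT = 3               # one shadow byte describes 8 bytes
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000   # constant loaded into %rcx in the dump
MASK64 = (1 << 64) - 1                     # emulate 64-bit wraparound


def shadow_addr(addr: int) -> int:
    """Address of the shadow byte that describes `addr`."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


BUGGY = 0xFFFF88802FCEEDE0   # "Read of size 8 at addr ..."
FRAME_OFFSET = 216           # "... at offset 216 in frame"
frame_base = BUGGY - FRAME_OFFSET

# "this frame has 5 objects": half-open [start, end) offsets from frame_base.
OBJECTS = {
    "walk": (48, 200),
    "b": (272, 304),
    "s": (336, 400),
    "d": (432, 496),
    "subreq": (528, 608),
}


def object_at(offset: int):
    """Name of the stack object containing `offset`, or None for a redzone."""
    for name, (start, end) in OBJECTS.items():
        if start <= offset < end:
            return name
    return None


# Shadow row printed for 0xffff88802fceed80: ten 00 bytes, then six f2 bytes.
ROW_BASE = 0xFFFF88802FCEED80
ROW = [0x00] * 10 + [0xF2] * 6
row_index = (BUGGY - ROW_BASE) >> KASAN_SHADOW_SCALE_SHIFT

print("frame base:     ", hex(frame_base))
print("start of 'walk':", hex(frame_base + OBJECTS["walk"][0]))
print("object at 216:  ", object_at(FRAME_OFFSET))
print("shadow byte:    ", hex(ROW[row_index]))
print("shadow address: ", hex(shadow_addr(BUGGY)))
```

Offset 216 falls in no object: it is 16 bytes past the end of 'walk'
and inside the redzone that separates 'walk' from 'b', which agrees
with the f2 byte under the caret in the memory state dump. The computed
start of 'walk' also equals R12 in the first register dump.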

Hello again, and a first hello to the new people in CC. I have an update.

(Please let me know if I should build the CC list some other way than
with get_maintainer.pl. It gave me a lot of people, and that doesn't
feel right.)

We've still seen the issue after backporting ce8f86ee94fa, this time
much later in uptime, outside of dm-crypt, and without a reliable
reproduction.

I noticed that the bug doesn't reproduce on Linux v5.9, so I went
ahead and bisected v5.9..v5.10-rc1 to see where it all started (with
dm-crypt reproduction).

Since there are a ton of merges and a regular bisect gave me
questionable results, I resorted to --first-parent first, which
pointed at dd502a81077a:

$ git bisect log
git bisect start '--first-parent'
# bad: [3650b228f83adda7e5ee532e2b90429c03f7b9ec] Linux 5.10-rc1
git bisect bad 3650b228f83adda7e5ee532e2b90429c03f7b9ec
# good: [bbf5c979011a099af5dc76498918ed7df445635b] Linux 5.9
git bisect good bbf5c979011a099af5dc76498918ed7df445635b
# bad: [578a7155c5a1894a789d4ece181abf9d25dc6b0d] Merge tag
'linux-kselftest-kunit-fixes-5.10-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
git bisect bad 578a7155c5a1894a789d4ece181abf9d25dc6b0d
# bad: [3ad11d7ac8872b1c8da54494721fad8907ee41f7] Merge tag
'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block
git bisect bad 3ad11d7ac8872b1c8da54494721fad8907ee41f7
# bad: [b85cac574592b843c4be93c83303feeee0c4dc25] Merge tag
'x86-kaslr-2020-10-12' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad b85cac574592b843c4be93c83303feeee0c4dc25
# good: [64743e652cea9d6df4264caaa1d7f95273024afb] Merge tag
'x86_cache_for_v5.10' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 64743e652cea9d6df4264caaa1d7f95273024afb
# good: [edaa5ddf3833669a25654d42c0fb653dfdd906df] Merge tag
'sched-core-2020-10-12' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good edaa5ddf3833669a25654d42c0fb653dfdd906df
# good: [34eb62d868d729e9a252aa497277081fb652eeed] Merge tag
'core-build-2020-10-12' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 34eb62d868d729e9a252aa497277081fb652eeed
# bad: [3bff6112c80cecb76af5fe485506f96e8adb6122] Merge tag
'perf-core-2020-10-12' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 3bff6112c80cecb76af5fe485506f96e8adb6122
# bad: [dd502a81077a5f3b3e19fa9a1accffdcab5ad5bc] Merge tag
'core-static_call-2020-10-12' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad dd502a81077a5f3b3e19fa9a1accffdcab5ad5bc
# first bad commit: [dd502a81077a5f3b3e19fa9a1accffdcab5ad5bc] Merge
tag 'core-static_call-2020-10-12' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
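
For readers less familiar with the procedure: with --first-parent, git
bisect follows only the first parent of each merge, so every merge is
tested as a single unit and the search can end on a merge commit, as it
did here. The search itself is a binary search over an ordered list of
commits. A minimal sketch in Python, assuming a linear history and a
reliable good/bad test; the commit labels and the is_bad predicate are
hypothetical stand-ins, not real hashes or a real test:

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    `commits` is ordered oldest to newest, commits[0] is known good and
    commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1      # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):      # build, boot and run the reproducer here
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical first-parent chain: c0 is the good release, c7 the bad one.
chain = [f"c{i}" for i in range(8)]
culprit = first_bad(chain, lambda c: int(c[1:]) >= 5)
print(culprit)
```

When the result is a merge, the same search has to be repeated inside
the merged branch to find the individual commit.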

Since the core-static_call-2020-10-12 tag is based on 5.9-rc3, I
rebased it onto v5.9 and repeated the bisect between the rebased tip
and v5.9:

$ git checkout core-static_call-2020-10-12
$ git rebase v5.9
$ git checkout -b ivan/static_call-2020-10-12-rebase-on-v5.9

$ git bisect log
git bisect start
# bad: [6c2fc089268777994dd82ce7c60263f3a71ed0b4] static_call: Fix
return type of static_call_init
git bisect bad 6c2fc089268777994dd82ce7c60263f3a71ed0b4
# good: [bbf5c979011a099af5dc76498918ed7df445635b] Linux 5.9
git bisect good bbf5c979011a099af5dc76498918ed7df445635b
# good: [580b6f7a0af7823277b3ec9aeb2ff48596c10662] x86/static_call:
Add inline static call implementation for x86-64
git bisect good 580b6f7a0af7823277b3ec9aeb2ff48596c10662
# good: [574169ad2d8ce8a80d2798e502d289f6741d8096] static_call: Add
some validation
git bisect good 574169ad2d8ce8a80d2798e502d289f6741d8096
# bad: [4c9c8903fcfb8fca9ab84a8906ee23c998086549] x86/perf,
static_call: Optimize x86_pmu methods
git bisect bad 4c9c8903fcfb8fca9ab84a8906ee23c998086549
# bad: [edfd9b7838ba5e47f19ad8466d0565aba5c59bf0] tracepoint: Optimize
using static_call()
git bisect bad edfd9b7838ba5e47f19ad8466d0565aba5c59bf0
# good: [a5ea9249fde1027124f7ae42d6ca17d53fcb3df0] static_call: Allow early init
git bisect good a5ea9249fde1027124f7ae42d6ca17d53fcb3df0
# first bad commit: [edfd9b7838ba5e47f19ad8466d0565aba5c59bf0]
tracepoint: Optimize using static_call()

edfd9b7838ba5e47f19ad8466d0565aba5c59bf0 is the first bad commit
commit edfd9b7838ba5e47f19ad8466d0565aba5c59bf0
Author: Steven Rostedt (VMware) <rostedt@...dmis.org>
Date:   Tue Aug 18 15:57:52 2020 +0200

    tracepoint: Optimize using static_call()

    Currently the tracepoint site will iterate a vector and issue indirect
    calls to however many handlers are registered (ie. the vector is
    long).

    Using static_call() it is possible to optimize this for the common
    case of only having a single handler registered. In this case the
    static_call() can directly call this handler. Otherwise, if the vector
    is longer than 1, call a function that iterates the whole vector like
    the current code.

    [peterz: updated to new interface]

    Signed-off-by: Steven Rostedt (VMware) <rostedt@...dmis.org>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
    Signed-off-by: Ingo Molnar <mingo@...nel.org>
    Cc: Linus Torvalds <torvalds@...ux-foundation.org>
    Link: https://lore.kernel.org/r/20200818135805.279421092@infradead.org

 include/linux/tracepoint-defs.h |  5 +++
 include/linux/tracepoint.h      | 86 +++++++++++++++++++++++++++++------------
 include/trace/define_trace.h    | 14 +++----
 kernel/tracepoint.c             | 25 ++++++++++--
 4 files changed, 94 insertions(+), 36 deletions(-)
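
To make the commit message above concrete, here is a toy model of the
dispatch scheme it describes: the call site goes through one slot that
points straight at the handler when exactly one is registered, and at
an iterating function otherwise. This is a Python sketch for
illustration only, not how the kernel implements static_call(). The
class and method names are hypothetical, and the no-op used when no
handler is registered is an assumption of this model.

```python
from typing import Callable, List

Handler = Callable[[str], None]


class Tracepoint:
    """Toy tracepoint whose `call` slot is retargeted as handlers change."""

    def __init__(self) -> None:
        self.handlers: List[Handler] = []
        self.call: Handler = self._nop          # nothing registered yet

    def _nop(self, event: str) -> None:
        pass

    def _iterate(self, event: str) -> None:
        # Slow path: walk the whole vector, one indirect call per handler.
        for handler in self.handlers:
            handler(event)

    def _retarget(self) -> None:
        if not self.handlers:
            self.call = self._nop
        elif len(self.handlers) == 1:
            self.call = self.handlers[0]        # fast path: direct call
        else:
            self.call = self._iterate

    def register(self, handler: Handler) -> None:
        self.handlers.append(handler)
        self._retarget()

    def unregister(self, handler: Handler) -> None:
        self.handlers.remove(handler)
        self._retarget()


seen: List[str] = []


def probe_a(event: str) -> None:
    seen.append("a:" + event)


def probe_b(event: str) -> None:
    seen.append("b:" + event)


tp = Tracepoint()
tp.register(probe_a)
tp.call("alloc")            # goes straight to probe_a
tp.register(probe_b)
tp.call("free")             # goes through the iterator
print(seen)
```

The point of the model is that the target of the call at the trace site
changes at run time whenever the set of registered handlers changes.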

The hash above comes from my rebased branch; the upstream commit hash
is d25e37d89dd2:

* https://github.com/torvalds/linux/commit/d25e37d89dd2

I double-checked, and its parent (a945c8345ec0) works fine.

Note that the "fix" for 5.10.11 was also tracepoint-related:

* https://github.com/torvalds/linux/commit/ce8f86ee94fa

Let me know how I can help get this fixed or debugged further. I'm
happy to try patches.
