lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <82545fdc-31cc-b963-06c8-151b538bc8a3@redhat.com>
Date:   Wed, 26 Apr 2023 17:15:38 +0200
From:   Jesper Dangaard Brouer <jbrouer@...hat.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     brouer@...hat.com, Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        netdev@...r.kernel.org, Eric Dumazet <eric.dumazet@...il.com>,
        linux-mm@...ck.org, Mel Gorman <mgorman@...hsingularity.net>,
        lorenzo@...nel.org,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        linyunsheng@...wei.com, bpf@...r.kernel.org,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>, willy@...radead.org
Subject: Re: [PATCH RFC net-next/mm V1 2/3] page_pool: Use static_key for
 shutdown phase


On 26/04/2023 01.30, Alexei Starovoitov wrote:
> On Tue, Apr 25, 2023 at 07:15:43PM +0200, Jesper Dangaard Brouer wrote:
>> Performance is very important for page pool (PP). This add the use of
>> static_key APIs for regaining a single instruction, which makes the
>> new PP shutdown scheme zero impact.
>>
>> We are uncertain if this is 100% correct, because static_key APIs uses
>> a mutex lock and it is uncertain if all contexts that can return pages
>> can support this. We could spawn a workqueue (like we just removed) to
>> workaround this issue.
> 
> With debug atomic sleep the issue should be trivial to see.
> iirc the callers of xdp_flush_frame_bulk() need to do it under rcu_read_lock equivalent,
> which is not sleepable and mutex-es should warn.
> 

Sure, adding CONFIG_DEBUG_ATOMIC_SLEEP and (keeping) CONFIG_DEBUG_MUTEXES.

Testing this with veth changes that added page_pool to veth
commit 0ebab78cbcbf ("net: veth: add page_pool for page recycling").
which makes it easier to create/trigger inflight packets
via softnet_data->defer_list.

It confirms my suspicion, this patch is not correct.

The dmesg warning looks like below.
(The XXX printk is just from my own debugging patch)

The call site net_rx_action+0xe7 is skb_defer_free_flush
  $ ./scripts/faddr2line vmlinux net_rx_action+0xe7
  net_rx_action+0xe7/0x360:
  skb_defer_free_flush at net/core/dev.c:6615
  (inlined by) skb_defer_free_flush at net/core/dev.c:6601
  (inlined by) net_rx_action at net/core/dev.c:6677


[ 1540.073256] XXX page_pool_shutdown_attempt() inflight=24
[ 1540.078654] XXX page_pool_destroy() Enter into shutdown phase - 
inflight=24
[ 1540.094234] XXX page_pool_shutdown_attempt() inflight=23
[ 1540.099626] XXX page_pool_shutdown_attempt() inflight=22
[ 1540.105012] XXX page_pool_shutdown_attempt() inflight=21
[ 1540.110402] XXX page_pool_shutdown_attempt() inflight=20
[ 1540.115793] XXX page_pool_shutdown_attempt() inflight=19
[ 1540.121183] XXX page_pool_shutdown_attempt() inflight=18
[ 1540.126564] XXX page_pool_shutdown_attempt() inflight=17
[ 1540.131945] XXX page_pool_shutdown_attempt() inflight=16
[ 1540.137329] XXX page_pool_shutdown_attempt() inflight=15
[ 1540.142710] XXX page_pool_shutdown_attempt() inflight=14
[ 1540.148110] XXX page_pool_shutdown_attempt() inflight=13
[ 1540.153512] XXX page_pool_shutdown_attempt() inflight=12
[ 1540.312284] XXX page_pool_shutdown_attempt() inflight=11
[ 1540.317712] XXX page_pool_shutdown_attempt() inflight=10
[ 1540.323128] XXX page_pool_shutdown_attempt() inflight=9
[ 1540.328459] XXX page_pool_shutdown_attempt() inflight=8
[ 1540.333788] XXX page_pool_shutdown_attempt() inflight=7
[ 1540.339111] XXX page_pool_shutdown_attempt() inflight=6
[ 1540.344439] XXX page_pool_shutdown_attempt() inflight=5
[ 1540.349768] XXX page_pool_shutdown_attempt() inflight=4
[ 1540.355091] XXX page_pool_shutdown_attempt() inflight=3
[ 1540.360420] XXX page_pool_shutdown_attempt() inflight=2
[ 1540.365741] XXX page_pool_shutdown_attempt() inflight=1
[ 1540.371064] BUG: sleeping function called from invalid context at 
include/linux/percpu-rwsem.h:49
[ 1540.380052] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 
26501, name: netperf
[ 1540.388171] preempt_count: 101, expected: 0
[ 1540.392452] RCU nest depth: 2, expected: 0
[ 1540.396640] CPU: 2 PID: 26501 Comm: netperf Not tainted 
6.3.0-rc7-net-next-pp-shutdown-vm-lock-dbg+ #67
[ 1540.406147] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 
2.0a 08/01/2016
[ 1540.413749] Call Trace:
[ 1540.416289]  <IRQ>
[ 1540.418395]  dump_stack_lvl+0x32/0x50
[ 1540.422160]  __might_resched+0x11c/0x160
[ 1540.426182]  cpus_read_lock+0x16/0x60
[ 1540.429942]  static_key_slow_dec+0x17/0x50
[ 1540.434137]  page_pool_shutdown_attempt+0x50/0x60
[ 1540.438940]  page_pool_return_skb_page+0x68/0xe0
[ 1540.443654]  skb_free_head+0x4f/0x90
[ 1540.447326]  skb_release_data+0x142/0x1a0
[ 1540.451435]  napi_consume_skb+0x6b/0x180
[ 1540.455448]  net_rx_action+0xe7/0x360
[ 1540.459209]  __do_softirq+0xcb/0x2b1
[ 1540.462885]  do_softirq+0x63/0x90
[ 1540.466297]  </IRQ>
[ 1540.468487]  <TASK>
[ 1540.470671]  __local_bh_enable_ip+0x64/0x70
[ 1540.474948]  __dev_queue_xmit+0x257/0x8a0
[ 1540.479049]  ? __ip_local_out+0x48/0x160
[ 1540.483070]  ip_finish_output2+0x25c/0x540
[ 1540.487255]  __ip_queue_xmit+0x171/0x420
[ 1540.491266]  ? ___slab_alloc+0xd1/0x670
[ 1540.495202]  __tcp_transmit_skb+0x83c/0x950
[ 1540.499486]  tcp_write_xmit+0x373/0xa60
[ 1540.503419]  __tcp_push_pending_frames+0x32/0xf0
[ 1540.508125]  tcp_sendmsg_locked+0x3a0/0xa10
[ 1540.512408]  tcp_sendmsg+0x27/0x40
[ 1540.515907]  sock_sendmsg+0x8b/0xa0
[ 1540.519494]  ? sockfd_lookup_light+0x12/0x70
[ 1540.523856]  __sys_sendto+0xeb/0x130
[ 1540.527531]  ? __switch_to_asm+0x3a/0x80
[ 1540.531549]  ? rseq_get_rseq_cs+0x32/0x290
[ 1540.535736]  ? rseq_ip_fixup+0x14f/0x1e0
[ 1540.539751]  __x64_sys_sendto+0x20/0x30
[ 1540.543686]  do_syscall_64+0x3a/0x90
[ 1540.547357]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1540.552502] RIP: 0033:0x7f389f713680
[ 1540.556174] Code: 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 41 89 ca 
64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 
05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 55 48 83 ec 20 48
[ 1540.575075] RSP: 002b:00007ffcb4d4a538 EFLAGS: 00000246 ORIG_RAX: 
000000000000002c
[ 1540.582755] RAX: ffffffffffffffda RBX: 000000000063fd28 RCX: 
00007f389f713680
[ 1540.589983] RDX: 0000000000004000 RSI: 0000000001ac3c40 RDI: 
0000000000000004
[ 1540.597208] RBP: 00007ffcb4d4a570 R08: 0000000000000000 R09: 
0000000000000000
[ 1540.604438] R10: 0000000000000000 R11: 0000000000000246 R12: 
000000000063fcf8
[ 1540.611664] R13: 00007ffcb4d4a910 R14: 0000000000000000 R15: 
00007f389fa87000
[ 1540.618896]  </TASK>
[ 1540.621199] ------------[ cut here ]------------
[ 1540.625911] WARNING: CPU: 2 PID: 26501 at kernel/smp.c:912 
smp_call_function_many_cond+0x2c1/0x2e0
[ 1540.635001] Modules linked in: veth nf_defrag_ipv6 nf_defrag_ipv4 
ib_umad rdma_ucm ib_ipoib rdma_cm iw_cm irdma ib_cm ice 
intel_uncore_frequency intel_uncore_frequency_common coretemp kvm_intel 
mlx5_ib kvm ib_uverbs irqbypass rapl intel_cstate ib_core i2c_i801 
intel_uncore pcspkr i2c_smbus bfq acpi_ipmi wmi ipmi_si ipmi_devintf 
ipmi_msghandler acpi_pad sch_fq_codel drm fuse zram zsmalloc mlx5_core 
ixgbe i40e igb igc sd_mod t10_pi mlxfw psample hid_generic 
crc64_rocksoft_generic i2c_algo_bit crc64_rocksoft i2c_core mdio ptp 
pps_core [last unloaded: bpfilter]
[ 1540.684653] CPU: 2 PID: 26501 Comm: netperf Tainted: G        W 
    6.3.0-rc7-net-next-pp-shutdown-vm-lock-dbg+ #67
[ 1540.695642] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 
2.0a 08/01/2016
[ 1540.703239] RIP: 0010:smp_call_function_many_cond+0x2c1/0x2e0
[ 1540.709090] Code: fe ff ff 8b 44 24 14 48 c7 c6 a0 04 fc 82 8b 15 85 
f4 e6 01 48 8b 3c 24 8d 48 01 48 63 c9 e8 36 c5 4b 00 89 c2 e9 c5 fd ff 
ff <0f> 0b e9 86 fd ff ff 8b 7c 24 28 e8 4f b8 f7 ff eb af 66 66 2e 0f
[ 1540.727999] RSP: 0018:ffffc9000003cda8 EFLAGS: 00010206
[ 1540.733322] RAX: 0000000000000102 RBX: ffffffff8383b2ea RCX: 
0000000000000003
[ 1540.740552] RDX: 0000000000000000 RSI: ffffffff810271a0 RDI: 
ffffffff82fc04a0
[ 1540.747782] RBP: 0000000000000000 R08: 0000000000000000 R09: 
0000000000000001
[ 1540.755008] R10: 000000087c9e9000 R11: ffffffff82c4f620 R12: 
0000000000000000
[ 1540.762241] R13: ffffffff8383b300 R14: 0000000000000002 R15: 
0000000000000001
[ 1540.769474] FS:  00007f389f954740(0000) GS:ffff88887fc80000(0000) 
knlGS:0000000000000000
[ 1540.777685] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1540.783528] CR2: 00007f1e27582000 CR3: 000000015775c004 CR4: 
00000000003706e0
[ 1540.790761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[ 1540.797987] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[ 1540.805220] Call Trace:
[ 1540.807763]  <IRQ>
[ 1540.809868]  ? __pfx_do_sync_core+0x10/0x10
[ 1540.814146]  on_each_cpu_cond_mask+0x20/0x40
[ 1540.818514]  text_poke_bp_batch+0x9e/0x250
[ 1540.822708]  text_poke_finish+0x1b/0x30
[ 1540.826644]  arch_jump_label_transform_apply+0x16/0x30
[ 1540.831880]  __static_key_slow_dec_cpuslocked.part.0+0x2f/0x40
[ 1540.837809]  static_key_slow_dec+0x30/0x50
[ 1540.841999]  page_pool_shutdown_attempt+0x50/0x60
[ 1540.846803]  page_pool_return_skb_page+0x68/0xe0
[ 1540.851517]  skb_free_head+0x4f/0x90
[ 1540.855194]  skb_release_data+0x142/0x1a0
[ 1540.859300]  napi_consume_skb+0x6b/0x180
[ 1540.863320]  net_rx_action+0xe7/0x360
[ 1540.867082]  __do_softirq+0xcb/0x2b1
[ 1540.870756]  do_softirq+0x63/0x90
[ 1540.874171]  </IRQ>
[ 1540.876359]  <TASK>
[ 1540.878553]  __local_bh_enable_ip+0x64/0x70
[ 1540.882829]  __dev_queue_xmit+0x257/0x8a0
[ 1540.886938]  ? __ip_local_out+0x48/0x160
[ 1540.890959]  ip_finish_output2+0x25c/0x540
[ 1540.895155]  __ip_queue_xmit+0x171/0x420
[ 1540.899175]  ? ___slab_alloc+0xd1/0x670
[ 1540.903108]  __tcp_transmit_skb+0x83c/0x950
[ 1540.907393]  tcp_write_xmit+0x373/0xa60
[ 1540.911327]  __tcp_push_pending_frames+0x32/0xf0
[ 1540.916041]  tcp_sendmsg_locked+0x3a0/0xa10
[ 1540.920324]  tcp_sendmsg+0x27/0x40
[ 1540.923821]  sock_sendmsg+0x8b/0xa0
[ 1540.927411]  ? sockfd_lookup_light+0x12/0x70
[ 1540.931780]  __sys_sendto+0xeb/0x130
[ 1540.935456]  ? __switch_to_asm+0x3a/0x80
[ 1540.939477]  ? rseq_get_rseq_cs+0x32/0x290
[ 1540.943670]  ? rseq_ip_fixup+0x14f/0x1e0
[ 1540.947693]  __x64_sys_sendto+0x20/0x30
[ 1540.951626]  do_syscall_64+0x3a/0x90
[ 1540.955300]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1540.960451] RIP: 0033:0x7f389f713680
[ 1540.964124] Code: 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 41 89 ca 
64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 
05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 55 48 83 ec 20 48
[ 1540.983035] RSP: 002b:00007ffcb4d4a538 EFLAGS: 00000246 ORIG_RAX: 
000000000000002c
[ 1540.990724] RAX: ffffffffffffffda RBX: 000000000063fd28 RCX: 
00007f389f713680
[ 1540.997957] RDX: 0000000000004000 RSI: 0000000001ac3c40 RDI: 
0000000000000004
[ 1541.005187] RBP: 00007ffcb4d4a570 R08: 0000000000000000 R09: 
0000000000000000
[ 1541.012422] R10: 0000000000000000 R11: 0000000000000246 R12: 
000000000063fcf8
[ 1541.019649] R13: 00007ffcb4d4a910 R14: 0000000000000000 R15: 
00007f389fa87000
[ 1541.026881]  </TASK>
[ 1541.029166] ---[ end trace 0000000000000000 ]---
[ 1541.033925] XXX page_pool_shutdown_attempt() inflight=0




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ