netdev - Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map per Tx queue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180803221212-mutt-send-email-mst@kernel.org>
Date:   Fri, 3 Aug 2018 22:12:53 +0300
From:   "Michael S. Tsirkin" <mst@...hat.com>
To:     Andrei Vagin <avagin@...tuozzo.com>
Cc:     "Nambiar, Amritha" <amritha.nambiar@...el.com>,
        netdev@...r.kernel.org, davem@...emloft.net,
        alexander.h.duyck@...el.com, willemdebruijn.kernel@...il.com,
        sridhar.samudrala@...el.com, alexander.duyck@...il.com,
        edumazet@...gle.com, hannes@...essinduktion.org,
        tom@...bertland.com, tom@...ntonium.net, jasowang@...hat.com,
        gaowanlong@...fujitsu.com,
        virtualization@...ts.linux-foundation.org, rostedt@...dmis.org
Subject: Re: [net-next, v6, 6/7] net-sysfs: Add interface for Rx queue(s) map
 per Tx queue

On Fri, Aug 03, 2018 at 12:06:51PM -0700, Andrei Vagin wrote:
> On Fri, Aug 03, 2018 at 12:08:05AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Aug 02, 2018 at 02:04:12PM -0700, Nambiar, Amritha wrote:
> > > On 8/1/2018 5:11 PM, Andrei Vagin wrote:
> > > > On Tue, Jul 10, 2018 at 07:28:49PM -0700, Nambiar, Amritha wrote:
> > > >> With this patch series, I introduced static_key for XPS maps
> > > >> (xps_needed), so static_key_slow_inc() is used to switch branches. The
> > > >> definition of static_key_slow_inc() has cpus_read_lock in place. In the
> > > >> virtio_net driver, XPS queues are initialized after setting the
> > > >> queue:cpu affinity in virtnet_set_affinity() which is already protected
> > > >> within cpus_read_lock. Hence, the warning here trying to acquire
> > > >> cpus_read_lock when it is already held.
> > > >>
> > > >> A quick fix for this would be to just extract netif_set_xps_queue() out
> > > >> of the lock by simply wrapping it with another put/get_online_cpus
> > > >> (unlock right before and hold lock right after).
> > > > 
> > > > virtnet_set_affinity() is called from virtnet_cpu_online(), which is
> > > > called under cpus_read_lock too.
> > > > 
> > > > It looks like now we can't call netif_set_xps_queue() from cpu hotplug
> > > > callbacks.
> > > > 
> > > > I can suggest a very straightforward fix for this problem. The patch is
> > > > attached.
> > > > 
> > > 
> > > Thanks for looking into this. I was thinking of fixing this in the
> > > virtio_net driver by moving the XPS initialization (and have a new
> > > get_affinity utility) in the ndo_open (so it is together with other tx
> > > preparation) instead of probe. Your patch solves this in general for
> > > setting up cpu hotplug callbacks which is under cpus_read_lock.
> > 
> > 
> > I like this too. Could you repost in a standard way
> > (inline, with your signoff etc) so we can ack this for
> > net-next?
> 
> When I was testing this patch, I got the following kasan warning. Michael,
> could you take a look at it. Maybe you will understand what was going wrong there.
> 
> https://api.travis-ci.org/v3/job/410701353/log.txt
> 
> [    7.275033] ==================================================================
> [    7.275226] BUG: KASAN: slab-out-of-bounds in virtnet_poll+0xaa1/0xd00
> [    7.275359] Read of size 8 at addr ffff8801d444a000 by task ip/370
> [    7.275488] 
> [    7.275610] CPU: 1 PID: 370 Comm: ip Not tainted 4.18.0-rc6+ #1
> [    7.275613] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> [    7.275616] Call Trace:
> [    7.275621]  <IRQ>
> [    7.275630]  dump_stack+0x71/0xab
> [    7.275640]  print_address_description+0x6a/0x270
> [    7.275648]  kasan_report+0x258/0x380
> [    7.275653]  ? virtnet_poll+0xaa1/0xd00
> [    7.275661]  virtnet_poll+0xaa1/0xd00
> [    7.275680]  ? receive_buf+0x5920/0x5920
> [    7.275689]  ? do_raw_spin_unlock+0x54/0x220
> [    7.275699]  ? find_held_lock+0x32/0x1c0
> [    7.275710]  ? rcu_process_callbacks+0xa60/0xd20
> [    7.275736]  net_rx_action+0x2ee/0xad0
> [    7.275748]  ? rcu_note_context_switch+0x320/0x320
> [    7.275754]  ? napi_complete_done+0x300/0x300
> [    7.275763]  ? native_apic_msr_write+0x27/0x30
> [    7.275768]  ? lapic_next_event+0x5b/0x90
> [    7.275775]  ? clockevents_program_event+0x21d/0x2f0
> [    7.275791]  __do_softirq+0x19a/0x623
> [    7.275807]  do_softirq_own_stack+0x2a/0x40
> [    7.275811]  </IRQ>
> [    7.275818]  do_softirq.part.18+0x6a/0x80
> [    7.275825]  __local_bh_enable_ip+0x49/0x50
> [    7.275829]  virtnet_open+0x129/0x440
> [    7.275841]  __dev_open+0x189/0x2c0
> [    7.275848]  ? dev_set_rx_mode+0x30/0x30
> [    7.275857]  ? do_raw_spin_unlock+0x54/0x220
> [    7.275866]  __dev_change_flags+0x3a9/0x4f0
> [    7.275873]  ? dev_set_allmulti+0x10/0x10
> [    7.275889]  dev_change_flags+0x7a/0x150
> [    7.275900]  do_setlink+0x9fe/0x2e40
> [    7.275910]  ? deref_stack_reg+0xad/0xe0
> [    7.275917]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
> [    7.275922]  ? find_held_lock+0x32/0x1c0
> [    7.275929]  ? rtnetlink_put_metrics+0x460/0x460
> [    7.275935]  ? virtqueue_add_sgs+0x9e2/0xde0
> [    7.275953]  ? virtscsi_add_cmd+0x454/0x780
> [    7.275964]  ? find_held_lock+0x32/0x1c0
> [    7.275973]  ? deref_stack_reg+0xad/0xe0
> [    7.275979]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
> [    7.275985]  ? lock_downgrade+0x5e0/0x5e0
> [    7.275993]  ? memset+0x1f/0x40
> [    7.276008]  ? nla_parse+0x33/0x290
> [    7.276016]  rtnl_newlink+0x954/0x1120
> [    7.276030]  ? rtnl_link_unregister+0x250/0x250
> [    7.276044]  ? is_bpf_text_address+0x5/0x60
> [    7.276054]  ? lock_downgrade+0x5e0/0x5e0
> [    7.276057]  ? lock_acquire+0x10b/0x2a0
> [    7.276072]  ? deref_stack_reg+0xad/0xe0
> [    7.276078]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
> [    7.276085]  ? __kernel_text_address+0xe/0x30
> [    7.276090]  ? unwind_get_return_address+0x5f/0xa0
> [    7.276103]  ? find_held_lock+0x32/0x1c0
> [    7.276110]  ? is_bpf_text_address+0x5/0x60
> [    7.276124]  ? deref_stack_reg+0xad/0xe0
> [    7.276131]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
> [    7.276136]  ? depot_save_stack+0x2d9/0x460
> [    7.276142]  ? deref_stack_reg+0xad/0xe0
> [    7.276156]  ? find_held_lock+0x32/0x1c0
> [    7.276164]  ? is_bpf_text_address+0x5/0x60
> [    7.276170]  ? __lock_acquire.isra.29+0xe8/0x1bb0
> [    7.276212]  ? rtnetlink_rcv_msg+0x5d6/0x930
> [    7.276222]  ? lock_downgrade+0x5e0/0x5e0
> [    7.276226]  ? lock_acquire+0x10b/0x2a0
> [    7.276240]  rtnetlink_rcv_msg+0x69e/0x930
> [    7.276249]  ? rtnl_calcit.isra.31+0x2f0/0x2f0
> [    7.276255]  ? __lock_acquire.isra.29+0xe8/0x1bb0
> [    7.276265]  ? netlink_deliver_tap+0x8d/0x8e0
> [    7.276276]  netlink_rcv_skb+0x127/0x350
> [    7.276281]  ? rtnl_calcit.isra.31+0x2f0/0x2f0
> [    7.276289]  ? netlink_ack+0x970/0x970
> [    7.276299]  ? __alloc_skb+0xc2/0x520
> [    7.276311]  netlink_unicast+0x40f/0x5d0
> [    7.276320]  ? netlink_attachskb+0x580/0x580
> [    7.276325]  ? _copy_from_iter_full+0x157/0x6f0
> [    7.276331]  ? import_iovec+0x90/0x390
> [    7.276338]  ? get_page_from_freelist+0x1e89/0x3e30
> [    7.276347]  ? apparmor_socket_sock_rcv_skb+0x10/0x10
> [    7.276357]  netlink_sendmsg+0x65e/0xb00
> [    7.276367]  ? netlink_unicast+0x5d0/0x5d0
> [    7.276373]  ? copy_msghdr_from_user+0x206/0x340
> [    7.276388]  ? netlink_unicast+0x5d0/0x5d0
> [    7.276394]  sock_sendmsg+0xb3/0xf0
> [    7.276401]  ___sys_sendmsg+0x604/0x8b0
> [    7.276410]  ? copy_msghdr_from_user+0x340/0x340
> [    7.276416]  ? find_held_lock+0x32/0x1c0
> [    7.276424]  ? __handle_mm_fault+0xc85/0x3140
> [    7.276433]  ? lock_downgrade+0x5e0/0x5e0
> [    7.276439]  ? mem_cgroup_commit_charge+0xb4/0xf80
> [    7.276453]  ? _raw_spin_unlock+0x24/0x30
> [    7.276458]  ? __handle_mm_fault+0xc85/0x3140
> [    7.276467]  ? __pmd_alloc+0x430/0x430
> [    7.276473]  ? find_held_lock+0x32/0x1c0
> [    7.276485]  ? __fget_light+0x55/0x1f0
> [    7.276497]  ? __sys_sendmsg+0xd2/0x170
> [    7.276502]  __sys_sendmsg+0xd2/0x170
> [    7.276508]  ? __ia32_sys_shutdown+0x70/0x70
> [    7.276516]  ? handle_mm_fault+0x1f9/0x6a0
> [    7.276528]  ? up_read+0x1c/0x110
> [    7.276534]  ? __do_page_fault+0x4a6/0xa80
> [    7.276554]  do_syscall_64+0xa0/0x280
> [    7.276558]  ? prepare_exit_to_usermode+0x88/0x130
> [    7.276566]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [    7.276572] RIP: 0033:0x7ffbe9a2f160
> [    7.276574] Code: c3 48 8b 05 2a 2d 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 1d 8f 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee cb 00 00 48 89 04 24 
> [    7.276728] RSP: 002b:00007ffc5a2d6108 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [    7.276735] RAX: ffffffffffffffda RBX: 00007ffc5a2da220 RCX: 00007ffbe9a2f160
> [    7.276738] RDX: 0000000000000000 RSI: 00007ffc5a2d6150 RDI: 0000000000000003
> [    7.276741] RBP: 00007ffc5a2d6150 R08: 0000000000000000 R09: 0000000000000003
> [    7.276744] R10: 00007ffc5a2d5ed0 R11: 0000000000000246 R12: 000000005b6177de
> [    7.276748] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffc5a2da918
> [    7.276763] 
> [    7.276895] Allocated by task 1:
> [    7.277026]  kasan_kmalloc+0xa0/0xd0
> [    7.277030]  __kmalloc+0x13a/0x250
> [    7.277034]  init_vqs+0xd0/0x11c0
> [    7.277038]  virtnet_probe+0xb99/0x1ad0
> [    7.277045]  virtio_dev_probe+0x3fc/0x890
> [    7.277052]  driver_probe_device+0x6c4/0xcc0
> [    7.277056]  __driver_attach+0x232/0x2c0
> [    7.277060]  bus_for_each_dev+0x118/0x1a0
> [    7.277064]  bus_add_driver+0x390/0x6e0
> [    7.277068]  driver_register+0x18e/0x400
> [    7.277076]  virtio_net_driver_init+0x6d/0x90
> [    7.277080]  do_one_initcall+0xa8/0x348
> [    7.277085]  kernel_init_freeable+0x42d/0x4c8
> [    7.277090]  kernel_init+0xf/0x130
> [    7.277095]  ret_from_fork+0x35/0x40
> [    7.277097] 
> [    7.277223] Freed by task 0:
> [    7.277347] (stack is not available)
> [    7.277473] 
> [    7.277596] The buggy address belongs to the object at ffff8801d4449100
> [    7.277596]  which belongs to the cache kmalloc-4096 of size 4096
> [    7.277769] The buggy address is located 3840 bytes inside of
> [    7.277769]  4096-byte region [ffff8801d4449100, ffff8801d444a100)
> [    7.277932] The buggy address belongs to the page:
> [    7.278066] page:ffffea0007511200 count:1 mapcount:0 mapping:ffff8801db002600 index:0x0 compound_mapcount: 0
> [    7.278230] flags: 0x17fff8000008100(slab|head)
> [    7.278363] raw: 017fff8000008100 dead000000000100 dead000000000200 ffff8801db002600
> [    7.278516] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
> [    7.278664] page dumped because: kasan: bad access detected
> [    7.278790] 
> [    7.278904] Memory state around the buggy address:
> [    7.279031]  ffff8801d4449f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    7.279175]  ffff8801d4449f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    7.279323] >ffff8801d444a000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    7.279468]                    ^
> [    7.279584]  ffff8801d444a080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    7.279729]  ffff8801d444a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [    7.279870] ==================================================================
> [    7.280011] Disabling lock debugging due to kernel taint
> [    7.632219] random: mktemp: uninitialized urandom read (6 bytes read)
> [    8.052850] random: mktemp: uninitialized urandom read (6 bytes read)
> [    8.335209] random: cloud-init: uninitialized urandom read (32 bytes read)
> [    8.796331] random: cloud-init: uninitialized urandom read (32 bytes read)
> [    9.162551] random: mktemp: uninitialized urandom read (12 bytes read)
> [    9.384504] random: ssh-keygen: uninitialized urandom read (32 bytes read)
> [    9.839586] init: failsafe main process (724) killed by TERM signal
> [   14.865443] postgres (1245): /proc/1245/oom_adj is deprecated, please use /proc/1245/oom_score_adj instead.
> [   17.213418] random: crng init done
> [   17.580892] init: plymouth-upstart-bridge main process ended, respawning

I suspect an off by one somewhere. I'm looking at the patch
and I don't see it but these things are hard to spot sometimes ...

> > 
> > > >> But this may not a
> > > >> clean solution. It'd help if I can get suggestions on what would be a
> > > >> clean option to fix this without extensively changing the code in
> > > >> virtio_net. Is it mandatory to protect the affinitization with
> > > >> read_lock? I don't see similar lock in other drivers while setting the
> > > >> affinity. I understand this warning should go away, but isn't it safe to
> > > >> have multiple readers.
> > > >>
> > > >>> On Fri, Jun 29, 2018 at 09:27:07PM -0700, Amritha Nambiar wrote: