linux-kernel - Re: [PATCH] net/9p/trans_virtio.c: replace mutex_lock with spin_lock to protect 'virtio_chan

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180719033608.GA29510@nautica>
Date:   Thu, 19 Jul 2018 05:36:08 +0200
From:   Dominique Martinet <asmadeus@...ewreck.org>
To:     piaojun <piaojun@...wei.com>
Cc:     "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        Eric Van Hensbergen <ericvh@...il.com>,
        Ron Minnich <rminnich@...dia.gov>,
        Latchesar Ionkov <lucho@...kov.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        v9fs-developer@...ts.sourceforge.net
Subject: Re: [PATCH] net/9p/trans_virtio.c: replace mutex_lock with spin_lock
 to protect 'virtio_chan_list'

piaojun wrote on Thu, Jul 19, 2018:
> > piaojun wrote on Wed, Jul 18, 2018:
> > That's not a fast path operation, I don't mind changing things but I'd
> > like to understand why - these functions are only ever called at unmount
> > time or when something happens on the virtio bus (probe will happen on
> > probing on the pci bus and I'm not too sure on remove but probably pci
> > removal i.e. basically never?)
> > 
> > I don't see why this wouldn't work, but I won't take this without a
> > (good?) reason.
> > 
> virtio_9p_lock is responsable for protecting virtio_chan_list which has 3
> operation:
> 
> 1. Add a virtio chan to virtio_chan_list. This will happen when we insmod
> 9pnet_virtio.ko:
> p9_virtio_probe
> --list_add_tail(&chan->chan_list, &virtio_chan_list);
> 
> 2. Remove a virtio chan. This will happen when remnod 9pnet_virtio.ko:
> p9_virtio_remove
> --list_del(&chan->chan_list);
> 
> 3. Find a unused virtio chan when mount 9p:
> mount
> --p9_virtio_create
> --list_for_each_entry(chan, &virtio_chan_list, chan_list)
> 
> Multi mount process will compete for virtio_9p_lock when finding unused
> virtio chan, in which case mutex lock will cause process sleep and wake
> up. I think this a waste of CPU time. So we could use spin lock to avoid
> this.

Well, sure, that's theory; but how is that in practice?
I actually took the time to run some tests, setting up 20 virtio mount
points in qemu, and running this command with and without your patch:
# time sh -c 'for i in {1..20}; do
  sh -c "for j in {1..100}; do
    mount -t 9p d$i d.$i;
    umount d.$i;
  done" &
  done;
  wait'

This is quick & dirty but basically, mounts and unmounts 100 times in a
loop all 20 mount points in parallel to stress that lock.
I get these times 5 times (one run per column),
without patch:
real	0m19.357s	0m19.626s	0m19.904s	0m19.926s	0m21.321s
user	0m6.795s	0m6.874s	0m6.807s	0m6.768s	0m6.892s
sys		0m29.936s	0m31.196s	0m31.702s	0m31.914s	0m30.791s

With patch:
real	0m19.439s	0m19.849s	0m19.683s	0m19.600s	0m20.689s
user	0m6.948s	0m6.582s	0m6.706s	0m6.598s	0m6.876s
sys		0m29.364s	0m30.898s	0m30.695s	0m31.311s	0m33.391s

I honestly can't say I'm convinced with a difference either way, the
variations look more like noise than anything to me.


More to the point, while these tests ran my dmesg buffer was filled with
errors like:
FS-Cache: Duplicate cookie detected
FS-Cache: O-cookie c=0000000000368cdb [p=00000000548b03c2 fl=222 nc=0 na=1]
FS-Cache: O-cookie d=000000004cebd15f n=00000000029a0b83
FS-Cache: O-key=[10] '34323935303838343536'
FS-Cache: N-cookie c=00000000d4089478 [p=00000000548b03c2 fl=2 nc=0 na=1]
FS-Cache: N-cookie d=000000004cebd15f n=00000000959d4d37
FS-Cache: N-key=[10] '34323935303838343536'

or
(output mangled a bit)

==================================================================
BUG: KASAN: use-after-free in p9_client_cb+0x14d/0x160 [9pnet]
Read of size 8 at addr ffff88003522a088 by task systemd-udevd/492

CPU: 1 PID: 492 Comm: systemd-udevd Tainted: G           O      4.18.0-rc5+ #9
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 0>
Call Trace:
 <IRQ>
 dump_stack+0x7b/0xad
 print_address_description+0x6a/0x209
 ? p9_client_cb+0x14d/0x160 [9pnet]
 kasan_report.cold.7+0x242/0x2fe
 __asan_report_load8_noabort+0x19/0x20
 p9_client_cb+0x14d/0x160 [9pnet]
 req_done+0x22f/0x280 [9pnet_virtio]
 ? p9_mount_tag_show+0x120/0x120 [9pnet_virtio]
 vring_interrupt+0x108/0x1b0 [virtio_ring]
 ? vring_map_single.constprop.23+0x350/0x350 [virtio_ring]
 __handle_irq_event_percpu+0xec/0x460
 handle_irq_event_percpu+0x71/0x140
 ? __handle_irq_event_percpu+0x460/0x460
 ? apic_ack_irq+0xa3/0xe0
 handle_irq_event+0xb9/0x14a
 handle_edge_irq+0x1ea/0x7a0
 ? kasan_check_read+0x11/0x20
 handle_irq+0x48/0x60
 do_IRQ+0x67/0x140
 common_interrupt+0xf/0xf
 </IRQ>
RIP: 0010:finish_task_switch+0x10e/0x630
Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 6d 04 00 00 41 c7 45 38 00 00 00 00 4c 89 e7 ff 14 25 28 f5 66 8e fb 66 0f >
RSP: 0018:ffff8800633e7a60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd4
RAX: 0000000000000001 RBX: ffff880036632000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006caaac00
RBP: ffff8800633e7aa0 R08: ffffed000cea15cd R09: ffffed000cea15cc
R10: ffffed000cea15cc R11: ffff88006750ae63 R12: ffff88006caaac00
R13: ffff88006558b000 R14: 0000000000000000 R15: ffff880036632000
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 __schedule+0x733/0x1c10
 ? __bpf_prog_run64+0xd0/0xd0
 ? firmware_map_remove+0x174/0x174
 schedule+0x7a/0x1a0
 schedule_hrtimeout_range_clock+0x306/0x3b0
 ? kasan_check_write+0x14/0x20
 ? hrtimer_nanosleep_restart+0x290/0x290
 ? ep_busy_loop_end+0x110/0x110
 schedule_hrtimeout_range+0x13/0x20
 ep_poll+0x7a7/0xb50
 ? __ia32_sys_epoll_ctl+0x1170/0x1170
 ? __fget_light+0x59/0x1f0
 ? __audit_syscall_entry+0x347/0x980
 ? __audit_free+0x8a0/0x8a0
34
 ? wake_up_q+0x100/0x100
39
 ? kasan_check_read+0x11/0x20
3230373130'
FS-Cache: O-key=[10] '34323934393230373131'
FS-Cache: N-cookie c=00000000fa69c1f9 [p=00000000887326c4 fl=2 nc=0 na=1]
FS-Cache: N-cookie d=00000000a8f143d1 n=00000000446f741a
FS-Cache: N-key=[10] '34323934393230373131'
 ? __fget_light+0x59/0x1f0
 do_epoll_wait+0x129/0x160
 __x64_sys_epoll_wait+0x97/0xf0
 do_syscall_64+0xa5/0x260
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f9099a22317
Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 d1 46 2c 00 41 89 ca 8b 00 85 c0 75 10 b8 e8 00 >
RSP: 002b:00007ffff67e1f28 EFLAGS: 00000246 ORIG_RAX: 00000000000000e8
RAX: ffffffffffffffda RBX: 0000558182d9e390 RCX: 00007f9099a22317
RDX: 000000000000000b RSI: 00007ffff67e1f30 RDI: 000000000000000b
RBP: 00007ffff67e20b0 R08: 0000000006c65ded R09: 00007ffff67e1f30
R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffff67e1f30 R14: ffffffffffffffff R15: 0000558182d7a4c0

Allocated by task 6390:
 save_stack+0x43/0xd0
 kasan_kmalloc+0xc4/0xe0
 kasan_slab_alloc+0x12/0x20
 kmem_cache_alloc+0xe2/0x5e0
 p9_client_prepare_req+0xa4/0x670 [9pnet]
 p9_client_rpc+0x133/0xd20 [9pnet]
 p9_client_getattr_dotl+0x102/0x910 [9pnet]
 v9fs_mount+0x5a6/0x7c0 [9p]
 mount_fs+0x89/0x2ad
 vfs_kern_mount.part.32+0x5d/0x390
 do_mount+0x379/0x2bb0
 ksys_mount+0xbf/0xe0
 __x64_sys_mount+0xbe/0x150
 do_syscall_64+0xa5/0x260
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 6390:
 save_stack+0x43/0xd0
 __kasan_slab_free+0x118/0x170
 kasan_slab_free+0xe/0x10
 kmem_cache_free+0x49/0x160
 p9_free_req+0x106/0x140 [9pnet]
 p9_client_getattr_dotl+0x590/0x910 [9pnet]
 v9fs_mount+0x5a6/0x7c0 [9p]
 mount_fs+0x89/0x2ad
 vfs_kern_mount.part.32+0x5d/0x390
 do_mount+0x379/0x2bb0
 ksys_mount+0xbf/0xe0
 __x64_sys_mount+0xbe/0x150
 do_syscall_64+0xa5/0x260
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff88003522a068
 which belongs to the cache p9_req_t of size 72
The buggy address is located 32 bytes inside of
 72-byte region [ffff88003522a068, ffff88003522a0b0)
The buggy address belongs to the page:
page:ffffea0000d48a80 count:1 mapcount:0 mapping:ffff880064562580 index:0x0
flags: 0xffffc000000100(slab)
raw: 00ffffc000000100 ffff880035e36618 ffffea00019fa888 ffff880064562580
raw: 0000000000000000 ffff88003522a000 0000000100000027 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff880035229f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88003522a000: fb fb fb fb fb fb fb fb fb fc fc fc fc fb fb fb
>ffff88003522a080: fb fb fb fb fb fb fc fc fc fc fb fb fb fb fb fb
                      ^
 ffff88003522a100: fb fb fb fc fc fc fc fb fb fb fb fb fb fb fb fb
 ffff88003522a180: fc fc fc fc fb fb fb fb fb fb fb fb fb fc fc fc
==================================================================

so if you're concerned about parallel mountings, I think there are
others, more important, bugs to fix rather than replacing a hardly-used
mutex by a spin-lock...



You've done the work now so it's not like I can't take the patch, but it
really feels pointless to me unless you can show me there is actual
improvement.

-- 
Dominique