[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190410193418.GA32402@roeck-us.net>
Date: Wed, 10 Apr 2019 12:34:18 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: NeilBrown <neilb@...e.com>
Cc: Thomas Graf <tgraf@...g.ch>,
Herbert Xu <herbert@...dor.apana.org.au>,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash
bucket.
Hi,
On Tue, Apr 02, 2019 at 10:07:45AM +1100, NeilBrown wrote:
> This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
> bucket pointer to lock the hash chain for that bucket.
>
> The benefits of a bit spin_lock are:
> - no need to allocate a separate array of locks.
> - no need to have a configuration option to guide the
> choice of the size of this array
> - locking cost is often a single test-and-set in a cache line
> that will have to be loaded anyway. When inserting at, or removing
> from, the head of the chain, the unlock is free - writing the new
> address in the bucket head implicitly clears the lock bit.
> For __rhashtable_insert_fast() we ensure this always happens
> when adding a new key.
> - even when lockings costs 2 updates (lock and unlock), they are
> in a cacheline that needs to be read anyway.
>
> The cost of using a bit spin_lock is a little bit of code complexity,
> which I think is quite manageable.
>
> Bit spin_locks are sometimes inappropriate because they are not fair -
> if multiple CPUs repeatedly contend of the same lock, one CPU can
> easily be starved. This is not a credible situation with rhashtable.
> Multiple CPUs may want to repeatedly add or remove objects, but they
> will typically do so at different buckets, so they will attempt to
> acquire different locks.
>
> As we have more bit-locks than we previously had spinlocks (by at
> least a factor of two) we can expect slightly less contention to
> go with the slightly better cache behavior and reduced memory
> consumption.
>
> To enhance type checking, a new struct is introduced to represent the
> pointer plus lock-bit
> that is stored in the bucket-table. This is "struct rhash_lock_head"
> and is empty. A pointer to this needs to be cast to either an
> unsigned lock, or a "struct rhash_head *" to be useful.
> Variables of this type are most often called "bkt".
>
> Previously "pprev" would sometimes point to a bucket, and sometimes a
> ->next pointer in an rhash_head. As these are now different types,
> pprev is NULL when it would have pointed to the bucket. In that case,
> 'blk' is used, together with correct locking protocol.
>
> Signed-off-by: NeilBrown <neilb@...e.com>
This patch causes my qemu q800 boot test to crash reliably.
Starting network: Unable to handle kernel access at virtual address (ptrval)
Oops: 00000000
Modules linked in:
PC: [<00248b90>] sk_filter_trim_cap+0x56/0x158
SR: 2000 SP: (ptrval) a2: 07a30aa0
d0: 07836300 d1: 0783666c d2: 00000001 d3: 0025c192
d4: 0025a636 d5: 00248b3a a0: 078363fe a1: 60000000
Process ip (pid: 66, task=(ptrval))
Frame format=7 eff addr=6000000c ssw=0505 faddr=6000000c
wb 1 stat/addr/data: 0000 00000000 00000000
wb 2 stat/addr/data: 0000 00000000 00000000
wb 3 stat/addr/data: 0000 6000000c 00000000
push data: 00000000 00000000 00000000 00000000
Stack from 07a5bdec:
07a5be5c 0025c192 0025a636 00248b3a 00000000 ef9febc8 078363fe 0787a2d0
0783645c 07a5be5c 0025c192 0025a636 00248b3a 0787a2d0 07a2c000 0025c470
078363fe 0787a2d0 00000001 00000000 00000000 07a5be98 07a17500 00000000
0787a2d0 07a2c000 07a5bef8 07a5beac 7fffffff 0025c7e2 07a2c000 0787a2d0
00000000 00000000 00000000 0000000c ef9feb14 ef9feb14 00000000 0031a52e
07a5bef8 07a5bf28 0000000c 0781eba0 00000000 00000042 00000000 00000000
Call Trace: [<0025c192>] netlink_attachskb+0x0/0x138
[<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
[<00248b3a>] sk_filter_trim_cap+0x0/0x158
[<0025c192>] netlink_attachskb+0x0/0x138
[<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
[<00248b3a>] sk_filter_trim_cap+0x0/0x158
[<0025c470>] netlink_unicast+0x170/0x1be
[<0025c7e2>] netlink_sendmsg+0x288/0x2b2
[<0021a5be>] sock_sendmsg+0x1c/0x44
[<0021b8a6>] __sys_sendto+0xac/0xd2
[<00100000>] ext4_read_block_bitmap_nowait+0x4d4/0x4ec
[<00100000>] ext4_read_block_bitmap_nowait+0x4d4/0x4ec
[<00219906>] sock_alloc_file+0x50/0x80
[<000b7a1c>] fd_install+0x12/0x18
[<0021b1cc>] __sys_socket+0x7e/0x9c
[<0021b8ea>] sys_sendto+0x1e/0x24
[<00002a40>] syscall+0x8/0xc
[<0000c004>] ATANTBL+0x618/0x800
Code: 4a89 6604 4280 60ea 2c2b 000c 2748 000c <2869> 000c 082c 0003 0002 6728 4878 0014 7620 4873 3800 486e ffec 4eb9 002e 5b88
Disabling lock debugging due to kernel taint
Unable to handle kernel NULL pointer dereference at virtual address (ptrval)
Oops: 00000000
Modules linked in:
PC: [<0025bf9c>] netlink_release+0x284/0x446
SR: 2000 SP: (ptrval) a2: 07a30aa0
d0: 00000780 d1: 00000780 d2: 07a2c26e d3: 0001a0ba
d4: 07406160 d5: 000000ac a0: 078094ac a1: 00000780
Process ip (pid: 66, task=(ptrval))
Frame format=7 eff addr=00000780 ssw=0505 faddr=00000780
wb 1 stat/addr/data: 0000 00000000 00000000
wb 2 stat/addr/data: 0000 00000000 00000000
wb 3 stat/addr/data: 0000 00000780 00000000
push data: 00000000 00000000 00000000 00000000
Stack from 07a5bc2c:
00000000 078126a0 0741fe00 07a39208 00000001 ef9febc8 07406160 0740617c
00000000 00000000 002e8ca6 074061e8 07a5bcf4 00219830 07406160 00000008
07a39200 0740617c 002198a2 07406160 0740617c 000a3996 0740617c 07a39200
003deb54 07a39480 002e863c 000000b4 07a30aa0 002e7446 0002934a 07a39200
600000ff 00033e44 07a30aa0 0790f330 00018f34 07a30aa0 6000000c 0025c192
0025a636 00248b3a 00000001 ef9febc8 07a5bd84 00037986 0783645c 0790f368
Call Trace: [<002e8ca6>] __down_write+0xe/0x12
[<00219830>] __sock_release+0x34/0x98
[<002198a2>] sock_close+0xe/0x14
[<000a3996>] __fput+0xc4/0x16a
[<002e863c>] down_read+0x0/0x6
[<002e7446>] _cond_resched+0x0/0x2a
[<0002934a>] task_work_run+0x66/0x7c
[<00033e44>] up_read+0x0/0x6
[<00018f34>] do_exit+0x2a2/0x724
[<0025c192>] netlink_attachskb+0x0/0x138
[<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
[<00248b3a>] sk_filter_trim_cap+0x0/0x158
[<00037986>] printk+0x0/0x18
[<00004fd4>] die_if_kernel+0x52/0x56
[<00005d64>] send_fault_sig+0x74/0x86
[<00005094>] buserr_c+0xbc/0x518
[<0025c192>] netlink_attachskb+0x0/0x138
[<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
[<00248b3a>] sk_filter_trim_cap+0x0/0x158
[<00002944>] buserr+0x20/0x28
[<0025c192>] netlink_attachskb+0x0/0x138
[<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
[<00248b3a>] sk_filter_trim_cap+0x0/0x158
[<0025c192>] netlink_attachskb+0x0/0x138
[<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
[<00248b3a>] sk_filter_trim_cap+0x0/0x158
[<0025c192>] netlink_attachskb+0x0/0x138
[<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
[<00248b3a>] sk_filter_trim_cap+0x0/0x158
[<0025c470>] netlink_unicast+0x170/0x1be
[<0025c7e2>] netlink_sendmsg+0x288/0x2b2
[<0021a5be>] sock_sendmsg+0x1c/0x44
[<0021b8a6>] __sys_sendto+0xac/0xd2
[<00100000>] ext4_read_block_bitmap_nowait+0x4d4/0x4ec
[<00100000>] ext4_read_block_bitmap_nowait+0x4d4/0x4ec
[<00219906>] sock_alloc_file+0x50/0x80
[<000b7a1c>] fd_install+0x12/0x18
[<0021b1cc>] __sys_socket+0x7e/0x9c
[<0021b8ea>] sys_sendto+0x1e/0x24
[<00002a40>] syscall+0x8/0xc
[<0000c004>] ATANTBL+0x618/0x800
Code: 41f5 5800 6000 fe34 b082 670a 2200 2240 <2011> 6000 fe48 202b 026e 43f9 0001 a0ba 4a81 677a 2041 2080 4878 0200 2f3c 0025
Kernel panic - not syncing: Aiee, killing interrupt handler!
---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---
qemu-system-m68k: terminating on signal 15 from pid 19836 (/bin/bash)
Bisect log is attached. Obviously I have no idea if this is an issue with qemu
or with the code. qemu is from git@...hub.com:vivier/qemu-m68k.git, branch
m68k/q800-dev.
Guenter
---
# bad: [87b81df1a63d6791359a475241fa9d2f96fa30be] Add linux-next specific files for 20190410
# good: [15ade5d2e7775667cf191cf2f94327a4889f8b9d] Linux 5.1-rc4
git bisect start 'HEAD' 'v5.1-rc4'
# bad: [35a1393e49d19d625d0270fcdf409fad743263cd] Merge remote-tracking branch 'crypto/master'
git bisect bad 35a1393e49d19d625d0270fcdf409fad743263cd
# good: [d1472e6a10485eb679eec89324ab165533af0be1] Merge remote-tracking branch 'hid/for-next'
git bisect good d1472e6a10485eb679eec89324ab165533af0be1
# bad: [0c9381d9bcfbd7dbc26b6e5f296e90d6396ea4db] Merge branch 'netdevsim-small-spring-cleanup'
git bisect bad 0c9381d9bcfbd7dbc26b6e5f296e90d6396ea4db
# good: [356d71e00d278d865f8c7f68adebd6ce4698a7e2] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good 356d71e00d278d865f8c7f68adebd6ce4698a7e2
# good: [fe1ec0bdfba4e7bfe6db81a1e4ac74beb36691e8] ehea: remove set but not used variables 'epa' and 'cq_handle_ref'
git bisect good fe1ec0bdfba4e7bfe6db81a1e4ac74beb36691e8
# bad: [ed514fc5615d7688b7c227a76863e98a92fb0d54] cxgb4: Don't return EAGAIN when TCAM is full.
git bisect bad ed514fc5615d7688b7c227a76863e98a92fb0d54
# good: [7090425104dbd87959bd424e9bd5a8711fde0986] net: phy: add amlogic g12a mdio mux support
git bisect good 7090425104dbd87959bd424e9bd5a8711fde0986
# good: [059477830022e1886f55a9641702461c249fa864] net: hsr: fix placement of logical operator in a multi-line statement
git bisect good 059477830022e1886f55a9641702461c249fa864
# good: [7a41c294c1463100fdc82a356e22e36bbaa6b0f9] rhashtable: use cmpxchg() in nested_table_alloc()
git bisect good 7a41c294c1463100fdc82a356e22e36bbaa6b0f9
# bad: [9186c90bbb9525f46eddb590be26c749b5b1def7] Merge branch 'rhashtable-bitlocks'
git bisect bad 9186c90bbb9525f46eddb590be26c749b5b1def7
# bad: [8f0db018006a421956965e1149234c4e8db718ee] rhashtable: use bit_spin_locks to protect hash bucket.
git bisect bad 8f0db018006a421956965e1149234c4e8db718ee
# good: [ff302db965b57c141297911ea647d36d11fedfbe] rhashtable: allow rht_bucket_var to return NULL.
git bisect good ff302db965b57c141297911ea647d36d11fedfbe
# first bad commit: [8f0db018006a421956965e1149234c4e8db718ee] rhashtable: use bit_spin_locks to protect hash bucket.
Powered by blists - more mailing lists