[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20131123032205.GB20794@localhost>
Date: Sat, 23 Nov 2013 11:22:05 +0800
From: fengguang.wu@...el.com
To: "Li, Shaohua" <shaohua.li@...el.com>
Cc: NeilBrown <neilb@...e.de>, LKML <linux-kernel@...r.kernel.org>
Subject: kernel BUG at drivers/md/raid5.c:693!
Shaohua,
FYI, we are still seeing this bug.. dmesg attached.
566c09c53455d7c4f1130928ef8071da1a24ea65 is the first bad commit
commit 566c09c53455d7c4f1130928ef8071da1a24ea65
Author: Shaohua Li <shli@...nel.org>
Date: Thu Nov 14 15:16:17 2013 +1100
raid5: relieve lock contention in get_active_stripe()
get_active_stripe() is the last place we have lock contention. It has two
paths. One is stripe isn't found and new stripe is allocated, the other is
stripe is found.
The first path basically calls __find_stripe and init_stripe. It accesses
conf->generation, conf->previous_raid_disks, conf->raid_disks,
conf->prev_chunk_sectors, conf->chunk_sectors, conf->max_degraded,
conf->prev_algo, conf->algorithm, the stripe_hashtbl and inactive_list. Except
stripe_hashtbl and inactive_list, other fields are changed very rarely.
With this patch, we split inactive_list and add new hash locks. Each free
stripe belongs to a specific inactive list. Which inactive list is determined
by stripe's lock_hash. Note, even a stripe hasn't a sector assigned, it has a
lock_hash assigned. Stripe's inactive list is protected by a hash lock, which
is determined by it's lock_hash too. The lock_hash is derivied from current
stripe_hashtbl hash, which guarantees any stripe_hashtbl list will be assigned
to a specific lock_hash, so we can use new hash lock to protect stripe_hashtbl
list too. The goal of the new hash locks introduced is we can only use the new
locks in the first path of get_active_stripe(). Since we have several hash
locks, lock contention is relieved significantly.
The first path of get_active_stripe() accesses other fields, since they are
changed rarely, changing them now need take conf->device_lock and all hash
locks. For a slow path, this isn't a problem.
If we need lock device_lock and hash lock, we always lock hash lock first. The
tricky part is release_stripe and friends. We need take device_lock first.
Neil's suggestion is we put inactive stripes to a temporary list and readd it
to inactive_list after device_lock is released. In this way, we add stripes to
temporary list with device_lock hold and remove stripes from the list with hash
lock hold. So we don't allow concurrent access to the temporary list, which
means we need allocate temporary list for all participants of release_stripe.
One downside is free stripes are maintained in their inactive list, they can't
across between the lists. By default, we have total 256 stripes and 8 lists, so
each list will have 32 stripes. It's possible one list has free stripe but
other list hasn't. The chance should be rare because stripes allocation are
even distributed. And we can always allocate more stripes for cache, several
mega bytes memory isn't a big deal.
This completely removes the lock contention of the first path of
get_active_stripe(). It slows down the second code path a little bit though
because we now need takes two locks, but since the hash lock isn't contended,
the overhead should be quite small (several atomic instructions). The second
path of get_active_stripe() (basically sequential write or big request size
randwrite) still has lock contentions.
Signed-off-by: Shaohua Li <shli@...ionio.com>
Signed-off-by: NeilBrown <neilb@...e.de>
:040000 040000 88fa28d1decc5454cf4d58421fa3eb12bc9ad524 5d1f104188f72b17d10cd569bf2924dab5d789cb M drivers
bisect run success
# bad: [02ffe4cc90dce5a1bbee5daae98a40a431c29c6d] Merge 'yuanhan/slub-experimental' into devel-hourly-2013112217
# good: [5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52] Linux 3.12
git bisect start '02ffe4cc90dce5a1bbee5daae98a40a431c29c6d' '5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52' '--'
# good: [5cbb3d216e2041700231bcfc383ee5f8b7fc8b74] Merge branch 'akpm' (patches from Andrew Morton)
git bisect good 5cbb3d216e2041700231bcfc383ee5f8b7fc8b74
# good: [d5ceede8dc86278d16dcad8f916ef323b5672bd8] drivers/rtc/rtc-hid-sensor-time.c: use dev_get_platdata()
git bisect good d5ceede8dc86278d16dcad8f916ef323b5672bd8
# good: [ffd3c0260aeeb1fd4d36378d2e06e6410661dd0f] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
git bisect good ffd3c0260aeeb1fd4d36378d2e06e6410661dd0f
# good: [1ee2dcc2245340cf4ac94b99c4d00efbeba61824] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good 1ee2dcc2245340cf4ac94b99c4d00efbeba61824
# bad: [a50f09550a7dd58b4514630fec1676855084ec1f] Merge 'vireshk/cpufreq-next' into devel-hourly-2013112217
git bisect bad a50f09550a7dd58b4514630fec1676855084ec1f
# bad: [af2e2f328052082f58f041d574ed50c7f21c598f] Merge tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next
git bisect bad af2e2f328052082f58f041d574ed50c7f21c598f
# good: [df12a3178d340319b1955be6b973a4eb84aff754] Merge commit 'dmaengine-3.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine
git bisect good df12a3178d340319b1955be6b973a4eb84aff754
# good: [ed6a82546d2e8f6b5902269541733814d4adacc2] Merge branch 'acpi-hotplug'
git bisect good ed6a82546d2e8f6b5902269541733814d4adacc2
# bad: [6d6e352c80f22c446d933ca8103e02bac1f09129] Merge tag 'md/3.13' of git://neil.brown.name/md
git bisect bad 6d6e352c80f22c446d933ca8103e02bac1f09129
# bad: [30b8feb730f9b9b3c5de02580897da03f59b6b16] md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes.
git bisect bad 30b8feb730f9b9b3c5de02580897da03f59b6b16
# bad: [566c09c53455d7c4f1130928ef8071da1a24ea65] raid5: relieve lock contention in get_active_stripe()
git bisect bad 566c09c53455d7c4f1130928ef8071da1a24ea65
# good: [02e5f5c0a0f726e66e3d8506ea1691e344277969] md: fix calculation of stacking limits on level change.
git bisect good 02e5f5c0a0f726e66e3d8506ea1691e344277969
# good: [82e06c811163c4d853ed335d56c3378088bc89cc] wait: add wait_event_cmd()
git bisect good 82e06c811163c4d853ed335d56c3378088bc89cc
# first bad commit: [566c09c53455d7c4f1130928ef8071da1a24ea65] raid5: relieve lock contention in get_active_stripe()
View attachment "dmesg" of type "text/plain" (66871 bytes)
Powered by blists - more mailing lists