Message-Id: <681B8DB8-0B1E-43FA-B28B-95D1B007D158@dilger.ca>
Date: Fri, 22 Nov 2019 16:46:59 -0700
From: Andreas Dilger <adilger@...ger.ca>
To: "Darrick J. Wong" <darrick.wong@...cle.com>
Cc: Theodore Ts'o <tytso@....edu>,
Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 2/2] ext4: simulate various I/O and checksum errors when
reading metadata
On Nov 21, 2019, at 5:09 PM, Darrick J. Wong <darrick.wong@...cle.com> wrote:
>
> On Thu, Nov 21, 2019 at 01:30:36PM -0500, Theodore Ts'o wrote:
>> This allows us to test various error handling code paths
>>
>> Signed-off-by: Theodore Ts'o <tytso@....edu>
>> ---
>>  fs/ext4/balloc.c |  4 +++-
>>  fs/ext4/ext4.h   | 42 ++++++++++++++++++++++++++++++++++++++++++
>>  fs/ext4/ialloc.c |  4 +++-
>>  fs/ext4/inode.c  |  6 +++++-
>>  fs/ext4/namei.c  | 11 ++++++++---
>>  fs/ext4/sysfs.c  | 23 +++++++++++++++++++++++
>>  6 files changed, 84 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
>> index 102c38527a10..5f993a411251 100644
>> --- a/fs/ext4/balloc.c
>> +++ b/fs/ext4/balloc.c
>> @@ -371,7 +371,8 @@ static int ext4_validate_block_bitmap(struct super_block *sb,
>>  	if (buffer_verified(bh))
>>  		goto verified;
>>  	if (unlikely(!ext4_block_bitmap_csum_verify(sb, block_group,
>> -						    desc, bh))) {
>> +						    desc, bh) ||
>> +		     ext4_simulate_fail(sb, EXT4_SIM_BBITMAP_CRC))) {
>>  		ext4_unlock_group(sb, block_group);
>>  		ext4_error(sb, "bg %u: bad block bitmap checksum", block_group);
>>  		ext4_mark_group_bitmap_corrupted(sb, block_group,
>> @@ -505,6 +506,7 @@ int ext4_wait_block_bitmap(struct super_block *sb, ext4_group_t block_group,
>>  	if (!desc)
>>  		return -EFSCORRUPTED;
>>  	wait_on_buffer(bh);
>> +	ext4_simulate_fail_bh(sb, bh, EXT4_SIM_BBITMAP_EIO);
>>  	if (!buffer_uptodate(bh)) {
>>  		ext4_set_errno(sb, EIO);
>>  		ext4_error(sb, "Cannot read block bitmap - "
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 1c9ac0fc8715..e6798db4634c 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1557,6 +1557,9 @@ struct ext4_sb_info {
>>  	/* Barrier between changing inodes' journal flags and writepages ops. */
>>  	struct percpu_rw_semaphore s_journal_flag_rwsem;
>>  	struct dax_device *s_daxdev;
>> +#ifdef CONFIG_EXT4_DEBUG
>> +	unsigned long s_simulate_fail;
>> +#endif
>>  };
>>
>>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> @@ -1575,6 +1578,45 @@ static inline int ext4_valid_inum(struct super_block *sb, unsigned long ino)
>>  		 ino <= le32_to_cpu(EXT4_SB(sb)->s_es->s_inodes_count));
>>  }
>>
>> +static inline int ext4_simulate_fail(struct super_block *sb,
>> +				     unsigned long flag)
>
> Nit: bool?
>
>> +{
>> +#ifdef CONFIG_EXT4_DEBUG
>> +	unsigned long old, new;
>> +	struct ext4_sb_info *sbi = EXT4_SB(sb);
>> +
>> +	do {
>> +		old = READ_ONCE(sbi->s_simulate_fail);
>> +		if (likely((old & flag) == 0))
>> +			return 0;
>> +		new = old & ~flag;
>> +	} while (unlikely(cmpxchg(&sbi->s_simulate_fail, old, new) != old));
>
> If I'm reading this correctly, this means that userspace sets a
> s_simulate_fail bit via sysfs knob, and the next time the filesystem
> calls ext4_simulate_fail with the same bit set in @flag we'll return
> true to say "simulate the failure" and clear the bit in s_simulate_fail?
>
> IOWs, the simulated failures have to be re-armed every time?
>
> Seems reasonable, but consider the possibility that in the future it
> might be useful if you could set up periodic failures (e.g. directory
> lookups fail 10% of the time) so that you can see how something like
> fsstress reacts to less-predictable failures?
>
> Of course that also increases the amount of fugly sysfs boilerplate so
> that each knob can have its own sysfs file... that alone is half of a
> reason not to do that. :(
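A rate-based mode along those lines wouldn't need much more code, either.
A minimal, untested sketch, assuming a hypothetical s_simulate_fail_rate
field next to s_simulate_fail and using prandom_u32() from <linux/random.h>:

    /*
     * Hypothetical sketch only: instead of clearing the flag on first
     * use, leave it armed and fire with probability 1/rate on each call
     * (e.g. rate == 10 makes ~10% of the checked operations fail).
     */
    static inline bool ext4_simulate_fail_rated(struct super_block *sb,
                                                unsigned long flag)
    {
    #ifdef CONFIG_EXT4_DEBUG
            struct ext4_sb_info *sbi = EXT4_SB(sb);
            u32 rate = READ_ONCE(sbi->s_simulate_fail_rate); /* assumed field */

            if (likely(!(READ_ONCE(sbi->s_simulate_fail) & flag)))
                    return false;
            return rate && prandom_u32() % rate == 0;
    #else
            return false;
    #endif
    }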
Just for comparison, Lustre has had a fault injection mechanism for ages
that can do a number of things like this. Each fault location has a unique
number (we group them by subsystem in the code, but the numbers are rather
arbitrary). A sysfs parameter "fail_loc" is set to match a fault location
to inject errors, and a second parameter "fail_val" lets userspace adjust
or tune the failure behavior (e.g. only affect target N, or sleep N
seconds, ...).
The low 16 bits of fail_loc hold the fault location number, and the high
16 bits are flags that modify the behavior independently of which failure
number is being used:
- CFS_FAIL_ONCE: the fail_loc should only fail once (the default is forever)
- CFS_FAIL_SKIP: skip the fail_loc the first "fail_val" times
- CFS_FAIL_SOME: trigger the failure only the first "fail_val" times
- CFS_FAIL_RAND: trigger the failure at a rate of 1/fail_val
There are also flags set by the kernel when the failure is hit, so a test
script can read fail_loc to check whether the failure has already been hit.
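Internally the core check boils down to something like the following
(an illustrative sketch only, not the actual libcfs code; the flag bit
positions here are made up):

    #include <linux/bitops.h>

    #define CFS_FAILED_BIT  24                      /* assumed bit numbers */
    #define CFS_FAILED      (1UL << CFS_FAILED_BIT) /* set once fault fires */
    #define CFS_FAIL_ONCE   (1UL << 25)             /* one-shot modifier */

    static unsigned long cfs_fail_loc;  /* mirrors the fail_loc sysfs file */

    static bool fail_check(u32 id)
    {
            unsigned long loc = READ_ONCE(cfs_fail_loc);

            if (likely((loc & 0xffff) != id))  /* low 16 bits: fault number */
                    return false;
            if ((loc & CFS_FAIL_ONCE) && (loc & CFS_FAILED))
                    return false;              /* one-shot fault already fired */
            set_bit(CFS_FAILED_BIT, &cfs_fail_loc); /* visible to test scripts */
            return true;
    }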
In the code itself, the most common use is simply checking whether we hit
the currently-set fail_loc (wrapped in unlikely() for minimal impact), like:
    if (CFS_FAIL_CHECK(OBD_FAIL_TGT_REPLAY_RECONNECT))
            RETURN(1); /* don't send early reply */
    if (CFS_FAIL_CHECK(OBD_FAIL_FLD_QUERY_REQ) && req->rq_no_delay) {
            /* the same error returned by ptlrpc_import_delay_req() */
            rc = -EWOULDBLOCK;
            req->rq_status = rc;
    }
It is also possible to inject a delay into a thread to allow something else
to happen (probably more useful for a distributed system than for a local one):
    CFS_FAIL_TIMEOUT(OBD_FAIL_TGT_REPLAY_DELAY2, cfs_fail_val);
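Under the hood that is little more than the check plus a sleep; roughly
(again only a sketch, reusing the fail_check() helper sketched above):

    #include <linux/delay.h>

    /* sketch: pause the calling thread for 'secs' seconds (normally
     * taken from cfs_fail_val) whenever fault 'id' is armed */
    static void fail_timeout(u32 id, unsigned int secs)
    {
            if (fail_check(id))
                    msleep(secs * 1000);
    }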
It is also possible to set up a race between two threads in the same or
different parts of the code on the same node:
    CFS_RACE(CFS_FAIL_CHLOG_USER_REG_UNREG_RACE);
The first thread to hit this fail_loc will sleep, and the second thread
that hits it will wake it up. There is a variation of this that makes it
explicit that only the thread hitting one location should sleep, while a
second thread must hit a different location to wake it up:
    thread1:
            CFS_RACE_WAIT(OBD_FAIL_OBD_ZERO_NLINK_RACE);

    thread2:
            CFS_RACE_WAKEUP(OBD_FAIL_OBD_ZERO_NLINK_RACE);
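The sleep/wake pairing maps naturally onto a completion; a minimal sketch
of the idea (again not the actual libcfs implementation, and again reusing
the fail_check() sketch from above):

    #include <linux/completion.h>

    static DECLARE_COMPLETION(race_done);

    /* sketch: the first thread parks here... */
    static void race_wait(u32 id)
    {
            if (fail_check(id))
                    wait_for_completion(&race_done);
    }

    /* ...until its partner thread passes by here */
    static void race_wakeup(u32 id)
    {
            if (fail_check(id))
                    complete(&race_done);
    }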
It is also possible to daisy-chain failure conditions:
    if (ns_is_client(ldlm_lock_to_ns(lock)) &&
        CFS_FAIL_CHECK_RESET(OBD_FAIL_LDLM_INTR_CP_AST,
                             OBD_FAIL_LDLM_CP_BL_RACE | OBD_FAIL_ONCE))
            ldlm_set_fail_loc(lock);
So here, if OBD_FAIL_LDLM_INTR_CP_AST is hit, fail_loc is reset to
OBD_FAIL_LDLM_CP_BL_RACE (armed to fail once), a particular DLM lock is
flagged, and the next two threads that access this lock will race to
process it:
    if (ldlm_is_fail_loc(lock))
            CFS_RACE(OBD_FAIL_LDLM_CP_BL_RACE);
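CFS_FAIL_CHECK_RESET itself is essentially the basic check followed by
re-arming fail_loc with the new value, i.e. something like (sketch):

    /* sketch: if fault 'id' fires, re-arm fail_loc with 'new_loc' */
    static bool fail_check_reset(u32 id, unsigned long new_loc)
    {
            if (!fail_check(id))
                    return false;
            WRITE_ONCE(cfs_fail_loc, new_loc);
            return true;
    }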
The CFS_FAIL functionality has been in use for quite a few years and has
proven sufficient and easy to use for invoking failure conditions that
would otherwise be impossible to reproduce (there are over a thousand
fault injection sites in the Lustre code today, along with corresponding
tests to trigger them).
Cheers, Andreas