[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <674aa21b-4c47-4586-abdc-5198840fcea5@oracle.com>
Date: Mon, 8 Sep 2025 15:27:57 +0100
From: John Garry <john.g.garry@...cle.com>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
Cc: Zorro Lang <zlang@...hat.com>, fstests@...r.kernel.org,
Ritesh Harjani <ritesh.list@...il.com>, djwong@...nel.org,
tytso@....edu, linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH v5 09/12] generic: Add sudden shutdown tests for multi
block atomic writes
On 05/09/2025 18:06, Ojaswin Mujoo wrote:
> On Tue, Sep 02, 2025 at 04:49:26PM +0100, John Garry wrote:
>> On 22/08/2025 09:02, Ojaswin Mujoo wrote:
>>> This test is intended to ensure that multi blocks atomic writes
>>> maintain atomic guarantees across sudden FS shutdowns.
>>>
>>> The way we work is that we lay out a file with random mix of written,
>>> unwritten and hole extents. Then we start performing atomic writes
>>> sequentially on the file while we parallely shutdown the FS. Then we
>>> note the last offset where the atomic write happened just before shut
>>> down and then make sure blocks around it either have completely old
>>> data or completely new data, ie the write was not torn during shutdown.
>>>
>>> We repeat the same with completely written, completely unwritten and completely
>>> empty file to ensure these cases are not torn either. Finally, we have a
>>> similar test for append atomic writes
>>>
>>> Suggested-by: Ritesh Harjani (IBM) <ritesh.list@...il.com>
>>> Reviewed-by: Darrick J. Wong <djwong@...nel.org>
>>> Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
>>
>> Please check comments, below, thanks!
>>
>>> ---
>>> tests/generic/1230 | 397 +++++++++++++++++++++++++++++++++++++++++
>>> tests/generic/1230.out | 2 +
>>> 2 files changed, 399 insertions(+)
>>> create mode 100755 tests/generic/1230
>>> create mode 100644 tests/generic/1230.out
>>>
>>> diff --git a/tests/generic/1230 b/tests/generic/1230
>>> new file mode 100755
>>> index 00000000..cff5adc0
>>> --- /dev/null
>>> +++ b/tests/generic/1230
>>> @@ -0,0 +1,397 @@
>>> +#! /bin/bash
>>> +# SPDX-License-Identifier: GPL-2.0
>>> +# Copyright (c) 2025 IBM Corporation. All Rights Reserved.
>>> +#
>>> +# FS QA Test No. 1230
>>> +#
>>> +# Test multi block atomic writes with sudden FS shutdowns to ensure
>>> +# the FS is not tearing the write operation
>>> +. ./common/preamble
>>> +. ./common/atomicwrites
>>> +_begin_fstest auto atomicwrites
>>> +
>>> +_require_scratch_write_atomic_multi_fsblock
>>> +_require_atomic_write_test_commands
>>> +_require_scratch_shutdown
>>> +_require_xfs_io_command "truncate"
>>
>> is a similar fallocate test needed?
>
> Hey John, we run the test for the following cases:
>
> - file has mixed mapping
> - file has only hole (trucate case)
> - file has only unwritten blocks (falloc case)
> - file has only written blocks
>
> So we already do that. It's just that we don't need the
> _require_xfs_io_command "falloc" since it is already included in
> _require_atomic_write_test_commnads.
ok
>
>>
>>> +
>>> +_scratch_mkfs >> $seqres.full 2>&1
>>> +_scratch_mount >> $seqres.full
>>> +
>>> +testfile=$SCRATCH_MNT/testfile
>>> +touch $testfile
>>> +
>>> +awu_max=$(_get_atomic_write_unit_max $testfile)
>>> +blksz=$(_get_block_size $SCRATCH_MNT)
>>> +echo "Awu max: $awu_max" >> $seqres.full
>>> +
>>> +num_blocks=$((awu_max / blksz))
>>> +# keep initial value high for dry run. This will be
>>> +# tweaked in dry_run() based on device write speed.
>>> +filesize=$(( 10 * 1024 * 1024 * 1024 ))
>>
>> could this cause some out-of-space issue? That's 10GB, right?
>
> Hey John, yes this is just a dummy value. We tune the filesize later
> based on how fast the device is. That will usually be around 3 * (bytes
> written in 0.2s) (check dry_run function).
>
> Generally this will be a smaller size. ( 3GB on a 5GB/s SSD.) But yes
> I should probably add a _notrun if our ssd fast enough to fill up the
> full FS in 0.2s.
ok, good
>
>>
>>> +
>>> +_cleanup() {
>>> + [ -n "$awloop_pid" ] && kill $awloop_pid &> /dev/null
>>> + wait
>>> +}
>>> +
>>> +atomic_write_loop() {
>>> + local off=0
>>> + local size=$awu_max
>>> + for ((i=0; i<$((filesize / $size )); i++)); do
>>> + # Due to sudden shutdown this can produce errors so just
>>> + # redirect them to seqres.full
>>> + $XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
>>> + echo "Written to offset: $off" >> $tmp.aw
>>> + off=$((off + $size))
>>> + done
>>> +}
>>> +
>>> +# This test has the following flow:
>>> +# 1. Start doing sequential atomic writes in bg, upto $filesize
>>
>> bg?
>
> background*. I'll change it.
>>
>>> +# 2. Sleep for 0.2s and shutdown the FS
>>> +# 3. kill the atomic write process
>>> +# 4. verify the writes were not torn
>>> +#
>>> +# We ideally want the shutdown to happen while an atomic write is ongoing
>>> +# but this gets tricky since faster devices can actually finish the whole
>>> +# atomic write loop before sleep 0.2s completes, resulting in the shutdown
>>> +# happening after the write loop which is not what we want. A simple solution
>>> +# to this is to increase $filesize so step 1 takes long enough but a big
>>> +# $filesize leads to create_mixed_mappings() taking very long, which is not
>>> +# ideal.
>>> +#
>>> +# Hence, use the dry_run function to figure out the rough device speed and set
>>> +# $filesize accordingly.
>>> +dry_run() {
>>> + echo >> $seqres.full
>>> + echo "# Estimating ideal filesize..." >> $seqres.full
>>> + atomic_write_loop &
>>> + awloop_pid=$!
>>> +
>>> + local i=0
>>> + # Wait for atleast first write to be recorded or 10s
>>> + while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
>>> +
>>> + if [[ $i -gt 50 ]]
>>> + then
>>> + _fail "atomic write process took too long to start"
>>> + fi
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Shutting down filesystem while write is running" >> $seqres.full
>>> + _scratch_shutdown
>>> +
>>> + kill $awloop_pid 2>/dev/null # the process might have finished already
>>> + wait $awloop_pid
>>> + unset $awloop_pid
>>> +
>>> + bytes_written=$(tail -n 1 $tmp.aw | cut -d" " -f4)
>>> + echo "# Bytes written in 0.2s: $bytes_written" >> $seqres.full
>>> +
>>> + filesize=$((bytes_written * 3))
>>> + echo "# Setting \$filesize=$filesize" >> $seqres.full
>>> +
>>> + rm $tmp.aw
>>> + sleep 0.5
>>> +
>>> + _scratch_cycle_mount
>>> +
>>> +}
>>> +
>>> +create_mixed_mappings() {
>>
>> Is this same as patch 08/12?
>
> I believe you mean the [D]SYNC tests, yes it is the same.
then maybe factor out the test, if possible. I assume that this sort of
approach is taken for xfstests.
>
>>
>>> + local file=$1
>>> + local size_bytes=$2
>>> +
>>> + echo "# Filling file $file with alternate mappings till size $size_bytes" >> $seqres.full
>>> + #Fill the file with alternate written and unwritten blocks
>>> + local off=0
>>> + local operations=("W" "U")
>>> +
>>> + for ((i=0; i<$((size_bytes / blksz )); i++)); do
>>> + index=$(($i % ${#operations[@]}))
>>> + map="${operations[$index]}"
>>> +
>>> + case "$map" in
>>> + "W")
>>> + $XFS_IO_PROG -fc "pwrite -b $blksz $off $blksz" $file >> /dev/null
>>
>> does this just write random data? I don't see any pattern being set.
>
> By default pwrite writes 0xcd if no patterns is specified. This helps us
> reliably check the data back.
ok, understood
>
>>
>>> + ;;
>>> + "U")
>>> + $XFS_IO_PROG -fc "falloc $off $blksz" $file >> /dev/null
>>> + ;;
>>> + esac
>>> + off=$((off + blksz))
>>> + done
>>> +
>>> + sync $file
>>> +}
>>> +
>>> +populate_expected_data() {
>>> + # create a dummy file with expected old data for different cases
>>> + create_mixed_mappings $testfile.exp_old_mixed $awu_max
>>> + expected_data_old_mixed=$(od -An -t x1 -j 0 -N $awu_max $testfile.exp_old_mixed)
>>> +
>>> + $XFS_IO_PROG -fc "falloc 0 $awu_max" $testfile.exp_old_zeroes >> $seqres.full
>>> + expected_data_old_zeroes=$(od -An -t x1 -j 0 -N $awu_max $testfile.exp_old_zeroes)
>>> +
>>> + $XFS_IO_PROG -fc "pwrite -b $awu_max 0 $awu_max" $testfile.exp_old_mapped >> $seqres.full
>>> + expected_data_old_mapped=$(od -An -t x1 -j 0 -N $awu_max $testfile.exp_old_mapped)
>>> +
>>> + # create a dummy file with expected new data
>>> + $XFS_IO_PROG -fc "pwrite -S 0x61 -b $awu_max 0 $awu_max" $testfile.exp_new >> $seqres.full
>>> + expected_data_new=$(od -An -t x1 -j 0 -N $awu_max $testfile.exp_new)
>>> +}
>>> +
>>> +verify_data_blocks() {
>>> + local verify_start=$1
>>> + local verify_end=$2
>>> + local expected_data_old="$3"
>>> + local expected_data_new="$4"
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Checking data integrity from $verify_start to $verify_end" >> $seqres.full
>>> +
>>> + # After an atomic write, for every chunk we ensure that the underlying
>>> + # data is either the old data or new data as writes shouldn't get torn.
>>> + local off=$verify_start
>>> + while [[ "$off" -lt "$verify_end" ]]
>>> + do
>>> + #actual_data=$(xxd -s $off -l $awu_max -p $testfile)
>>> + actual_data=$(od -An -t x1 -j $off -N $awu_max $testfile)
>>> + if [[ "$actual_data" != "$expected_data_new" ]] && [[ "$actual_data" != "$expected_data_old" ]]
>>> + then
>>> + echo "Checksum match failed at off: $off size: $awu_max"
>>> + echo "Expected contents: (Either of the 2 below):"
>>> + echo
>>> + echo "Expected old: "
>>> + echo "$expected_data_old"
>>
>>
>> it would be nice if this was deterministic - see comment in
>> create_mixed_mappings
>
> Yes, it is. It will be 0xcdcdcdcd
>>
>>> + echo
>>> + echo "Expected new: "
>>> + echo "$expected_data_new"
>>
>> nit: I am not sure what is meant by "expected". I would just have "new
>> data". We don't know what to expect, as it could be old or new, right?
>
> Yes, so the I was thinking of it this way:
>
> We either expect the data to be the full new (named expected_new) or
> fully old (named expected_old). Else renaming it to new vs old vs actual
> makese it a bit more confusing imo
ok, good
>
>>
>>> + echo
>>> + echo "Actual contents: "
>>> + echo "$actual_data"
>>> +
>>> + _fail
>>> + fi
>>> + echo -n "Check at offset $off suceeded! " >> $seqres.full
>>> + if [[ "$actual_data" == "$expected_data_new" ]]
>>> + then
>>> + echo "matched new" >> $seqres.full
>>> + elif [[ "$actual_data" == "$expected_data_old" ]]
>>> + then
>>> + echo "matched old" >> $seqres.full
>>> + fi
>>> + off=$(( off + awu_max ))
>>> + done
>>> +}
>>> +
>>> +# test data integrity for file by shutting down in between atomic writes
>>> +test_data_integrity() {
>>> + echo >> $seqres.full
>>> + echo "# Writing atomically to file in background" >> $seqres.full
>>> + atomic_write_loop &
>>> + awloop_pid=$!
>>> +
>>
>> from here ...
>>
>>> + local i=0
>>> + # Wait for atleast first write to be recorded or 10s
>>> + while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
>>> +
>>> + if [[ $i -gt 50 ]]
>>> + then
>>> + _fail "atomic write process took too long to start"
>>> + fi
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Shutting down filesystem while write is running" >> $seqres.full
>>> + _scratch_shutdown
>>> +
>>> + kill $awloop_pid 2>/dev/null # the process might have finished already
>>> + wait $awloop_pid
>>> + unset $awloop_pid
>>
>> ... to here looks similar in many functions. Can we factor it out?
>
> Right thats true, I'll factor this out. Thanks for pointing it out.
ok
>>
>>> +
>>> + last_offset=$(tail -n 1 $tmp.aw | cut -d" " -f4)
>>> + if [[ -z $last_offset ]]
>>> + then
>>> + last_offset=0
>>> + fi
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Last offset of atomic write: $last_offset" >> $seqres.full
>>> +
>>> + rm $tmp.aw
>>> + sleep 0.5
>>> +
>>> + _scratch_cycle_mount
>>> +
>>> + # we want to verify all blocks around which the shutdown happended
>>> + verify_start=$(( last_offset - (awu_max * 5)))
>>> + if [[ $verify_start < 0 ]]
>>> + then
>>> + verify_start=0
>>> + fi
>>> +
>>> + verify_end=$(( last_offset + (awu_max * 5)))
>>> + if [[ "$verify_end" -gt "$filesize" ]]
>>> + then
>>> + verify_end=$filesize
>>> + fi
>>> +}
>>> +
>>> +# test data integrity for file wiht written and unwritten mappings
>>
>> with
>>
>>> +test_data_integrity_mixed() {
>>> + $XFS_IO_PROG -fc "truncate 0" $testfile >> $seqres.full
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Creating testfile with mixed mappings" >> $seqres.full
>>> + create_mixed_mappings $testfile $filesize
>>> +
>>> + test_data_integrity
>>> +
>>> + verify_data_blocks $verify_start $verify_end "$expected_data_old_mixed" "$expected_data_new"
>>> +}
>>> +
>>> +# test data integrity for file with completely written mappings
>>> +test_data_integrity_writ() {
>>
>> please spell "writ" out fully, which I think should be "written"
>
> Yes, will do.
>
>>
>>> + $XFS_IO_PROG -fc "truncate 0" $testfile >> $seqres.full
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Creating testfile with fully written mapping" >> $seqres.full
>>> + $XFS_IO_PROG -c "pwrite -b $filesize 0 $filesize" $testfile >> $seqres.full
>>> + sync $testfile
>>> +
>>> + test_data_integrity
>>> +
>>> + verify_data_blocks $verify_start $verify_end "$expected_data_old_mapped" "$expected_data_new"
>>> +}
>>> +
>>> +# test data integrity for file with completely unwritten mappings
>>> +test_data_integrity_unwrit() {
>>
>> same as above
>>
>>> + $XFS_IO_PROG -fc "truncate 0" $testfile >> $seqres.full
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Creating testfile with fully unwritten mappings" >> $seqres.full
>>> + $XFS_IO_PROG -c "falloc 0 $filesize" $testfile >> $seqres.full
>>> + sync $testfile
>>> +
>>> + test_data_integrity
>>> +
>>> + verify_data_blocks $verify_start $verify_end "$expected_data_old_zeroes" "$expected_data_new"
>>> +}
>>> +
>>> +# test data integrity for file with no mappings
>>> +test_data_integrity_hole() {
>>> + $XFS_IO_PROG -fc "truncate 0" $testfile >> $seqres.full
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Creating testfile with no mappings" >> $seqres.full
>>> + $XFS_IO_PROG -c "truncate $filesize" $testfile >> $seqres.full
>>> + sync $testfile
>>> +
>>> + test_data_integrity
>>> +
>>> + verify_data_blocks $verify_start $verify_end "$expected_data_old_zeroes" "$expected_data_new"
>>> +}
>>> +
>>> +test_filesize_integrity() {
>>> + $XFS_IO_PROG -c "truncate 0" $testfile >> $seqres.full
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Performing extending atomic writes over file in background" >> $seqres.full
>>> + atomic_write_loop &
>>> + awloop_pid=$!
>>> +
>>> + local i=0
>>> + # Wait for atleast first write to be recorded or 10s
>>> + while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
>>> +
>>> + if [[ $i -gt 50 ]]
>>> + then
>>> + _fail "atomic write process took too long to start"
>>> + fi
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Shutting down filesystem while write is running" >> $seqres.full
>>> + _scratch_shutdown
>>> +
>>> + kill $awloop_pid 2>/dev/null # the process might have finished already
>>> + wait $awloop_pid
>>> + unset $awloop_pid
>>> +
>>> + local last_offset=$(tail -n 1 $tmp.aw | cut -d" " -f4)
>>> + if [[ -z $last_offset ]]
>>> + then
>>> + last_offset=0
>>> + fi
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Last offset of atomic write: $last_offset" >> $seqres.full
>>> + rm $tmp.aw
>>> + sleep 0.5
>>> +
>>> + _scratch_cycle_mount
>>> + local filesize=$(_get_filesize $testfile)
>>> + echo >> $seqres.full
>>> + echo "# Filesize after shutdown: $filesize" >> $seqres.full
>>> +
>>> + # To confirm that the write went atomically, we check:
>>> + # 1. The last block should be a multiple of awu_max
>>> + # 2. The last block should be the completely new data
>>> +
>>> + if (( $filesize % $awu_max ))
>>> + then
>>> + echo "Filesize after shutdown ($filesize) not a multiple of atomic write unit ($awu_max)"
>>> + fi
>>> +
>>> + verify_start=$(( filesize - (awu_max * 5)))
>>> + if [[ $verify_start < 0 ]]
>>> + then
>>> + verify_start=0
>>> + fi
>>> +
>>> + local verify_end=$filesize
>>> +
>>> + # Here the blocks should always match new data hence, for simplicity of
>>> + # code, just corrupt the $expected_data_old buffer so it never matches
>>> + local expected_data_old="POISON"
>>> + verify_data_blocks $verify_start $verify_end "$expected_data_old" "$expected_data_new"
>>> +}
>>> +
>>> +$XFS_IO_PROG -fc "truncate 0" $testfile >> $seqres.full
>>> +
>>> +dry_run
>>> +
>>> +echo >> $seqres.full
>>> +echo "# Populating expected data buffers" >> $seqres.full
>>> +populate_expected_data
>>> +
>>> +# Loop 20 times to shake out any races due to shutdown
>>> +for ((iter=0; iter<20; iter++))
>>> +do
>>> + echo >> $seqres.full
>>> + echo "------ Iteration $iter ------" >> $seqres.full
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Starting data integrity test for atomic writes over mixed mapping" >> $seqres.full
>>> + test_data_integrity_mixed
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Starting data integrity test for atomic writes over fully written mapping" >> $seqres.full
>>> + test_data_integrity_writ
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Starting data integrity test for atomic writes over fully unwritten mapping" >> $seqres.full
>>> + test_data_integrity_unwrit
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Starting data integrity test for atomic writes over holes" >> $seqres.full
>>> + test_data_integrity_hole
>>> +
>>> + echo >> $seqres.full
>>> + echo "# Starting filesize integrity test for atomic writes" >> $seqres.full
>>
>> what does "Starting filesize integrity test" mean?
>
> Basically other tests already truncate the file to a higher value and
> then perform the shut down test. Here we actually do append atomic
> writes since we want to also stress the i_size update paths during
> shutdown to ensure that doesn't cause any tearing with atomic writes.
>
> I can maybe rename it to:
>
>
> echo "# Starting data integrity test for atomic append writes" >> $seqres.full
>
> Thanks for the review!
>
It's just the name "integrity" that throws me a bit..
Powered by blists - more mailing lists