Message-ID: <20220423095442.GA33425@xsang-OptiPlex-9020>
Date: Sat, 23 Apr 2022 17:54:42 +0800
From: Oliver Sang <oliver.sang@...el.com>
To: Christoph Hellwig <hch@....de>
Cc: lkp@...ts.01.org, lkp@...el.com,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [dm] 92986f6b4c: xfstests.generic.455.fail
Hi Christoph Hellwig,
Sorry if some of the information we sent you earlier caused confusion. I will
serve as the single point of contact if any further information or tests are
required.
On Wed, Mar 23, 2022 at 08:54:58AM +0100, Christoph Hellwig wrote:
> I can't reproduce this.
Could I ask how you tried to reproduce it? Did you use the "To reproduce:"
steps from our original report (quoted below)?
We are quite sorry: when we tried that reproducer ourselves, we found that it
blocks at one point. We will fix that and eat our own dog food first.
The next question is: on which commit did you try to reproduce it?
Given that
(1) this commit was merged into mainline in v5.18-rc1,
(2) we noticed a fix commit [1], which was merged into mainline in v5.18-rc3, and
(3) the failure rate is low on both v5.18-rc1 and v5.18-rc3, which made us doubt
    whether the parent is really clean,
we ran more tests. The results are below:
xfstests.generic.455.fail:
  kernel                                 fail:runs   %reproduction (vs. 92986f6b4c8a2c24)
  -------------------------------------  ---------   ------------------------------------
  92986f6b4c8a2c24                       23:30       (base)
  56b4b5abcdab6daf71c5536fca2 (parent)    0:60       -73%
  v5.18-rc1                               3:20       -72%
  v5.18-rc3                               1:30       -72%
So:
(1) the parent is still clean after 60 runs;
(2) the failure is easy to reproduce on 92986f6b4c (23 out of 30 runs);
(3) it is much harder to reproduce on v5.18-rc1 (3 out of 20 runs) and seems even
    harder on v5.18-rc3 (1 out of 30 runs), though it is still reproducible on
    both rc commits.
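As a rough illustration only, the fail:runs counts in the table above can be turned into per-kernel failure rates, together with the chance that a kernel failing at a given rate would still pass every one of n runs. This sketch assumes each run fails independently with a fixed probability, which is an assumption; a timing-dependent race need not behave that way.

```python
# Illustrative arithmetic only: assumes each run fails independently with a
# fixed probability p, which a timing-dependent race may not satisfy.

# (fail, runs) counts for xfstests.generic.455.fail, copied from the table.
RESULTS = {
    "92986f6b4c": (23, 30),
    "56b4b5abcd (parent)": (0, 60),
    "v5.18-rc1": (3, 20),
    "v5.18-rc3": (1, 30),
}


def failure_rate(fail: int, runs: int) -> float:
    """Observed fraction of failing runs."""
    return fail / runs


def prob_all_pass(p: float, runs: int) -> float:
    """Chance that `runs` independent runs all pass at per-run failure rate p."""
    return (1.0 - p) ** runs


if __name__ == "__main__":
    for name, (fail, runs) in RESULTS.items():
        print(f"{name:22s} {fail:2d}/{runs:2d} = {failure_rate(fail, runs):6.1%}")

    # Could a clean 0/60 be luck if the kernel failed as rarely as v5.18-rc3?
    p_rc3 = failure_rate(*RESULTS["v5.18-rc3"])
    print(f"P(0 failures in 60 runs | p = {p_rc3:.3f}) = {prob_all_pass(p_rc3, 60):.3f}")
```

Under that independence assumption, 60 clean runs would be essentially impossible at the 92986f6b4c rate (about 77%), but would still happen roughly 13% of the time at a rate of about 1 in 30, so 0:60 on its own cannot completely exclude a rare race.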
We noticed you mentioned in another mail:
"But if it is only partially reproducible it probably is some kind of race
slightly affected by different timings."
We are not sure whether we can consider this done. But if you want to look into
it further, below is the extra information we can supply.
We confirmed that the configs used to build 92986f6b4c and its parent
(56b4b5abcd) are identical; the config is attached.
However, the configs for v5.18-rc1 and v5.18-rc3 are quite different. We are not
sure whether any difference between them explains the large difference in
reproduction rate, so I also attached the config used to build v5.18-rc3 for
reference.
Looking at the individual failures in detail, the test seems to fail at
different points, for example in these 3 failures on commit 92986f6b4c:
Run #1:
testfile0.mark4 md5sum mismatched
Run #2:
testfile1.mark10 md5sum mismatched
Run #3:
testfile0.mark1 md5sum mismatched
and the single failure on v5.18-rc3 was:
testfile1.mark16 md5sum mismatched
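To compare where each failing run diverged, the mismatch lines can be pulled out of the 455.out.bad files mechanically. This is a minimal sketch, assuming every failure line has the form "testfileN.markM md5sum mismatched" as in the runs listed above; the helper name failed_marks is hypothetical and not part of xfstests or lkp.

```python
import re
from typing import List, Tuple

# Matches lines such as "testfile1.mark16 md5sum mismatched" (format taken
# from the failures quoted in this mail); all other output lines are ignored.
MISMATCH = re.compile(r"^testfile(\d+)\.mark(\d+) md5sum mismatched$")


def failed_marks(out_bad: str) -> List[Tuple[int, int]]:
    """Return (testfile index, mark number) for each md5sum mismatch line."""
    marks = []
    for line in out_bad.splitlines():
        m = MISMATCH.match(line.strip())
        if m:
            marks.append((int(m.group(1)), int(m.group(2))))
    return marks


# Sample shaped like the 455.out.bad content shown in the original report.
sample = """QA output created by 455
testfile1.mark16 md5sum mismatched
(see /lkp/benchmarks/xfstests/results//generic/455.full for details)"""
print(failed_marks(sample))  # [(1, 16)]
```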
I attached the dmesg, the full log and the .out.bad log from the failed
v5.18-rc3 run in case they provide any hints.
If you have any further questions or requests, please let us know. Thanks a lot!
[1]
commit 92b914e29af3e99589f2d2876616c0b534892ed4 (device-mapper-dm/dm-5.18)
Author: Shin'ichiro Kawasaki <shinichiro.kawasaki@....com>
Date: Fri Apr 15 17:45:13 2022 +0900
dm: fix bio length of empty flush
The commit 92986f6b4c8a ("dm: use bio_clone_fast in alloc_io/alloc_tio")
removed bio_clone_fast() call from alloc_tio() when ci->io->tio is
available. In this case, ci->bio is not copied to ci->io->tio.clone.
This is fine since init_clone_info() sets same values to ci->bio and
ci->io->tio.clone.
However, when incoming bios have REQ_PREFLUSH flag, __send_empty_flush()
prepares a zero length bio on stack and set it to ci->bio. At this time,
ci->io->tio.clone still keeps non-zero length. When alloc_tio() chooses
this ci->io->tio.clone as the bio to map, it is passed to targets as
non-empty flush bio. It causes bio length check failure in dm-zoned and
unexpected operation such as dm_accept_partial_bio() call.
To avoid the non-empty flush bio, set zero length to ci->io->tio.clone
in __send_empty_flush().
Fixes: 92986f6b4c8a ("dm: use bio_clone_fast in alloc_io/alloc_tio")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@....com>
Signed-off-by: Mike Snitzer <snitzer@...nel.org>
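The mechanism described in the fix commit [1] can be illustrated with a toy user-space model. This is not dm code: the classes and the functions below are hypothetical stand-ins that only mirror the fields named in the commit message (ci->bio, ci->io->tio.clone and the bio length). The sketch shows why a reused clone kept a stale non-zero length for an empty flush and how zeroing it, as the fix does, avoids that.

```python
from dataclasses import dataclass


@dataclass
class Bio:
    """Toy bio: only the length in bytes and a preflush flag are modelled."""
    size: int
    preflush: bool = False


@dataclass
class Tio:
    clone: Bio


@dataclass
class Io:
    tio: Tio


@dataclass
class CloneInfo:
    """Toy stand-in for dm's clone info, holding `bio` and `io.tio.clone`."""
    bio: Bio
    io: Io


def init_clone_info(orig: Bio) -> CloneInfo:
    # Both views start out with the same values, as the commit message notes.
    return CloneInfo(bio=Bio(orig.size, orig.preflush),
                     io=Io(Tio(clone=Bio(orig.size, orig.preflush))))


def send_empty_flush(ci: CloneInfo, with_fix: bool) -> Bio:
    # A zero-length flush bio replaces ci.bio ...
    ci.bio = Bio(size=0, preflush=True)
    if with_fix:
        # ... and the fix also zeroes the length of the reused clone.
        ci.io.tio.clone.size = 0
    # Model alloc_tio() reusing ci.io.tio.clone instead of cloning ci.bio:
    # this is the bio the target would see.
    return ci.io.tio.clone


buggy = send_empty_flush(init_clone_info(Bio(4096, preflush=True)), with_fix=False)
fixed = send_empty_flush(init_clone_info(Bio(4096, preflush=True)), with_fix=True)
print(buggy.size, fixed.size)  # 4096 0
```

In the unfixed path the target receives a "flush" that still claims 4096 bytes, which corresponds to the non-empty flush bio the commit message blames for the dm-zoned length check failure.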
>
> On Thu, Mar 17, 2022 at 04:39:29PM +0800, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: 92986f6b4c8a2c24d3a36b80140624f80fd93de4 ("dm: use bio_clone_fast in alloc_io/alloc_tio")
> > https://github.com/ammarfaizi2/linux-block axboe/linux-block/for-5.18/block
> >
> > in testcase: xfstests
> > version: xfstests-x86_64-1de1db8-1_20220217
> > with following parameters:
> >
> > disk: 4HDD
> > fs: xfs
> > test: generic-logwrites
> > ucode: 0xec
> >
> > test-description: xfstests is a regression test suite for xfs and other file systems.
> > test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> >
> >
> > on test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz with 16G memory
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> >
> > 2022-03-13 14:24:40 export TEST_DIR=/fs/sda1
> > 2022-03-13 14:24:40 export TEST_DEV=/dev/sda1
> > 2022-03-13 14:24:40 export FSTYP=xfs
> > 2022-03-13 14:24:40 export SCRATCH_MNT=/fs/scratch
> > 2022-03-13 14:24:40 mkdir /fs/scratch -p
> > 2022-03-13 14:24:40 export SCRATCH_DEV=/dev/sda4
> > 2022-03-13 14:24:40 export SCRATCH_LOGDEV=/dev/sda2
> > 2022-03-13 14:24:40 export LOGWRITES_DEV=/dev/sda2
> > 2022-03-13 14:24:40 export MKFS_OPTIONS=-mreflink=1
> > 2022-03-13 14:24:40 sed "s:^:generic/:" //lkp/benchmarks/xfstests/tests/generic-logwrites
> > 2022-03-13 14:24:40 ./check generic/482 generic/457 generic/455
> > FSTYP -- xfs (debug)
> > PLATFORM -- Linux/x86_64 lkp-skl-d06 5.17.0-rc2-00044-g92986f6b4c8a #1 SMP Sun Mar 13 14:11:02 CST 2022
> > MKFS_OPTIONS -- -f -mreflink=1 /dev/sda4
> > MOUNT_OPTIONS -- /dev/sda4 /fs/scratch
> >
> > generic/455 [failed, exit status 1]- output mismatch (see /lkp/benchmarks/xfstests/results//generic/455.out.bad)
> > --- tests/generic/455.out 2022-02-17 11:55:00.000000000 +0000
> > +++ /lkp/benchmarks/xfstests/results//generic/455.out.bad 2022-03-13 14:26:00.664268705 +0000
> > @@ -1,2 +1,3 @@
> > QA output created by 455
> > -Silence is golden
> > +testfile1.mark13 md5sum mismatched
> > +(see /lkp/benchmarks/xfstests/results//generic/455.full for details)
> > ...
> > (Run 'diff -u /lkp/benchmarks/xfstests/tests/generic/455.out /lkp/benchmarks/xfstests/results//generic/455.out.bad' to see the entire diff)
> > generic/457 20s
> > generic/482 427s
> > Ran: generic/455 generic/457 generic/482
> > Failures: generic/455
> > Failed 1 of 3 tests
> >
> >
> >
> >
> > To reproduce:
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > sudo bin/lkp install job.yaml # job file is attached in this email
> > bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> > sudo bin/lkp run generated-yaml-file
> >
> > # if come across any failure that blocks the test,
> > # please remove ~/.lkp and /lkp dir to run from a clean state.
> >
> >
> >
> > ---
> > 0-DAY CI Kernel Test Service
> > https://lists.01.org/hyperkitty/list/lkp@lists.01.org
> >
> > Thanks,
> > Oliver Sang
> >
>
Attachment: "config-5.17.0-rc2-00044-g92986f6b4c8a" (text/plain, 165702 bytes)
Attachment: "config-5.18.0-rc3" (text/plain, 165894 bytes)
Attachment: "dmesg-v5.18-rc3.xz" (application/x-xz, 31400 bytes)
Attachment: "455-for-v5.18-rc3.full" (text/plain, 43385 bytes)
Attachment: "455-for-v5.18-rc3.out.bad" (text/plain, 130 bytes)