Message-ID: <20220423095442.GA33425@xsang-OptiPlex-9020>
Date:   Sat, 23 Apr 2022 17:54:42 +0800
From:   Oliver Sang <oliver.sang@...el.com>
To:     Christoph Hellwig <hch@....de>
Cc:     lkp@...ts.01.org, lkp@...el.com,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [dm]  92986f6b4c: xfstests.generic.455.fail

Hi, Christoph Hellwig,

Sorry if some of the information we sent you caused confusion. I will serve
as the single point of contact if any further information or tests are required.

On Wed, Mar 23, 2022 at 08:54:58AM +0100, Christoph Hellwig wrote:
> I can't reproduce this.

Could I ask how you tried to reproduce it? Did you use the "To reproduce:"
steps from our original report (quoted below)?

Sorry: when we tried that reproducer ourselves, we found that it blocks at
one point. We will fix that and eat our own dog food first...

The next question is which commit you tried to reproduce on. We ask because:
(1) this commit was already merged into mainline in v5.18-rc1
(2) we noticed a fix commit [1], which was merged into mainline in v5.18-rc3
(3) the failure rate is low on both v5.18-rc1 and v5.18-rc3, which made us
    doubt whether the parent is really clean

We ran more tests; the results are below:

92986f6b4c8a2c24 56b4b5abcdab6daf71c5536fca2                   v5.18-rc1                   v5.18-rc3
---------------- --------------------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |             |             |
         23:30         -73%           0:60         -72%           3:20         -72%           1:30    xfstests.generic.455.fail

So:
(1) the parent is still clean after 60 runs
(2) the failure is easy to reproduce on 92986f6b4c (23 out of 30 runs)
(3) but it is really hard to reproduce on v5.18-rc1 (3 out of 20 runs), and
    seems even harder on v5.18-rc3 (1 out of 30 runs), though it is still
    reproducible on both rc commits
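As a rough sanity check on point (3) of our question, the fail:runs counts in
the table can be turned into per-run failure rates. The sketch below assumes
every run is independent and fails with the same fixed probability, which a
timing-sensitive race need not satisfy, so treat the numbers as indicative
only. `fail_rate` and `prob_no_failures` are illustrative helpers, not part
of lkp-tests.

```python
# fail:runs counts copied from the table above.
RESULTS = {
    "92986f6b4c": (23, 30),
    "56b4b5abcd (parent)": (0, 60),
    "v5.18-rc1": (3, 20),
    "v5.18-rc3": (1, 30),
}


def fail_rate(fails: int, runs: int) -> float:
    """Observed fraction of runs in which generic/455 failed."""
    return fails / runs


def prob_no_failures(p: float, runs: int) -> float:
    """Chance of zero failures in `runs` independent runs when each run
    fails with probability `p` (an idealized model, see lead-in)."""
    return (1.0 - p) ** runs


if __name__ == "__main__":
    for kernel, (fails, runs) in RESULTS.items():
        print(f"{kernel:22s} {fails:2d}/{runs:2d} = {fail_rate(fails, runs):6.1%}")
    # How surprising would a clean 0:60 parent be at each observed rate?
    for kernel in ("92986f6b4c", "v5.18-rc3"):
        p = fail_rate(*RESULTS[kernel])
        print(f"P(0 of 60 | rate of {kernel}) = {prob_no_failures(p, 60):.3g}")
```

Under that model a clean 0:60 parent is essentially impossible at the 23:30
rate seen on 92986f6b4c, but would still happen roughly 13% of the time at
the 1:30 rate seen on v5.18-rc3, so 60 clean runs rule out the former much
more firmly than the latter.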

We noticed you mentioned in another mail:
"But if it is only partially reproducible it probably is some kind of race
slightly affected by different timings."

We are not sure whether we can consider this done. But if you want to look
into it further, below is the extra information we can supply.


We confirmed that the configs used to build 92986f6b4c and its parent
(56b4b5abcd) are identical; that config is attached.

However, the configs for v5.18-rc1 and v5.18-rc3 are quite different from
that one. We are not sure whether any of those differences explains the large
gap in reproduction rate, so I have also attached the config used to build
v5.18-rc3 for reference.

Looking at the individual failures in detail, the test seems to fail at a
different point each time, e.g. these 3 failures for commit 92986f6b4c:
Run #1:
testfile0.mark4 md5sum mismatched
Run #2:
testfile1.mark10 md5sum mismatched
Run #3:
testfile0.mark1 md5sum mismatched

And the single failure on v5.18-rc3 was:
testfile1.mark16 md5sum mismatched
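If more failed runs are collected, a small helper along these lines could
tally which file and mark each mismatch hit, to check whether failures
cluster anywhere. It is a hypothetical sketch that only assumes the
`testfileN.markM md5sum mismatched` line format shown above; it is not part
of xfstests or lkp-tests.

```python
import re
from collections import Counter

# Matches lines such as "testfile0.mark4 md5sum mismatched" in 455.out.bad.
MISMATCH = re.compile(r"^(testfile\d+)\.mark(\d+) md5sum mismatched$")


def mismatches(lines):
    """Yield (file, mark) for every md5sum mismatch line; other lines are skipped."""
    for line in lines:
        m = MISMATCH.match(line.strip())
        if m:
            yield m.group(1), int(m.group(2))


# Failure lines copied from the runs listed above.
OBSERVED = {
    "92986f6b4c run #1": ["testfile0.mark4 md5sum mismatched"],
    "92986f6b4c run #2": ["testfile1.mark10 md5sum mismatched"],
    "92986f6b4c run #3": ["testfile0.mark1 md5sum mismatched"],
    "v5.18-rc3": ["testfile1.mark16 md5sum mismatched"],
}

if __name__ == "__main__":
    per_file = Counter()
    for run, lines in OBSERVED.items():
        for fname, mark in mismatches(lines):
            per_file[fname] += 1
            print(f"{run}: {fname} mark {mark}")
    print(dict(per_file))
```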

I have attached the dmesg, the full log, and the .out.bad log from the failed
run on v5.18-rc3 in case they offer any hints.

If you have any further questions or requests, please let us know. Thanks a lot!

[1]
commit 92b914e29af3e99589f2d2876616c0b534892ed4 (device-mapper-dm/dm-5.18)
Author: Shin'ichiro Kawasaki <shinichiro.kawasaki@....com>
Date:   Fri Apr 15 17:45:13 2022 +0900

    dm: fix bio length of empty flush

    The commit 92986f6b4c8a ("dm: use bio_clone_fast in alloc_io/alloc_tio")
    removed bio_clone_fast() call from alloc_tio() when ci->io->tio is
    available. In this case, ci->bio is not copied to ci->io->tio.clone.
    This is fine since init_clone_info() sets same values to ci->bio and
    ci->io->tio.clone.

    However, when incoming bios have REQ_PREFLUSH flag, __send_empty_flush()
    prepares a zero length bio on stack and set it to ci->bio. At this time,
    ci->io->tio.clone still keeps non-zero length. When alloc_tio() chooses
    this ci->io->tio.clone as the bio to map, it is passed to targets as
    non-empty flush bio. It causes bio length check failure in dm-zoned and
    unexpected operation such as dm_accept_partial_bio() call.

    To avoid the non-empty flush bio, set zero length to ci->io->tio.clone
    in __send_empty_flush().

    Fixes: 92986f6b4c8a ("dm: use bio_clone_fast in alloc_io/alloc_tio")
    Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@....com>
    Signed-off-by: Mike Snitzer <snitzer@...nel.org>
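To make the sequence in the commit message above concrete, here is a
deliberately simplified Python model of it. The names only mirror the
kernel's (ci->bio, ci->io->tio.clone, alloc_tio(), __send_empty_flush());
the classes, fields, and the 4096-byte size are hypothetical, and none of
this is actual dm code.

```python
from dataclasses import dataclass, replace


@dataclass
class Bio:
    size: int               # bytes covered by the bio (hypothetical model field)
    preflush: bool = False  # stands in for the REQ_PREFLUSH flag


@dataclass
class CloneInfo:
    bio: Bio                # models ci->bio
    tio_clone: Bio          # models ci->io->tio.clone
    tio_in_use: bool = False


def init_clone_info(orig: Bio) -> CloneInfo:
    # Both fields start out describing the same incoming bio.
    return CloneInfo(bio=replace(orig), tio_clone=replace(orig))


def send_empty_flush(ci: CloneInfo, apply_fix: bool) -> None:
    # A zero-length flush bio replaces ci->bio ...
    ci.bio = Bio(size=0, preflush=True)
    if apply_fix:
        # ... and the fix also zeroes the embedded clone's length.
        ci.tio_clone.size = 0


def alloc_tio(ci: CloneInfo) -> Bio:
    if not ci.tio_in_use:
        # Path after 92986f6b4c: reuse the embedded clone without copying ci->bio.
        ci.tio_in_use = True
        return ci.tio_clone
    # Otherwise clone ci->bio.
    return replace(ci.bio)


def mapped_flush_size(apply_fix: bool) -> int:
    """Size of the flush bio the first target would receive in this model."""
    ci = init_clone_info(Bio(size=4096, preflush=True))
    send_empty_flush(ci, apply_fix)
    return alloc_tio(ci).size


if __name__ == "__main__":
    print("without fix:", mapped_flush_size(apply_fix=False))
    print("with fix:   ", mapped_flush_size(apply_fix=True))
```

In the model, the first target sees a 4096-byte "empty" flush without the
fix and a zero-length one with it, which is the mismatch the commit message
describes.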

> 
> On Thu, Mar 17, 2022 at 04:39:29PM +0800, kernel test robot wrote:
> > 
> > 
> > Greeting,
> > 
> > FYI, we noticed the following commit (built with gcc-9):
> > 
> > commit: 92986f6b4c8a2c24d3a36b80140624f80fd93de4 ("dm: use bio_clone_fast in alloc_io/alloc_tio")
> > https://github.com/ammarfaizi2/linux-block axboe/linux-block/for-5.18/block
> > 
> > in testcase: xfstests
> > version: xfstests-x86_64-1de1db8-1_20220217
> > with following parameters:
> > 
> > 	disk: 4HDD
> > 	fs: xfs
> > 	test: generic-logwrites
> > 	ucode: 0xec
> > 
> > test-description: xfstests is a regression test suite for xfs and other filesystems.
> > test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> > 
> > 
> > on test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz with 16G memory
> > 
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > 
> > 
> > 
> > 
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> > 
> > 2022-03-13 14:24:40 export TEST_DIR=/fs/sda1
> > 2022-03-13 14:24:40 export TEST_DEV=/dev/sda1
> > 2022-03-13 14:24:40 export FSTYP=xfs
> > 2022-03-13 14:24:40 export SCRATCH_MNT=/fs/scratch
> > 2022-03-13 14:24:40 mkdir /fs/scratch -p
> > 2022-03-13 14:24:40 export SCRATCH_DEV=/dev/sda4
> > 2022-03-13 14:24:40 export SCRATCH_LOGDEV=/dev/sda2
> > 2022-03-13 14:24:40 export LOGWRITES_DEV=/dev/sda2
> > 2022-03-13 14:24:40 export MKFS_OPTIONS=-mreflink=1
> > 2022-03-13 14:24:40 sed "s:^:generic/:" //lkp/benchmarks/xfstests/tests/generic-logwrites
> > 2022-03-13 14:24:40 ./check generic/482 generic/457 generic/455
> > FSTYP         -- xfs (debug)
> > PLATFORM      -- Linux/x86_64 lkp-skl-d06 5.17.0-rc2-00044-g92986f6b4c8a #1 SMP Sun Mar 13 14:11:02 CST 2022
> > MKFS_OPTIONS  -- -f -mreflink=1 /dev/sda4
> > MOUNT_OPTIONS -- /dev/sda4 /fs/scratch
> > 
> > generic/455	[failed, exit status 1]- output mismatch (see /lkp/benchmarks/xfstests/results//generic/455.out.bad)
> >     --- tests/generic/455.out	2022-02-17 11:55:00.000000000 +0000
> >     +++ /lkp/benchmarks/xfstests/results//generic/455.out.bad	2022-03-13 14:26:00.664268705 +0000
> >     @@ -1,2 +1,3 @@
> >      QA output created by 455
> >     -Silence is golden
> >     +testfile1.mark13 md5sum mismatched
> >     +(see /lkp/benchmarks/xfstests/results//generic/455.full for details)
> >     ...
> >     (Run 'diff -u /lkp/benchmarks/xfstests/tests/generic/455.out /lkp/benchmarks/xfstests/results//generic/455.out.bad'  to see the entire diff)
> > generic/457	 20s
> > generic/482	 427s
> > Ran: generic/455 generic/457 generic/482
> > Failures: generic/455
> > Failed 1 of 3 tests
> > 
> > 
> > 
> > 
> > To reproduce:
> > 
> >         git clone https://github.com/intel/lkp-tests.git
> >         cd lkp-tests
> >         sudo bin/lkp install job.yaml           # job file is attached in this email
> >         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> >         sudo bin/lkp run generated-yaml-file
> > 
> >         # if come across any failure that blocks the test,
> >         # please remove ~/.lkp and /lkp dir to run from a clean state.
> > 
> > 
> > 
> > ---
> > 0-DAY CI Kernel Test Service
> > https://lists.01.org/hyperkitty/list/lkp@lists.01.org
> > 
> > Thanks,
> > Oliver Sang
> > 
> 

Attachments:
  config-5.17.0-rc2-00044-g92986f6b4c8a (text/plain, 165702 bytes)
  config-5.18.0-rc3 (text/plain, 165894 bytes)
  dmesg-v5.18-rc3.xz (application/x-xz, 31400 bytes)
  455-for-v5.18-rc3.full (text/plain, 43385 bytes)
  455-for-v5.18-rc3.out.bad (text/plain, 130 bytes)
