Date:   Wed, 7 Apr 2021 12:29:11 +0000
From:   Damien Le Moal <Damien.LeMoal@....com>
To:     kernel test robot <oliver.sang@...el.com>
CC:     Jens Axboe <axboe@...nel.dk>,
        Johannes Thumshirn <Johannes.Thumshirn@....com>,
        LKML <linux-kernel@...r.kernel.org>,
        "lkp@...ts.01.org" <lkp@...ts.01.org>,
        "lkp@...el.com" <lkp@...el.com>
Subject: Re: [null_blk] de3510e52b: blktests.block.014.fail

On 2021/04/07 18:02, kernel test robot wrote:
> 
> 
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: de3510e52b0a398261271455562458003b8eea62 ("null_blk: fix command timeout completion handling")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> 
> in testcase: blktests
> version: blktests-x86_64-a210761-1_20210124
> with following parameters:
> 
> 	disk: 1SSD
> 	test: nvme-group-00
> 	ucode: 0x11
> 
> 
> 
> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <oliver.sang@...el.com>
> 
> 
> block/014 (run null-blk with blk-mq and timeout injection configured)
> block/014 (run null-blk with blk-mq and timeout injection configured) [failed]
>     runtime  ...  71.624s
>     --- tests/block/014.out     2021-01-24 06:04:08.000000000 +0000
>     +++ /mnt/nvme-group-00/nodev/block/014.out.bad      2021-04-06 09:21:25.133971868 +0000
>     @@ -1,2 +1,377 @@
>      Running block/014
>     +dd: error reading '/dev/nullb0': Connection timed out
>     +dd: error reading '/dev/nullb0': Connection timed out
>     +dd: error reading '/dev/nullb0': Connection timed out
>     +dd: error reading '/dev/nullb0': Connection timed out
>     +dd: error reading '/dev/nullb0': Connection timed out
>     +dd: error reading '/dev/nullb0': Connection timed out
>     ...
>     (Run 'diff -u tests/block/014.out /mnt/nvme-group-00/nodev/block/014.out.bad' to see the entire diff)

This is not a kernel bug. It is a problem with blktests. Before my patch, the
timeout error was not propagated back to the user. It now is, which causes dd to
fail, and blktests, seeing dd fail, reports the test as failed. On the kernel
side, all is good: the reqs are completed as expected.

Note that the timeout error is reported back as is, using BLK_STS_TIMEOUT, which
becomes ETIMEDOUT, hence the "Connection timed out" error message. Maybe we
should use the more traditional EIO? Jens?

In any case, I will send a patch to fix blktests block/014.


> 
> 
> 
> To reproduce:
> 
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         bin/lkp install                job.yaml  # job file is attached in this email
>         bin/lkp split-job --compatible job.yaml
>         bin/lkp run                    compatible-job.yaml
> 
> 
> 
> ---
> 0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
> https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation
> 
> Thanks,
> Oliver Sang
> 


-- 
Damien Le Moal
Western Digital Research
