linux-kernel - Re: [LKP] Re: [ext4] 21175ca434: mdadm-selftests.enchmarks/mdadm-selftests/tests/01r1fail.fail


Message-ID: <7f4f7ecd-13e3-b23e-6a0b-04122b98e6f2@intel.com>
Date:   Thu, 29 Apr 2021 15:43:39 +0800
From:   Rong Chen <rong.a.chen@...el.com>
To:     Theodore Ts'o <tytso@....edu>,
        kernel test robot <oliver.sang@...el.com>
Cc:     Harshad Shirwadkar <harshadshirwadkar@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        lkp@...ts.01.org, lkp@...el.com, dm-devel@...hat.com
Subject: Re: [LKP] Re: [ext4] 21175ca434:
 mdadm-selftests.enchmarks/mdadm-selftests/tests/01r1fail.fail



On 4/28/21 10:03 PM, Theodore Ts'o wrote:
> (Hmm, why did you cc linux-mm on this report?  I would have thought
> dm-devel would have made more sense?)
>
> On Tue, Apr 27, 2021 at 04:15:39PM +0800, kernel test robot wrote:
>> FYI, we noticed the following commit (built with gcc-9):
>>
>> commit: 21175ca434c5d49509b73cf473618b01b0b85437 ("ext4: make prefetch_block_bitmaps default")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>> in testcase: mdadm-selftests
>> version: mdadm-selftests-x86_64-5d518de-1_20201008
>> with following parameters:
>>
>> 	disk: 1HDD
>> 	test_prefix: 01r1
>> 	ucode: 0x21
> So this failure makes no sense to me.  Looking at the kmesg failure
> logs, it's failing in the md layer:
>
> kern  :info  : [   99.775514] md/raid1:md0: not clean -- starting background reconstruction
> kern  :info  : [   99.783372] md/raid1:md0: active with 3 out of 4 mirrors
> kern  :info  : [   99.789735] md0: detected capacity change from 0 to 37888
> kern  :info  : [   99.796216] md: resync of RAID array md0
> kern  :crit  : [   99.900450] md/raid1:md0: Disk failure on loop2, disabling device.
>                                md/raid1:md0: Operation continuing on 2 devices.
> kern  :crit  : [   99.918281] md/raid1:md0: Disk failure on loop1, disabling device.
>                                md/raid1:md0: Operation continuing on 1 devices.
> kern  :info  : [  100.835833] md: md0: resync interrupted.
> kern  :info  : [  101.852898] md: resync of RAID array md0
> kern  :info  : [  101.858347] md: md0: resync done.
> user  :notice: [  102.109684] /lkp/benchmarks/mdadm-selftests/tests/01r1fail... FAILED - see /var/tmp/01r1fail.log and /var/tmp/fail01r1fail.log for details
>
> The referenced commit just turns on block bitmap prefetching by default
> in ext4.  This should not cause md to fail; if it does, that's an md bug,
> not an ext4 bug.  There should not be anything that the file system is
> doing that would cause the kernel to think there is a disk failure.
>
> By the way, the reproduction instructions aren't working currently:
>
>> To reproduce:
>>
>>          git clone https://github.com/intel/lkp-tests.git
>>          cd lkp-tests
>>          bin/lkp install                job.yaml  # job file is attached in this email
> This fails because lkp is trying to apply a patch which does not apply
> with the current version of the md tools.

Hi Ted,

Thanks for the feedback. Yes, that patch has already been merged into mdadm,
so we have removed it from our code.

>
>>          bin/lkp split-job --compatible job.yaml
>>          bin/lkp run                    compatible-job.yaml
> And the current versions of lkp don't generate a compatible-job.yaml file
> when you run "lkp split-job --compatible"; instead they generate a new
> yaml file with a set of random characters to make the name unique.
> (What in Multics parlance would be called a "shriek name"[1] :-)

We have updated the steps to avoid misunderstanding.

>
> Since I was having trouble running the reproduction, could you send
> the /var/tmp/*fail.log files so we could have a bit more insight into
> what is going on?

I have attached the log file for your reference.
By the way, the test comes from
https://github.com/neilbrown/mdadm/blob/master/tests/01r1fail,
so you may want to run it directly.
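
When going through kmsg output like the excerpt quoted above, it can help to
pull out just the md disk-failure events. Below is a minimal sketch, not part
of lkp or mdadm. The function name find_disk_failures and the regular
expression are my own assumptions, based only on the line format shown in
that excerpt.

```python
import re

# Assumed line format, taken from the kmsg excerpt in this thread, e.g.:
#   kern  :crit  : [   99.900450] md/raid1:md0: Disk failure on loop2, disabling device.
FAILURE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+md/(?P<level>\w+):(?P<array>\w+): "
    r"Disk failure on (?P<dev>\w+), disabling device\."
)


def find_disk_failures(lines):
    """Return (timestamp, array, device) for each md disk-failure line."""
    events = []
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            events.append((float(m.group("ts")), m.group("array"), m.group("dev")))
    return events


# Sample input copied from the log lines quoted earlier in this message.
SAMPLE = """\
kern  :info  : [   99.796216] md: resync of RAID array md0
kern  :crit  : [   99.900450] md/raid1:md0: Disk failure on loop2, disabling device.
kern  :crit  : [   99.918281] md/raid1:md0: Disk failure on loop1, disabling device.
kern  :info  : [  100.835833] md: md0: resync interrupted.
""".splitlines()

if __name__ == "__main__":
    for ts, array, dev in find_disk_failures(SAMPLE):
        print(f"{ts:.6f} {array} lost {dev}")
```

The timestamps make it easy to see that both loop devices were marked failed
within about 20 ms of each other, right after the resync started.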

Best Regards,
Rong Chen

>
> Thanks!
>
> 					- Ted


View attachment "log" of type "text/plain" (3271 bytes)