Message-ID: <128ad8a8-3aaf-f5f1-3709-373ad504ca44@gmail.com>
Date:   Tue, 17 Jan 2017 02:54:06 +0100
From:   MasterPrenium <masterprenium.lkml@...il.com>
To:     Shaohua Li <shli@...nel.org>
Cc:     linux-kernel@...r.kernel.org, xen-users@...ts.xen.org,
        linux-raid@...r.kernel.org,
        "MasterPrenium@...il.com" <MasterPrenium@...il.com>,
        xen-devel@...ts.xenproject.org
Subject: Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

Hi Shaohua,

I've run a few more small tests; maybe they can help.

- I tried creating the RAID 5 stack with only 2 drives (mdadm --create 
/dev/md10 --raid-devices=3 --level=5 /dev/sdc1 /dev/sdd1 missing).
The same issue occurs.
- But one time (still with 2 of 3 drives) I was not able to crash the 
kernel, using exactly the same procedure as before, even after 
re-creating the filesystems etc.
In order to reproduce the BUG I had to re-create the array.
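
(A quick sanity check, just a suggestion, assuming the array is still 
/dev/md10 as created above: the degraded state can be confirmed with the 
standard md tools before running the workload.)

  cat /proc/mdstat           # md10 should appear as raid5 with one missing member
  mdadm --detail /dev/md10   # state should read "clean, degraded", with no resync running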

Could this be linked to this message?
[  155.667456] md10: Warning: Device sdc1 is misaligned

I don't know how to "align" a drive in a RAID stack... The partition is 
correctly aligned (according to "parted").
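
For reference, a rough way to compare the partition alignment with the I/O 
geometry the kernel sees (the warning apparently comes from the block layer 
stacking the member devices' queue limits), using sdc as the example device:

  parted /dev/sdc align-check optimal 1        # is partition 1 aligned to the optimal I/O size?
  parted /dev/sdc unit s print                 # start sector of each partition
  cat /sys/block/sdc/queue/optimal_io_size     # optimal I/O size advertised by the device
  cat /sys/block/sdc/sdc1/alignment_offset     # non-zero means the kernel considers sdc1 misaligned
  cat /sys/block/md10/md/chunk_size            # RAID chunk size the limits are stacked against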

- In another test (still 2 of 3 drives in the stack), I didn't get a 
kernel crash, but I had 100% I/O wait on the CPU. Trying to reboot 
finally gave me these printk messages: http://pastebin.com/uzVHUUrC
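
One possible way to get more data out of such a hang (assuming magic SysRq is 
enabled in the kernel) is to dump the blocked tasks to dmesg before resetting:

  echo 1 > /proc/sys/kernel/sysrq    # enable all SysRq functions if needed
  echo w > /proc/sysrq-trigger       # dump tasks stuck in uninterruptible (D) state
  echo t > /proc/sysrq-trigger       # dump all task states (very verbose)
  dmesg > sysrq-dump.txt             # save the output to post to the list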

If you have any patch for me to try (maybe something to make the kernel 
more verbose about the issue), please tell me and I'll test it, as this 
is a really blocking issue...
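
If it helps in the meantime, and assuming the kernel is built with 
CONFIG_DYNAMIC_DEBUG, the pr_debug() messages in the raid5 driver can be 
switched on at runtime without a patch, e.g.:

  mount -t debugfs none /sys/kernel/debug 2>/dev/null    # if debugfs isn't mounted yet
  echo 'file raid5.c +p' > /sys/kernel/debug/dynamic_debug/control
  dmesg -w                                               # follow the extra raid5 debug output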

Best regards,

MasterPrenium


On 09/01/2017 at 23:44, Shaohua Li wrote:
> On Sun, Jan 08, 2017 at 02:31:15PM +0100, MasterPrenium wrote:
>> Hello,
>>
>> Replies below + :
>> - I don't know if this can help, but after the crash, when the system
>> reboots, the RAID 5 stack is re-synchronizing:
>> [   37.028239] md10: Warning: Device sdc1 is misaligned
>> [   37.028541] created bitmap (15 pages) for device md10
>> [   37.030433] md10: bitmap initialized from disk: read 1 pages, set 59 of
>> 29807 bits
>>
>> - Sometimes the kernel crashes completely (I lose the serial + network
>> connection), sometimes I only get the "BUG" dump but still have network
>> access (though a reboot is impossible; I need to reset the system).
>>
>> - You can find the blktrace here (captured while running fio); I hope it's
>> complete, since the end of the file is where the kernel crashed: https://goo.gl/X9jZ50
> Looks like most are normal full-stripe writes.
>   
>>> I'm trying to reproduce, but no success. So
>>> ext4->btrfs->raid5, crash
>>> btrfs->raid5, no crash
>>> right? Does the subvolume matter? When you create the raid5 array, does adding
>>> the '--assume-clean' option change the behavior? I'd like to narrow down the issue.
>>> If you can capture a blktrace of the raid5 array, it would give us a great hint
>>> about what kind of IO it is.
>> Yes, correct.
>> The subvolume doesn't matter.
>> --assume-clean doesn't change the behaviour.
> so it's not a resync issue.
>
>> Don't forget that the system needs to be running on Xen to crash; without it
>> (on a native kernel) it doesn't crash (or at least, I was not able to make it
>> crash).
>>>> Regarding your patch, I can't find it. Is it the one sent by Konstantin
>>>> Khlebnikov?
>>> Right.
>> It doesn't help :(. Maybe the crash is happening a little bit later.
> OK, the patch is unlikely to help, since the IO size isn't very big.
>
> I don't have a good idea yet. My best guess so far is that the virtual machine
> introduces extra delay, which might trigger race conditions that aren't seen on
> a native kernel.  I'll check if I can find something locally.
>
> Thanks,
> Shaohua
