linux-kernel - Re: block layer bug with 4.4-rc3+

Open Source and information security mailing list archives

Message-ID: <CACVXFVOcjjh3id8-vEvL4UjugiDP_pUCxY8w4g+1vkCKaf=_4Q@mail.gmail.com>
Date:	Fri, 18 Dec 2015 16:36:25 +0800
From:	Ming Lei <ming.lei@...onical.com>
To:	Andre Przywara <andre.przywara@....com>
Cc:	Jens Axboe <axboe@...nel.dk>, Rob Herring <rob.herring@...aro.org>,
	Eric Auger <eric.auger@...aro.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-block@...r.kernel.org,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>
Subject: Re: block layer bug with 4.4-rc3+

On Thu, Dec 17, 2015 at 8:33 PM, Andre Przywara <andre.przywara@....com> wrote:
> Hi Ming,
>
> On 17/12/15 03:52, Ming Lei wrote:
>> On Wed, Dec 16, 2015 at 10:55 PM, Andre Przywara <andre.przywara@....com> wrote:
>>> Hi,
>>>
>>> On 15/12/15 13:39, Ming Lei wrote:
>>>> On Tue, Dec 15, 2015 at 8:23 PM, Andre Przywara <andre.przywara@....com> wrote:
>>>>> Hi Ming,
>>>>>
>>>>> thanks for the answer!
>>>>>
>>>>> On 15/12/15 11:54, Ming Lei wrote:
>>>>>> On Tue, Dec 15, 2015 at 7:05 PM, Andre Przywara <andre.przywara@....com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've been experiencing issues with at least 4.4-rc3 (including current
>>>>>>
>>>>>> I'd suggest testing the latest Linus tree first; at least two fixes
>>>>>> for the blk-merge issue have been merged.  If the issue still shows up
>>>>>> with Linus' tree, I am happy to take a look.
>>>>>
>>>>> Mmh, as said ("including current HEAD") this happens still with the
>>>>> latest HEAD from Linus (which is "9f9499ae8e64: Linux 4.4-rc5" for me).
>>>>> Just tested yesterday.
>>>>> Is there another branch/tree with block fixes I should test? Is it worth
>>>>> trying any of the upcoming branches in linux-block.git (for-4.5/core,
>>>>> maybe?)
>>>>
>>>> Both fixes are already in Linus' tree, and reverting the commit
>>>> basically makes merging impossible, so there must be an issue somewhere.
>>>>
>>>> Can you see the issue on any other 32-bit ARM platform?  I don't see it
>>>> on x86 or arm64, and the commit itself is correct, IMO.
>>>
>>> Quick tests on a Cubietruck didn't show the issue, but this board is
>>> nowhere near the Midway (2 in-order cores with 2GB RAM vs. 4
>>> out-of-order cores with 8 GB RAM), so the load isn't the same.
>>> I could rule out .config issues by using multi_v7_defconfig - with LPAE
>>> enabled on top, that is.
>>> Using the plain multi_v7_defconfig (which doesn't have LPAE and makes me
>>> lose half of the RAM on that box) didn't show the bug so far.
>>> One of the effects of turning on LPAE is that dma_addr_t and phys_addr_t
>>> turn to 64-bit, with long, int and void* still being 32-bit. Can you
>>> think of any issues that could be related to that?
>>>
>>> Also can you briefly sketch what that patch (578270bfbd) eventually
>>> changes? I see that the fix looks right, I am just wondering what the
>>> impact is: Do we get more blocks or less or bigger ones or smaller?
>>
>> Without the change, 'bvprvp' always points to 'bv', so no bio vector
>> can be merged with another one.  Each bvec then becomes a single
>> physical segment (converted to a single sg element in the driver).  As a
>> result, the transfer size of each bio and the size of each segment become
>> much smaller, while the number of segments may become bigger.
>>
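The effect described above can be illustrated with a toy model. This is only a sketch in Python, not the kernel's blk-merge.c logic: the function name, the (address, length) tuples and the 64 KiB segment limit are all assumptions made up for the illustration.

```python
def build_segments(bvecs, max_seg_size=65536, can_merge=True):
    """Toy model: fold (phys_addr, length) bio vectors into segments.

    Two neighbours are merged only when they are physically contiguous
    and the merged segment stays within max_seg_size (assumed limit).
    can_merge=False models the behaviour where the "previous" vector
    is never usable for a contiguity check.
    """
    segments = []
    for addr, length in bvecs:
        if can_merge and segments:
            start, size = segments[-1]
            if start + size == addr and size + length <= max_seg_size:
                segments[-1] = (start, size + length)
                continue
        segments.append((addr, length))
    return segments


# Four physically contiguous 4 KiB pages (hypothetical addresses).
pages = [(0x100000 + i * 4096, 4096) for i in range(4)]
merged = build_segments(pages)
unmerged = build_segments(pages, can_merge=False)
```

With merging, the four pages collapse into one 16 KiB segment; without it, they stay as four 4 KiB segments, i.e. more and smaller sg elements for the same data.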
>>>
>>> I will try to do more experiments and to find the real culprit.
>>
>> It may be helpful to enable 'block:*' trace events, and get/analyze the
>> traces close to the kernel warning.
>
> Good hint.
> I just enabled all block events, so it's a lot of data and I guess I
> didn't catch the actual "bug moment" before the buffer was overwritten.
> Do you know of any specific event that would be useful?

That is easy:

1) You can figure out the faulting sector number from the SATA warning log
(see ata_eh_link_report); all related trace entries can then be extracted by
that sector number.

OR

2) Simply add a one-line trace_printk() to the function that prints the
warning; it then acts as a timestamp in the trace buffer.

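For option 1), a minimal Python sketch of the decoding and filtering step follows. It assumes the "cmd" bytes are printed in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, that for NCQ commands the sector count sits in the feature bytes and the tag in nsect bits 7:3, and that block trace events print the range as "sector + nr_sectors". The function names are hypothetical.

```python
import re


def decode_ncq_cmd(taskfile):
    """Decode the 'cmd' field of a failed NCQ command from the ata log.

    Assumed byte order (see lead-in): command / feature:nsect:lbal:lbam:lbah
    / hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah / device.
    A count of 0 (meaning 65536 sectors) is not special-cased here.
    """
    groups = taskfile.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in groups[1].split(":"))
    hob = [int(x, 16) for x in groups[2].split(":")]
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = hob
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    sectors = hob_feature << 8 | feature
    return {"lba": lba, "sectors": sectors,
            "bytes": sectors * 512, "tag": nsect >> 3}


RANGE = re.compile(r"(\d+) \+ (\d+)")


def events_touching(trace_lines, lba, sectors):
    """Return trace lines whose 'sector + nr' range overlaps [lba, lba+sectors)."""
    hits = []
    for line in trace_lines:
        m = RANGE.search(line)
        if not m:
            continue
        start, nr = int(m.group(1)), int(m.group(2))
        if start < lba + sectors and lba < start + nr:
            hits.append(line)
    return hits


first = decode_ncq_cmd("61/00:20:48:6b:41/08:00:0a:00:00/40")
# first["lba"] == 172059464, first["bytes"] == 1048576, first["tag"] == 4
```

The decoded size and tag agree with the "tag 4 ncq 1048576 out" part of the log, which is a cheap sanity check on the assumed byte order. Lines from /sys/kernel/debug/tracing/trace can then be passed to events_touching() with the decoded range.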
>
> Anyway I see a _lot_ of these in there, even before the bug triggers:
>
> block_dirty_buffer: 8,7 sector=18446744073709486080 size=4096
> block_dirty_buffer: 8,8 sector=18446744073709486080 size=4096
>
> So that long number is 0xffffffffffff0000. Is that some special value
> for struct buffer_head.b_blocknr?

I guess the above buffer_head isn't mapped yet, and the sector isn't valid.
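
The arithmetic behind the value quoted above can be checked with a few lines of Python. This only confirms the number conversion; it says nothing about why the buffer_head carries that value, and the helper name is made up for the sketch.

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return value - (1 << 64) if value & (1 << 63) else value


sector = 18446744073709486080          # value printed by block_dirty_buffer
sector_hex = hex(sector)               # '0xffffffffffff0000'
sector_signed = as_signed64(sector)    # -65536
```

So the printed sector is 2^64 - 65536, i.e. -65536 when read as a signed 64-bit quantity.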

> I see this in all versions, though, so with and without LPAE and on both
> 4.4-rc5 and with the patch in question reverted.
>
> The type of this variable is sector_t, which is u64 with LBDAF defined
> (which is enabled for me), but "unsigned long" without it.
>
> Does that ring a bell?
>
> Thanks,
> Andre.
>
>
>
>
>>
>>>
>>> Thanks,
>>> Andre.
>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Andre.
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>> HEAD) on a Calxeda Midway (4*ARM Cortex-A15 (32-bit), 8GB RAM, SATA
>>>>>>> spinning disk or SSD).
>>>>>>> After some disk I/O load (kernel compile with -j6) I see the kernel
>>>>>>> screaming:
>>>>>>>
>>>>>>> [  103.736982] ata1.00: exception Emask 0x0 SAct 0x3ffff0 SErr 0x0 action 0x6 frozen
>>>>>>> [  103.744476] ata1.00: failed command: WRITE FPDMA QUEUED
>>>>>>> [  103.749707] ata1.00: cmd 61/00:20:48:6b:41/08:00:0a:00:00/40 tag 4 ncq 1048576 out
>>>>>>> [  103.749707]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>>>>>>> [  103.764659] ata1.00: status: { DRDY }
>>>>>>> [  103.768321] ata1.00: failed command: WRITE FPDMA QUEUED
>>>>>>> [  103.773547] ata1.00: cmd 61/98:28:48:73:41/42:00:0a:00:00/40 tag 5 ncq 8728576 out
>>>>>>> [  103.773547]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>>>>>>> < repeated with increasing tag numbers>
>>>>>>>
>>>>>>> This repeats for a while, but then seems to recover later, though I
>>>>>>> haven't checked if there are more issues and rebooted instead to avoid
>>>>>>> filesystem damage.
>>>>>>>
>>>>>>> While I agree that this looks like a disk error at first glance, I
>>>>>>> never saw this before 4.4-rc2, had the very same error on different
>>>>>>> nodes (with another spinning disk and even an SSD) and I can make it
>>>>>>> vanish by reverting the commit I identified after bisection:
>>>>>>>
>>>>>>> commit 578270bfbd2803dc7b0b03fbc2ac119efbc73195
>>>>>>> Author: Ming Lei <ming.lei@...onical.com>
>>>>>>> Date:   Tue Nov 24 10:35:29 2015 +0800
>>>>>>>
>>>>>>>     block: fix segment split
>>>>>>> ...
>>>>>>> I understand that this fix seems sane, but actually reverting it fixes
>>>>>>> the issue for me: 4.4-rc5 crashed within some minutes with the above
>>>>>>> log, 4.4-rc5 with 578270bfbd reverted survived 19 hours of continuous
>>>>>>> kernel compiles without issues.
>>>>>>> Looking at the git history of that file I see quite some recent changes
>>>>>>> there, but it's beyond my understanding of the code to spot the real
>>>>>>> culprit.
>>>>>>>
>>>>>>> Can anyone point me to a change in blk-merge.c I could try to revert to
>>>>>>> identify the real root cause? I can run tests quickly, though a real
>>>>>>> positive case would need some hours of runtime to be sure it's fine.
>>>>>>>
>>>>>>> Many thanks!
>>>>>>> Cheers,
>>>>>>> Andre.
