Message-ID: <ec1a28a9-cef1-45e3-9b71-30de5bf07d7f@suse.com>
Date: Thu, 30 Jan 2025 07:59:18 +0100
From: Jürgen Groß <jgross@...e.com>
To: Harshvardhan Jha <harshvardhan.j.jha@...cle.com>,
 Stefano Stabellini <sstabellini@...nel.org>
Cc: Greg KH <gregkh@...uxfoundation.org>, Konrad Wilk
 <konrad.wilk@...cle.com>, Boris Ostrovsky <boris.ostrovsky@...cle.com>,
 "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 Harshit Mogalapalli <harshit.m.mogalapalli@...cle.com>,
 stable@...r.kernel.org
Subject: Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219
 sge_count (-12) is out of range

On 30.01.25 06:27, Harshvardhan Jha wrote:
> 
> On 30/01/25 3:31 AM, Stefano Stabellini wrote:
>> On Wed, 29 Jan 2025, Jürgen Groß wrote:
>>> On 29.01.25 19:35, Harshvardhan Jha wrote:
>>>> On 29/01/25 4:52 PM, Juergen Gross wrote:
>>>>> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>>>>> On 29/01/25 2:34 PM, Greg KH wrote:
>>>>>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>>>>>> Hi Greg,
>>>>>>>>
>>>>>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> +stable
>>>>>>>>>>>>
>>>>>>>>>>>> There seems to be some formatting issues in my log output. I have
>>>>>>>>>>>> attached it as a file.
>>>>>>>>>>> Confused, what are you wanting us to do here in the stable tree?
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>>
>>>>>>>>>>> greg k-h
>>>>>>>>>> Since, this is reproducible on 5.4.y I have added stable. The culprit
>>>>>>>>>> commit which upon getting reverted fixes this issue is also present in
>>>>>>>>>> 5.4.y stable.
>>>>>>>>> What culprit commit?  I see no information here :(
>>>>>>>>>
>>>>>>>>> Remember, top-posting is evil...
>>>>>>>> My apologies,
>>>>>>>>
>>>>>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>>>>>> prompt in an infinite loop:
>>>>>>>> [   24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion 3273 sge_count (-12) is out of range. Range is:  0-256
>>>>>>>>
>>>>>>>> Reverting the following patch seems to fix the issue:
>>>>>>>>
>>>>>>>> stable-5.4      : v5.4.285             - 5df29a445f3a xen/swiotlb: add alignment check for dma buffers
>>>>>>>>
>>>>>>>> I tried changing swiotlb grub command line arguments but that didn't
>>>>>>>> seem to help much unfortunately and the error was seen again.
>>>>>>>>
>>>>>>> Ok, can you submit this revert with the information about why it should
>>>>>>> not be included in the 5.4.y tree and cc: everyone involved and then we
>>>>>>> will be glad to queue it up.
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> greg k-h
>>>>>> This might be reproducible on other stable trees and mainline as well so
>>>>>> we will get it fixed there and I will submit the necessary fix to stable
>>>>>> when everything is sorted out on mainline.
>>>>> Right. Just reverting my patch will trade one error with another one (the
>>>>> one which triggered me to write the patch).
>>>>>
>>>>> There are two possible ways to fix the issue:
>>>>>
>>>>> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max. supported
>>>>>     size, the megaraid_sas driver seems to effectively request 4MB)
>>>> This seems relatively simpler to implement but I'm not sure whether it's
>>>> the most optimal approach
>>> Just making the static array larger used to hold the frame numbers for the
>>> buffer seems to be a waste of memory for most configurations.
>>>
>>> I'm thinking of an allocated array using the max needed size (replace a
>>> former buffer with a larger one if needed).
>> You are referring to discontig_frames and MAX_CONTIG_ORDER in
>> arch/x86/xen/mmu_pv.c, right? I am not super familiar with that code but
>> it looks like a good way to go.
> 
> This rejected patch works on MAX_CONTIG_ORDER and doubles the buffer
> size but that is undesirable in most situations:
> 
> https://lore.kernel.org/lkml/28947d4f-ab32-4a57-8dbb-e37fa4183a69@suse.com/t/
> 
> What needs to be done is the buffer size will only be doubled when needed.

I'll write a patch.


Juergen
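
The approach discussed in the thread (keep the frame-number array at its
2MB default and replace it with a larger one only when a bigger buffer is
requested) can be sketched roughly as below. This is a userspace
illustration in C, not the kernel patch: the struct and function names are
hypothetical, 4 KiB pages are assumed, and malloc() stands in for whatever
allocator a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SHIFT 12           /* assumption: 4 KiB pages */
#define DEFAULT_CONTIG_ORDER 9  /* 2^9 pages * 4 KiB = 2 MiB, the limit cited in the thread */

/* Hypothetical growable table of frame numbers (illustration only). */
struct frame_table {
    unsigned long *frames;  /* scratch space, one slot per page */
    unsigned int order;     /* capacity is (1 << order) slots */
};

/* Smallest order such that (1 << order) pages cover `bytes`. */
static unsigned int order_for_size(size_t bytes)
{
    size_t page_size = (size_t)1 << PAGE_SHIFT;
    size_t pages = (bytes + page_size - 1) >> PAGE_SHIFT;
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/*
 * Make sure the table can hold (1 << order) frame numbers.
 * The table only grows: a former array is replaced by a larger one when
 * needed. Old contents are not copied because the array is treated as
 * scratch space. Returns 0 on success, -1 if allocation fails (in which
 * case the old array is left untouched).
 */
static int frame_table_reserve(struct frame_table *t, unsigned int order)
{
    unsigned long *bigger;

    assert(t != NULL);
    if (order < DEFAULT_CONTIG_ORDER)
        order = DEFAULT_CONTIG_ORDER;
    if (t->frames != NULL && order <= t->order)
        return 0;  /* current array is already large enough */

    bigger = malloc(((size_t)1 << order) * sizeof(*bigger));
    if (bigger == NULL)
        return -1;

    free(t->frames);
    t->frames = bigger;
    t->order = order;
    return 0;
}
```

With these assumptions a 2MB request maps to order 9 and a 4MB request to
order 10, so only configurations that actually see the larger request pay
for the doubled array.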
