Message-ID: <827aceff-28ed-4922-b841-b7dd06c082b1@oracle.com>
Date: Thu, 30 Jan 2025 10:57:17 +0530
From: Harshvardhan Jha <harshvardhan.j.jha@...cle.com>
To: Stefano Stabellini <sstabellini@...nel.org>,
Jürgen Groß <jgross@...e.com>
Cc: Greg KH <gregkh@...uxfoundation.org>,
Konrad Wilk <konrad.wilk@...cle.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Harshit Mogalapalli <harshit.m.mogalapalli@...cle.com>,
stable@...r.kernel.org
Subject: Re: v5.4.289 failed to boot with error megasas_build_io_fusion 3219
sge_count (-12) is out of range
On 30/01/25 3:31 AM, Stefano Stabellini wrote:
> On Wed, 29 Jan 2025, Jürgen Groß wrote:
>> On 29.01.25 19:35, Harshvardhan Jha wrote:
>>> On 29/01/25 4:52 PM, Juergen Gross wrote:
>>>> On 29.01.25 10:15, Harshvardhan Jha wrote:
>>>>> On 29/01/25 2:34 PM, Greg KH wrote:
>>>>>> On Wed, Jan 29, 2025 at 02:29:48PM +0530, Harshvardhan Jha wrote:
>>>>>>> Hi Greg,
>>>>>>>
>>>>>>> On 29/01/25 2:18 PM, Greg KH wrote:
>>>>>>>> On Wed, Jan 29, 2025 at 02:13:34PM +0530, Harshvardhan Jha wrote:
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> On 29/01/25 2:05 PM, Greg KH wrote:
>>>>>>>>>> On Wed, Jan 29, 2025 at 02:03:51PM +0530, Harshvardhan Jha
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> +stable
>>>>>>>>>>>
>>>>>>>>>>> There seems to be some formatting issues in my log output. I
>>>>>>>>>>> have
>>>>>>>>>>> attached it as a file.
>>>>>>>>>> Confused, what are you wanting us to do here in the stable
>>>>>>>>>> tree?
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> greg k-h
>>>>>>>>> Since, this is reproducible on 5.4.y I have added stable. The
>>>>>>>>> culprit
>>>>>>>>> commit which upon getting reverted fixes this issue is also
>>>>>>>>> present in
>>>>>>>>> 5.4.y stable.
>>>>>>>> What culprit commit? I see no information here :(
>>>>>>>>
>>>>>>>> Remember, top-posting is evil...
>>>>>>> My apologies,
>>>>>>>
>>>>>>> The stable tag v5.4.289 seems to fail to boot with the following
>>>>>>> prompt in an infinite loop:
>>>>>>> [ 24.427217] megaraid_sas 0000:65:00.0: megasas_build_io_fusion
>>>>>>> 3273 sge_count (-12) is out of range. Range is: 0-256
>>>>>>>
>>>>>>> Reverting the following patch seems to fix the issue:
>>>>>>>
>>>>>>> stable-5.4 : v5.4.285 - 5df29a445f3a xen/swiotlb: add
>>>>>>> alignment check for dma buffers
>>>>>>>
>>>>>>> I tried changing swiotlb grub command line arguments but that didn't
>>>>>>> seem to help much unfortunately and the error was seen again.
>>>>>>>
>>>>>> Ok, can you submit this revert with the information about why it
>>>>>> should
>>>>>> not be included in the 5.4.y tree and cc: everyone involved and then
>>>>>> we
>>>>>> will be glad to queue it up.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> greg k-h
>>>>> This might be reproducible on other stable trees and mainline as well so
>>>>> we will get it fixed there and I will submit the necessary fix to stable
>>>>> when everything is sorted out on mainline.
>>>> Right. Just reverting my patch will trade one error with another one (the
>>>> one which triggered me to write the patch).
>>>>
>>>> There are two possible ways to fix the issue:
>>>>
>>>> - allow larger DMA buffers in xen/swiotlb (today 2MB are the max.
>>>> supported
>>>> size, the megaraid_sas driver seems to effectively request 4MB)
>>> This seems relatively simpler to implement but I'm not sure whether it's
>>> the most optimal approach
>> Just making the static array larger used to hold the frame numbers for the
>> buffer seems to be a waste of memory for most configurations.
>>
>> I'm thinking of an allocated array using the max needed size (replace a
>> former buffer with a larger one if needed).
> You are referring to discontig_frames and MAX_CONTIG_ORDER in
> arch/x86/xen/mmu_pv.c, right? I am not super familiar with that code but
> it looks like a good way to go.
The rejected patch below modifies MAX_CONTIG_ORDER and doubles the buffer
size unconditionally, which is undesirable for most configurations:
https://lore.kernel.org/lkml/28947d4f-ab32-4a57-8dbb-e37fa4183a69@suse.com/t/
What is needed instead is for the buffer size to be doubled only when a
request actually requires it.
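
To make the grow-only-when-needed idea concrete, here is a rough userspace
sketch. It is not the kernel code and not a proposed patch: frames,
frames_order, order_for_size() and ensure_frames() are hypothetical names,
calloc() merely stands in for whatever allocator the kernel side would use,
and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ASSUMED_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Hypothetical stand-ins for the static frame array and its capacity. */
static unsigned long *frames;
static unsigned int frames_order;

/* Smallest order such that (1 << order) pages cover `bytes`. */
static unsigned int order_for_size(size_t bytes)
{
    size_t page = (size_t)1 << ASSUMED_PAGE_SHIFT;
    size_t pages = (bytes + page - 1) >> ASSUMED_PAGE_SHIFT;
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/*
 * Make sure the array can hold (1 << order) frame numbers.
 * The array is replaced by a larger one only when the current one
 * is too small; smaller requests reuse what is already there.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int ensure_frames(unsigned int order)
{
    unsigned long *bigger;

    if (frames && order <= frames_order)
        return 0;               /* current array is large enough */

    bigger = calloc((size_t)1 << order, sizeof(*bigger));
    if (!bigger)
        return -1;              /* keep the old array on failure */

    free(frames);               /* drop the former, smaller array */
    frames = bigger;
    frames_order = order;
    assert(frames_order >= order);
    return 0;
}
```

With these assumptions a 2 MiB request maps to order 9 and a 4 MiB request
to order 10, so only a caller that really asks for 4 MiB pays for the larger
array; a later 2 MiB request keeps using the order-10 array.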
Harshvardhan