Message-ID: <10b9d49db1598959f9bc9fc569c128a5ccc5cc5e.camel@linux.ibm.com>
Date: Mon, 30 Nov 2020 17:49:25 -0800
From: James Bottomley <jejb@...ux.ibm.com>
To: Ding Hui <dinghui@...gfor.com.cn>, dgilbert@...erlog.com,
martin.petersen@...cle.com
Cc: linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
stable <stable@...r.kernel.org>
Subject: Re: [PATCH] scsi: ses: Fix crash caused by kfree an invalid pointer
On Mon, 2020-11-30 at 10:26 +0800, Ding Hui wrote:
[...]
> sg_ses -e
> Diagnostic pages, followed by abbreviation(s) then page code:
> Supported Diagnostic Pages [sdp] [0x0]
> Configuration (SES) [cf] [0x1]
> Enclosure Status/Control (SES) [ec,es] [0x2]
> Help Text (SES) [ht] [0x3]
>
> # sg_ses -p cf /dev/sdu
> DELL MD32xxi 0784
> disk device has EncServ bit set
> Configuration diagnostic page:
> number of secondary subenclosures: 0
> generation code: 0x12c
> enclosure descriptor list
> Subenclosure identifier: 0 (primary)
> relative ES process id: 0, number of ES processes: 0
> number of type descriptor headers: 5
> enclosure logical identifier (hex): 0000000000000000
> enclosure vendor: DELL product: MD32xxi rev:
> 0784
> vendor-specific data:
> 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
> 00 00 00 00
> type descriptor header/text list
> Element type: Device slot, subenclosure id: 0
> number of possible elements: 12
> Element type: Power supply, subenclosure id: 0
> number of possible elements: 2
> Element type: Cooling, subenclosure id: 0
> number of possible elements: 4
> Element type: Temperature sensor, subenclosure id: 0
> number of possible elements: 4
> Element type: SCC controller electronics, subenclosure id: 0
> number of possible elements: 1
>
> I'm not good at SES, but it seems that ses does not get the right
> component count
Actually there is a possible explanation. Your kernel log has this in
the middle:
> 2020-11-30 09:31:41.360334 notice [kernel:] [425843.704480] sd
> 19:0:0:0: Embedded Enclosure Device
> 2020-11-30 09:31:41.360335 warning [kernel:] [425843.704732] sd
> 19:0:0:0: Mode parameters changed
That "Mode parameters changed" implies that what the kernel read the
first time around may not be the actual configuration of the enclosure.
The generation code of 0x12c is another indicator that things may have
changed. My theory is that when the kernel boots, the device returns 0
for most of the "number of possible elements" fields and only reports
the real values at a later stage. One way to verify would be to build
ses as a module but blacklist it so it isn't loaded automatically, then
run sg_ses -p cf to get the enclosure configuration, and then insert the
ses module to see whether the two agree on the components.
Unfortunately, even if that's successful, figuring out what we have to
do to the enclosure to get it to finish its internal scanning may not be
so easy.
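A rough sketch of that comparison, not a tested procedure. The helper
names (element_counts, kernel_component_counts) are made up for
illustration, /dev/sdu is the device from the report above, and the
sysfs path assumes the enclosure class exposes a "components" attribute
for each registered enclosure:

```shell
#!/bin/sh
# Summarise `sg_ses -p cf` output read from stdin: print one
# "<element type>: <count>" line per type descriptor, then the total.
element_counts() {
    awk '
    /Element type:/ {
        sub(/.*Element type: */, ""); sub(/,.*/, ""); type = $0
    }
    /number of possible elements:/ {
        sub(/.*number of possible elements: */, "")
        printf "%s: %d\n", type, $0
        total += $0
    }
    END { printf "total: %d\n", total }'
}

# Print the component count the ses driver registered for each
# enclosure (assumed sysfs location; prints nothing if ses is not loaded).
kernel_component_counts() {
    for f in /sys/class/enclosure/*/components; do
        [ -r "$f" ] && printf '%s: %s\n' "$f" "$(cat "$f")"
    done
    return 0
}

# Intended sequence, as root on the affected machine:
#   1. boot with ses blacklisted, e.g. "blacklist ses" in a file under
#      /etc/modprobe.d/, so it is not loaded automatically
#   2. sg_ses -p cf /dev/sdu | element_counts
#   3. modprobe ses
#   4. kernel_component_counts
```

Which element types the driver actually registers as components should
be checked against drivers/scsi/ses.c before comparing; if it only
counts the device slot types, the figure to match is the "Device slot"
line rather than the total.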
James