[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8fbbe4de-13f0-6c99-98f9-5c47c8251a9f@sangfor.com.cn>
Date: Mon, 30 Nov 2020 10:26:08 +0800
From: Ding Hui <dinghui@...gfor.com.cn>
To: dgilbert@...erlog.com, jejb@...ux.ibm.com,
martin.petersen@...cle.com
Cc: linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
stable <stable@...r.kernel.org>
Subject: Re: [PATCH] scsi: ses: Fix crash caused by kfree an invalid pointer
On 2020/11/29 13:12, Douglas Gilbert wrote:
> On 2020-11-28 6:27 p.m., James Bottomley wrote:
>> On Sat, 2020-11-28 at 20:23 +0800, Ding Hui wrote:
>>> We can get a crash when disconnecting the iSCSI session,
>>> the call trace like this:
>>>
>>> [ffff00002a00fb70] kfree at ffff00000830e224
>>> [ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
>>> [ffff00002a00fbd0] device_del at ffff0000086b6a98
>>> [ffff00002a00fc50] device_unregister at ffff0000086b6d58
>>> [ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
>>> [ffff00002a00fca0] scsi_remove_device at ffff000008706134
>>> [ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
>>> [ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
>>> [ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
>>> [ffff00002a00fdb0] process_one_work at ffff00000810f35c
>>> [ffff00002a00fe00] worker_thread at ffff00000810f648
>>> [ffff00002a00fe70] kthread at ffff000008116e98
>>>
>>> In ses_intf_add, components count could be 0, and kcalloc 0 size
>>> scomp,
>>> but not saved in edev->component[i].scratch
>>>
>>> In this situation, edev->component[0].scratch is an invalid pointer,
>>> when kfree it in ses_intf_remove_enclosure, a crash like above would
>>> happen
>>> The call trace also could be other random cases when kfree cannot
>>> catch
>>> the invalid pointer
>>>
>>> We should not use edev->component[] array when the components count
>>> is 0
>>> We also need check index when use edev->component[] array in
>>> ses_enclosure_data_process
>>>
>>> Tested-by: Zeng Zhicong <timmyzeng@....com>
>>> Cc: stable <stable@...r.kernel.org> # 2.6.25+
>>> Signed-off-by: Ding Hui <dinghui@...gfor.com.cn>
>>
>> This doesn't really look to be the right thing to do: an enclosure
>> which has no component can't usefully be controlled by the driver since
>> there's nothing for it to do, so what we should do in this situation is
>> refuse to attach like the proposed patch below.
>>
>> It does seem a bit odd that someone would build an enclosure that
>> doesn't enclose anything, so would you mind running
>>
>> sg_ses -e
>
> '-e' is the short form of '--enumerate'. That will report the names
> and abbreviations of the diagnostic pages that the utility itself
> knows about (and supports). It won't show anything specific about
> the environment that sg_ses is executed in.
>
> You probably meant:
> sg_ses <ses_device>
>
> Examples of the likely forms are:
> sg_ses /dev/bsg/1:0:0:0
> sg_ses /dev/sg2
> sg_ses /dev/ses0
>
> This from a nearby machine:
>
> $ lsscsi -gs
> [3:0:0:0] disk ATA Samsung SSD 850 1B6Q /dev/sda /dev/sg0
> 120GB
> [4:0:0:0] disk IBM-207x HUSMM8020ASS20 J4B6 /dev/sdc /dev/sg2
> 200GB
> [4:0:1:0] disk ATA INTEL SSDSC2KW25 003C /dev/sdd /dev/sg3
> 256GB
> [4:0:2:0] disk SEAGATE ST10000NM0096 E005 /dev/sde /dev/sg4
> 10.0TB
> [4:0:3:0] enclosu Areca Te ARC-802801.37.69 0137 -
> /dev/sg5 -
> [4:0:4:0] enclosu Intel RES2SV240 0d00 -
> /dev/sg6 -
> [7:0:0:0] disk Kingston DataTravelerMini PMAP /dev/sdb /dev/sg1
> 1.03GB
> [N:0:0:1] disk WDC WDS256G1X0C-00ENX0__1 /dev/nvme0n1 -
> 256GB
>
> # sg_ses /dev/sg5
> Areca Te ARC-802801.37.69 0137
> Supported diagnostic pages:
> Supported Diagnostic Pages [sdp] [0x0]
> Configuration (SES) [cf] [0x1]
> Enclosure Status/Control (SES) [ec,es] [0x2]
> String In/Out (SES) [str] [0x4]
> Threshold In/Out (SES) [th] [0x5]
> Element Descriptor (SES) [ed] [0x7]
> Additional Element Status (SES-2) [aes] [0xa]
> Supported SES Diagnostic Pages (SES-2) [ssp] [0xd]
> Download Microcode (SES-2) [dm] [0xe]
> Subenclosure Nickname (SES-2) [snic] [0xf]
> Protocol Specific (SAS transport) [] [0x3f]
>
> # sg_ses -p cf /dev/sg5
> Areca Te ARC-802801.37.69 0137
> Configuration diagnostic page:
> number of secondary subenclosures: 0
> generation code: 0x0
> enclosure descriptor list
> Subenclosure identifier: 0 [primary]
> relative ES process id: 1, number of ES processes: 1
> number of type descriptor headers: 9
> enclosure logical identifier (hex): d5b401503fc0ec16
> enclosure vendor: Areca Te product: ARC-802801.37.69 rev: 0137
> vendor-specific data:
> 11 22 33 44 55 00 00 00 ."3DU...
>
> type descriptor header and text list
> Element type: Array device slot, subenclosure id: 0
> number of possible elements: 24
> text: ArrayDevicesInSubEnclsr0
> Element type: Enclosure, subenclosure id: 0
> number of possible elements: 1
> text: EnclosureElementInSubEnclsr0
> Element type: SAS expander, subenclosure id: 0
> number of possible elements: 1
> text: SAS Expander
> Element type: Cooling, subenclosure id: 0
> number of possible elements: 5
> text: CoolingElementInSubEnclsr0
> Element type: Temperature sensor, subenclosure id: 0
> number of possible elements: 2
> text: TempSensorsInSubEnclsr0
> Element type: Voltage sensor, subenclosure id: 0
> number of possible elements: 2
> text: VoltageSensorsInSubEnclsr0
> Element type: SAS connector, subenclosure id: 0
> number of possible elements: 3
> text: ConnectorsInSubEnclsr0
> Element type: Power supply, subenclosure id: 0
> number of possible elements: 2
> text: PowerSupplyInSubEnclsr0
> Element type: Audible alarm, subenclosure id: 0
> number of possible elements: 1
> text: AudibleAlarmInSubEnclsr0
>
> Doug Gilbert
>
>> on it and reporting back what it shows? It's possible there's another
>> type that the enclosure device should be tracking.
kernel log:
2020-11-30 09:29:44.228339 info [kernel:] [425726.567579] scsi host18:
iSCSI Initiator over TCP/IP
2020-11-30 09:29:44.476319 notice [kernel:] [425726.817417] scsi
18:0:0:0: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:29:44.480314 notice [kernel:] [425726.820591] scsi
18:0:0:0: rdac: LUN 0 (IOSHIP) (owned)
2020-11-30 09:29:44.480319 notice [kernel:] [425726.820810] sd 18:0:0:0:
Attached scsi generic sg30 type 0
2020-11-30 09:29:44.480320 notice [kernel:] [425726.820812] sd 18:0:0:0:
Embedded Enclosure Device
2020-11-30 09:29:44.480321 warning [kernel:] [425726.821119] sd
18:0:0:0: Mode parameters changed
2020-11-30 09:29:44.492316 info [kernel:] [425726.831444] sd 18:0:0:0:
[ses_intf_add]:777 sdev: ffff8027ca170000 edev:ffff80271459cc00 com:
0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:29:44.492326 notice [kernel:] [425726.832326] sd 18:0:0:0:
[sdt] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
2020-11-30 09:29:44.496308 notice [kernel:] [425726.836252] sd 18:0:0:0:
[sdt] Write Protect is off
2020-11-30 09:29:44.496313 debug [kernel:] [425726.836255] sd 18:0:0:0:
[sdt] Mode Sense: 83 00 10 08
2020-11-30 09:29:44.500310 notice [kernel:] [425726.838436] sd 18:0:0:0:
[sdt] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:29:44.500315 notice [kernel:] [425726.838573] scsi
18:0:0:1: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:29:44.501437 notice [kernel:] [425726.842314] scsi
18:0:0:1: rdac: LUN 1 (IOSHIP) (owned)
2020-11-30 09:29:44.501442 notice [kernel:] [425726.842501] sd 18:0:0:1:
Attached scsi generic sg31 type 0
2020-11-30 09:29:44.501442 notice [kernel:] [425726.842502] sd 18:0:0:1:
Embedded Enclosure Device
2020-11-30 09:29:44.501443 warning [kernel:] [425726.842688] sd
18:0:0:1: Mode parameters changed
2020-11-30 09:29:44.512325 info [kernel:] [425726.852905] sd 18:0:0:1:
[ses_intf_add]:777 sdev: ffff8027ca162000 edev:ffff80271459d400 com:
0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:29:44.512335 notice [kernel:] [425726.853249] sd 18:0:0:1:
[sdu] 104857600 512-byte logical blocks: (53.7 GB/50.0 GiB)
2020-11-30 09:29:44.514333 notice [kernel:] [425726.853778] sd 18:0:0:1:
[sdu] Write Protect is off
2020-11-30 09:29:44.514338 debug [kernel:] [425726.853780] sd 18:0:0:1:
[sdu] Mode Sense: 83 00 10 08
2020-11-30 09:29:44.514339 notice [kernel:] [425726.853908] scsi
18:0:0:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
2020-11-30 09:29:44.514340 notice [kernel:] [425726.854524] sd 18:0:0:1:
[sdu] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:29:44.514340 notice [kernel:] [425726.855140] sd 18:0:0:0:
[sdt] Attached SCSI disk
2020-11-30 09:29:44.514341 notice [kernel:] [425726.855221] scsi
18:0:0:31: Attached scsi generic sg32 type 0
2020-11-30 09:29:44.514342 notice [kernel:] [425726.855223] scsi
18:0:0:31: Embedded Enclosure Device
2020-11-30 09:29:44.514383 warning [kernel:] [425726.855571] scsi
18:0:0:31: Power-on or device reset occurred
2020-11-30 09:29:44.514383 err [kernel:] [425726.855587] scsi 18:0:0:31:
Failed to get diagnostic page 0x1
2020-11-30 09:29:44.514384 err [kernel:] [425726.855594] scsi 18:0:0:31:
Failed to bind enclosure -19
2020-11-30 09:29:44.514388 info [kernel:] [425726.855599] scsi
18:0:0:31: sdev: ffff8027ca173000
2020-11-30 09:29:44.524329 notice [kernel:] [425726.863747] sd 18:0:0:1:
[sdu] Attached SCSI disk
2020-11-30 09:29:44.764335 err [kernel:] [425727.103405] I/O scheduler
mq-deadline not found
2020-11-30 09:29:44.888323 err [kernel:] [425727.228994] I/O scheduler
mq-deadline not found
2020-11-30 09:29:45.001274 warning [kernel:] [425727.342284]
device-mapper: multipath round-robin: repeat_count > 1 is deprecated,
using 1 instead
2020-11-30 09:31:41.116334 info [kernel:] [425843.460697] scsi host19:
iSCSI Initiator over TCP/IP
2020-11-30 09:31:41.356325 notice [kernel:] [425843.700556] scsi
19:0:0:0: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:31:41.360325 notice [kernel:] [425843.704270] scsi
19:0:0:0: rdac: LUN 0 (IOSHIP) (owned)
2020-11-30 09:31:41.360334 notice [kernel:] [425843.704478] sd 19:0:0:0:
Attached scsi generic sg33 type 0
2020-11-30 09:31:41.360334 notice [kernel:] [425843.704480] sd 19:0:0:0:
Embedded Enclosure Device
2020-11-30 09:31:41.360335 warning [kernel:] [425843.704732] sd
19:0:0:0: Mode parameters changed
2020-11-30 09:31:41.380310 info [kernel:] [425843.723464] sd 19:0:0:0:
[ses_intf_add]:777 sdev: ffff8027c8984000 edev:ffff802723f68400 com:
0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:31:41.380317 notice [kernel:] [425843.724232] sd 19:0:0:0:
[sdv] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
2020-11-30 09:31:41.384308 notice [kernel:] [425843.729742] sd 19:0:0:0:
[sdv] Write Protect is off
2020-11-30 09:31:41.384313 debug [kernel:] [425843.729745] sd 19:0:0:0:
[sdv] Mode Sense: 83 00 10 08
2020-11-30 09:31:41.388310 notice [kernel:] [425843.730436] sd 19:0:0:0:
[sdv] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:31:41.388316 notice [kernel:] [425843.730557] scsi
19:0:0:1: Direct-Access DELL MD32xxi 0784 PQ: 0 ANSI: 5
2020-11-30 09:31:41.392312 notice [kernel:] [425843.736948] scsi
19:0:0:1: rdac: LUN 1 (IOSHIP) (owned)
2020-11-30 09:31:41.392317 notice [kernel:] [425843.737146] sd 19:0:0:1:
Attached scsi generic sg34 type 0
2020-11-30 09:31:41.392317 notice [kernel:] [425843.737148] sd 19:0:0:1:
Embedded Enclosure Device
2020-11-30 09:31:41.392318 warning [kernel:] [425843.737829] sd
19:0:0:1: Mode parameters changed
2020-11-30 09:31:41.404316 info [kernel:] [425843.749482] sd 19:0:0:1:
[ses_intf_add]:777 sdev: ffff8027c899e000 edev:ffff802723f6e800 com:
0 edev->component[0].scratch: 0000000000000000
2020-11-30 09:31:41.404326 notice [kernel:] [425843.749862] sd 19:0:0:1:
[sdw] 104857600 512-byte logical blocks: (53.7 GB/50.0 GiB)
2020-11-30 09:31:41.408331 notice [kernel:] [425843.750597] sd 19:0:0:1:
[sdw] Write Protect is off
2020-11-30 09:31:41.408346 debug [kernel:] [425843.750598] sd 19:0:0:1:
[sdw] Mode Sense: 83 00 10 08
2020-11-30 09:31:41.408347 notice [kernel:] [425843.751325] scsi
19:0:0:31: Direct-Access DELL Universal Xport 0784 PQ: 0 ANSI: 5
2020-11-30 09:31:41.408347 notice [kernel:] [425843.751979] sd 19:0:0:1:
[sdw] Write cache: enabled, read cache: enabled, supports DPO and FUA
2020-11-30 09:31:41.408348 notice [kernel:] [425843.753105] scsi
19:0:0:31: Attached scsi generic sg35 type 0
2020-11-30 09:31:41.408349 notice [kernel:] [425843.753109] scsi
19:0:0:31: Embedded Enclosure Device
2020-11-30 09:31:41.408350 notice [kernel:] [425843.753165] sd 19:0:0:0:
[sdv] Attached SCSI disk
2020-11-30 09:31:41.408351 warning [kernel:] [425843.753377] scsi
19:0:0:31: Power-on or device reset occurred
2020-11-30 09:31:41.408352 err [kernel:] [425843.753388] scsi 19:0:0:31:
Failed to get diagnostic page 0x1
2020-11-30 09:31:41.408352 err [kernel:] [425843.753390] scsi 19:0:0:31:
Failed to bind enclosure -19
2020-11-30 09:31:41.408353 info [kernel:] [425843.753391] scsi
19:0:0:31: sdev: ffff8027c899c000
2020-11-30 09:31:41.416332 notice [kernel:] [425843.759795] sd 19:0:0:1:
[sdw] Attached SCSI disk
# cat /sys/class/enclosure/18\:0\:0\:0/components
0
# cat /sys/class/enclosure/18\:0\:0\:1/components
0
# cat /sys/class/enclosure/19\:0\:0\:0/components
0
# cat /sys/class/enclosure/19\:0\:0\:1/components
0
# sg_ses -e
Diagnostic pages, followed by abbreviation(s) then page code:
Supported Diagnostic Pages [sdp] [0x0]
Configuration (SES) [cf] [0x1]
Enclosure Status/Control (SES) [ec,es] [0x2]
Help Text (SES) [ht] [0x3]
String In/Out (SES) [str] [0x4]
Threshold In/Out (SES) [th] [0x5]
Array Status/Control (SES, obsolete) [ac,as] [0x6]
Element Descriptor (SES) [ed] [0x7]
Short Enclosure Status (SES) [ses] [0x8]
Enclosure Busy (SES-2) [eb] [0x9]
Additional Element Status (SES-2) [aes] [0xa]
Subenclosure Help Text (SES-2) [sht] [0xb]
Subenclosure String In/Out (SES-2) [sstr] [0xc]
Supported SES Diagnostic Pages (SES-2) [ssp] [0xd]
Download Microcode (SES-2) [dm] [0xe]
Subenclosure Nickname (SES-2) [snic] [0xf]
Protocol Specific (SAS transport) [] [0x3f]
Translate Address (SBC) [] [0x40]
Device Status (SBC) [] [0x41]
Rebuild Assist (SBC) [] [0x42]
SES element type names, followed by abbreviation and element type code:
Unspecified [un] [0x0]
Device slot [dev] [0x1]
Power supply [ps] [0x2]
Cooling [coo] [0x3]
Temperature sensor [ts] [0x4]
Door [do] [0x5]
Audible alarm [aa] [0x6]
Enclosure services controller electronics [esc] [0x7]
SCC controller electronics [sce] [0x8]
Nonvolatile cache [nc] [0x9]
Invalid operation reason [ior] [0xa]
Uninterruptible power supply [ups] [0xb]
Display [dis] [0xc]
Key pad entry [kpe] [0xd]
Enclosure [enc] [0xe]
SCSI port/transceiver [sp] [0xf]
Language [lan] [0x10]
Communication port [cp] [0x11]
Voltage sensor [vs] [0x12]
Current sensor [cs] [0x13]
SCSI target port [stp] [0x14]
SCSI initiator port [sip] [0x15]
Simple subenclosure [ss] [0x16]
Array device slot [arr] [0x17]
SAS expander [sse] [0x18]
SAS connector [ssc] [0x19]
# sg_ses /dev/sdu
DELL MD32xxi 0784
disk device has EncServ bit set
Supported diagnostic pages:
Supported Diagnostic Pages [sdp] [0x0]
Configuration (SES) [cf] [0x1]
Enclosure Status/Control (SES) [ec,es] [0x2]
Array Status/Control (SES, obsolete) [ac,as] [0x6]
# sg_ses -p cf /dev/sdu
DELL MD32xxi 0784
disk device has EncServ bit set
Configuration diagnostic page:
number of secondary subenclosures: 0
generation code: 0x12c
enclosure descriptor list
Subenclosure identifier: 0 (primary)
relative ES process id: 0, number of ES processes: 0
number of type descriptor headers: 5
enclosure logical identifier (hex): 0000000000000000
enclosure vendor: DELL product: MD32xxi rev: 0784
vendor-specific data:
30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
00 00 00 00
type descriptor header/text list
Element type: Device slot, subenclosure id: 0
number of possible elements: 12
Element type: Power supply, subenclosure id: 0
number of possible elements: 2
Element type: Cooling, subenclosure id: 0
number of possible elements: 4
Element type: Temperature sensor, subenclosure id: 0
number of possible elements: 4
Element type: SCC controller electronics, subenclosure id: 0
number of possible elements: 1
I'm not goot at ses, but it seems that ses does not get the right
component count
>>
>> Regards,
>>
>> James
>>
>> ---8>8>8><8<8<8--------
>> From: James Bottomley <James.Bottomley@...senPartnership.com>
>> Subject: [PATCH] scsi: ses: don't attach if enclosure has no components
>>
>> An enclosure with no components can't usefully be operated by the
>> driver (since effectively it has nothing to manage), so report the
>> problem and don't attach. Not attaching also fixes an oops which
>> could occur if the driver tries to manage a zero component enclosure.
>>
>> Reported-by: Ding Hui <dinghui@...gfor.com.cn>
>> Cc: stable@...r.kernel.org
>> Signed-off-by: James Bottomley <James.Bottomley@...senPartnership.com>
>> ---
>> drivers/scsi/ses.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
>> index c2afba2a5414..9624298b9c89 100644
>> --- a/drivers/scsi/ses.c
>> +++ b/drivers/scsi/ses.c
>> @@ -690,6 +690,11 @@ static int ses_intf_add(struct device *cdev,
>> type_ptr[0] == ENCLOSURE_COMPONENT_ARRAY_DEVICE)
>> components += type_ptr[1];
>> }
>> + if (components == 0) {
>> + sdev_printk(KERN_ERR, sdev, "enclosure has no enumerated
>> components\n");
>> + goto err_free;
>> + }
>> +
>> ses_dev->page1 = buf;
>> ses_dev->page1_len = len;
>> buf = NULL;
>>
>
--
Thanks,
- Ding Hui
Powered by blists - more mailing lists