Date: Thu, 23 May 2024 19:01:40 +0800
From: yangxingui <yangxingui@...wei.com>
To: Greg KH <gregkh@...uxfoundation.org>
CC: <rafael@...nel.org>, <linux-scsi@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <linuxarm@...wei.com>,
	<prime.zeng@...ilicon.com>, <liyihang9@...wei.com>, <kangfenglong@...wei.com>
Subject: Re: [PATCH] driver core: Add log when devtmpfs create node failed



On 2024/5/23 17:35, Greg KH wrote:
> On Thu, May 23, 2024 at 05:23:07PM +0800, yangxingui wrote:
>> Hi Greg,
>>
>> On 2024/5/23 15:25, Greg KH wrote:
>>> On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote:
>>>> Hi, Greg
>>>>
>>>> On 2024/5/22 20:23, Greg KH wrote:
>>>>> On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote:
>>>>>> Currently, no diagnostic information is output when devtmpfs fails
>>>>>> to create a node, so add a log message for it.
>>>>>
>>>>> Why?  Who is going to do something with this?
>>>> When we run the lsscsi command after a disk is connected, we
>>>> occasionally find that some disks have no dev nodes and cannot be used.
>>>
>>> Ok, but why do you think that devtmpfs create failed?
>> I found that lsscsi traverses the dev nodes and reads each device's
>> major and minor numbers. If no matching dev node is found, it displays
>> "-       ".
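
(For reference, that matching step is roughly the following; an untested
userspace sketch, not lsscsi's actual code. It takes the MAJ:MIN that
sysfs reports for the disk and scans /dev for a block node whose st_rdev
matches; if none matches, the tool prints "-".)

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Return 0 and fill buf if some block node in /dev has major:minor. */
static int find_dev_node(unsigned int maj, unsigned int min,
			 char *buf, size_t len)
{
	DIR *d = opendir("/dev");
	struct dirent *e;
	struct stat st;
	char path[PATH_MAX];

	if (!d)
		return -1;
	while ((e = readdir(d)) != NULL) {
		snprintf(path, sizeof(path), "/dev/%s", e->d_name);
		if (lstat(path, &st) || !S_ISBLK(st.st_mode))
			continue;
		if (major(st.st_rdev) == maj && minor(st.st_rdev) == min) {
			snprintf(buf, len, "%s", path);
			closedir(d);
			return 0;
		}
	}
	closedir(d);
	return -1;
}

int main(int argc, char **argv)
{
	unsigned int maj, min;
	char node[PATH_MAX];

	if (argc != 2 || sscanf(argv[1], "%u:%u", &maj, &min) != 2)
		return 1;
	/* "-" is what lsscsi shows when no matching node exists. */
	printf("%s\n", find_dev_node(maj, min, node, sizeof(node)) ?
	       "-" : node);
	return 0;
}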
>>>
>>>> However, no error is logged during disk scanning. Our analysis is
>>>> that it may be caused by devtmpfs failing to create the dev node, so
>>>> the log is added here.
>>>
>>> But is that the case?  Why is devtmpfs failing?  Shouldn't we fix that
>>> instead?
>> My subsequent reply touches on these points.
>>>
>>>> The lsscsi command output and kernel logs are as follows:
>>>>
>>>> [root@...alhost]# lsscsi
>>>> [9:0:4:0]	disk	ATA	ST10000NM0086-2A SN05	-
>>>>
>>>> kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
>>>> link_rate=10(sata)
>>>> kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1
>>>> (5000000000000900)
>>>> kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731
>>>> kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found
>>>> kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
>>>> kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0
>>>> phy_state=0xe
>>>> kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0
>>>> down
>>>> kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
>>>> link_rate=10(sata)
>>>> kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max
>>>> UDMA/133
>>>> kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ
>>>> (depth 32), AA
>>>> kernel: [586670.027471] ata19.00: configured for UDMA/133
>>>> kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed:
>>>> 0 tries: 1
>>>> kernel: [586670.037541] sas: ata19: end_device-9:0:
>>>> model:ST10000NM0086-2AA101 serial:            ZA2B3PR2
>>>> kernel: [586670.100856] scsi 9:0:4:0: Direct-Access     ATA ST10000NM0086-2A
>>>> SN05 PQ: 0 ANSI: 5
>>>> kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical
>>>> blocks: (10.0 TB/9.10 TiB)
>>>> kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks
>>>> kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off
>>>> kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache:
>>>> enabled, doesn't support DPO or FUA
>>>> kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0
>>>> kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0
>>>> kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0
>>>> already(1)!
>>>> kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk
>>>
>>> Looks like sdk was found properly, what's the problem?
>>
>> Yes, this problem occurs occasionally. There is no error log during
>> disk scanning, but the disk cannot be used. We have confirmed that it
>> is related to fio testing: when the dev node does not yet exist, fio
>> may itself create a regular file at that path.
> 
> So that's a userspace issue.  If a device node is to be created, and the
> file is already present with that name, yes, we will fail to create it
> as obviously userspace did not want us to do so.
> 
> It's not the kernel's job to protect userspace from doing foolish things
> itself, right?  :)
Yes.
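
(To make the failure mode concrete: an untested userspace sketch, using
a scratch path instead of a real /dev entry. An exclusive create at a
path that already exists fails with EEXIST, which is what devtmpfs hits
when userspace, e.g. fio, has already created a regular file under the
node's name.)

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/tmp/sdk";	/* stand-in for /dev/sdk */
	int fd;

	/* What a tool does when it opens a missing node with O_CREAT: */
	fd = open(path, O_CREAT | O_WRONLY, 0644);
	if (fd >= 0)
		close(fd);

	/* What devtmpfs conceptually does next; this fails with EEXIST: */
	if (mknod(path, S_IFBLK | 0600, makedev(8, 160)) < 0)
		printf("mknod %s failed: %s\n", path, strerror(errno));

	unlink(path);
	return 0;
}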
> 
>> If we want to solve this problem, should we delete the existing files first
>> when creating a dev node?
> 
> No.
Ok.
> 
>> Or just print a message indicating that the dev node creation failed.
> 
> We can do that, but will that cause error messages to be printed out for
> normal situations today where userspace does this on purpose?
> 
> Again, this isn't fixing the root problem here (which is userspace doing
> something it shouldn't be doing), adding kernel log messages might be
> just noise at this point in time given that it has been operating this
> way for many years, if not decades.
Yes, there is currently no fix for the problem, and it does not happen 
often. But once it occurs, the device is unavailable and the cause is 
difficult to locate. In addition, devtmpfs can fail to create a dev 
node for many reasons, including the scenario identified here as well 
as memory allocation failures.
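
(Given that, a small diagnostic can at least make the symptom easy to
spot; an untested sketch assuming the usual /sys/class/block layout:
for every block device the kernel knows about, check that /dev/<name>
exists and really is a block node.)

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
	DIR *d = opendir("/sys/class/block");
	struct dirent *e;
	struct stat st;
	char path[PATH_MAX];

	if (!d)
		return 1;
	while ((e = readdir(d)) != NULL) {
		if (e->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "/dev/%s", e->d_name);
		if (lstat(path, &st))
			printf("%s: node missing\n", path);
		else if (!S_ISBLK(st.st_mode))
			printf("%s: exists but is not a block node\n", path);
	}
	closedir(d);
	return 0;
}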

Thanks,
Xingui
