linux-kernel - Re: [PATCH] driver core: Add log when devtmpfs create node failed

Message-ID: <2024052316-confused-payback-5658@gregkh>
Date: Thu, 23 May 2024 09:25:40 +0200
From: Greg KH <gregkh@...uxfoundation.org>
To: yangxingui <yangxingui@...wei.com>
Cc: rafael@...nel.org, linux-scsi@...r.kernel.org,
	linux-kernel@...r.kernel.org, linuxarm@...wei.com,
	prime.zeng@...ilicon.com, liyihang9@...wei.com,
	kangfenglong@...wei.com
Subject: Re: [PATCH] driver core: Add log when devtmpfs create node failed

On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote:
> Hi, Greg
> 
> On 2024/5/22 20:23, Greg KH wrote:
> > On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote:
> > > Currently, no exception information is output when devtmpfs create node
> > > failed, so add log info for it.
> > 
> > Why?  Who is going to do something with this?
> When we run the lsscsi command after a disk is connected, we occasionally
> find that some disks have no dev node, and these disks cannot be used.

Ok, but why do you think that devtmpfs create failed?

> However, there is no abnormal log output during disk scanning. We analyze
> that it may be caused by the failure of devtmpfs create dev node, so the log
> is added here.

But is that the case?  Why is devtmpfs failing?  Shouldn't we fix that
instead?

> The lsscsi command query results and kernel logs are as follows:
> 
> [root@...alhost]# lsscsi
> [9:0:4:0]	disk	ATA	ST10000NM0086-2A SN05	-
> 
> kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
> link_rate=10(sata)
> kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1
> (5000000000000900)
> kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731
> kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found
> kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
> kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0
> phy_state=0xe
> kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0
> down
> kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0
> link_rate=10(sata)
> kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max
> UDMA/133
> kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ
> (depth 32), AA
> kernel: [586670.027471] ata19.00: configured for UDMA/133
> kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed:
> 0 tries: 1
> kernel: [586670.037541] sas: ata19: end_device-9:0:
> model:ST10000NM0086-2AA101 serial:            ZA2B3PR2
> kernel: [586670.100856] scsi 9:0:4:0: Direct-Access     ATA ST10000NM0086-2A
> SN05 PQ: 0 ANSI: 5
> kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical
> blocks: (10.0 TB/9.10 TiB)
> kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks
> kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off
> kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0
> kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0
> kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0
> already(1)!
> kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk

Looks like sdk was found properly, what's the problem?

> 
> > 
> > > 
> > > Signed-off-by: Xingui Yang <yangxingui@...wei.com>
> > > ---
> > >   drivers/base/core.c | 5 ++++-
> > >   1 file changed, 4 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index 5f4e03336e68..32a41e0472b2 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -3691,7 +3691,10 @@ int device_add(struct device *dev)
> > >   		if (error)
> > >   			goto SysEntryError;
> > > -		devtmpfs_create_node(dev);
> > > +		error = devtmpfs_create_node(dev);
> > > +		if (error)
> > > +			pr_info("devtmpfs create node for %s failed: %d\n",
> > > +				dev_name(dev), error);
> > 
> > Why is an error message pr_info()?
> Do you recommend using pr_err()?

Do not print errors at the information level :)

> > And again, why is this needed?  If this needs to be checked, why are you
> > now checking it but ignoring the error?
> > 
> > What would this help with?
> As above, we want to get the error info when the dev node fails to be
> created. We currently haven't figured out how to handle this exception well.
> But judging from the problems we are currently encountering, some may be
> because the corresponding dev node already exists, causing the creation to
> fail, but the node information is incorrect and the device cannot be used.
> as follows:
> [root@...alhost]# ll /dev/sdk
> -rw-------. 1 root root 5368709120 Jul 8 09:51 /dev/sdk

Looks like the device node is created to me.  What is incorrect about
it, the values?  What is 'll' an alias for?  And are you sure that other
tools aren't getting the device node creation uevent and doing something
with it in userspace?  How do you know this is the kernel failing?

Wait, is /dev/sdk really a device node and not a file?  Perhaps
something else wrote to it first, before it was created?  And that's why
devtmpfs couldn't create it.  That sounds like a userspace error,
nothing the kernel can do about it.

thanks,

greg k-h