lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240708214656.4721-1-Ashish.Kalra@amd.com>
Date: Mon, 8 Jul 2024 21:46:56 +0000
From: Ashish Kalra <Ashish.Kalra@....com>
To: <pstanner@...hat.com>
CC: <airlied@...il.com>, <bhelgaas@...gle.com>, <dakr@...hat.com>,
	<daniel@...ll.ch>, <dri-devel@...ts.freedesktop.org>, <hdegoede@...hat.com>,
	<linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>,
	<maarten.lankhorst@...ux.intel.com>, <mripard@...nel.org>,
	<sam@...nborg.org>, <tzimmermann@...e.de>, <thomas.lendacky@....com>,
	<mario.limonciello@....com>
Subject: Re: [PATCH v9 10/13] PCI: Give pci_intx() its own devres callback

With this patch applied, we are observing unloading and then reloading issues with the AMD Crypto (CCP) driver:

with DEVRES logging enabled, we observe the following logs:

[  218.093588] ccp 0000:a2:00.1: DEVRES REL 00000000c18c52fb 0xffff8d09dc1972c0 devm_kzalloc_release (152 bytes)
[  218.105527] ccp 0000:a2:00.1: DEVRES REL 000000003091fb95 0xffff8d09d3aad000 devm_kzalloc_release (3072 bytes)
[  218.117500] ccp 0000:a2:00.1: DEVRES REL 0000000049e4adfe 0xffff8d09d588f000 pcim_intx_restore (4 bytes)
[  218.129519] ccp 0000:a2:00.1: DEVRES ADD 000000001a2ac6ad 0xffff8cfa867b7cc0 pcim_intx_restore (4 bytes)
[  218.140434] ccp 0000:a2:00.1: DEVRES REL 00000000627ecaf7 0xffff8d09d588f680 pcim_msi_release (16 bytes)
[  218.151665] ccp 0000:a2:00.1: DEVRES REL 0000000058b2252a 0xffff8d09dc199680 msi_device_data_release (80 bytes)
[  218.163625] ccp 0000:a2:00.1: DEVRES REL 00000000435cc85e 0xffff8d09d588ff80 devm_attr_group_remove (8 bytes)
[  218.175224] ccp 0000:a2:00.1: DEVRES REL 00000000cb6fcd9b 0xffff8d09eb583660 pcim_addr_resource_release (40 bytes)
[  218.187319] ccp 0000:a2:00.1: DEVRES REL 00000000d64a8b84 0xffff8d09eb583180 pcim_iomap_release (48 bytes)
[  218.198615] ccp 0000:a2:00.1: DEVRES REL 0000000099ac6b28 0xffff8d09eb5830c0 pcim_addr_resource_release (40 bytes)
[  218.210730] ccp 0000:a2:00.1: DEVRES REL 00000000bdd27f88 0xffff8d09d3ac2700 pcim_release (0 bytes)
[  218.221489] ccp 0000:a2:00.1: DEVRES REL 00000000e763315c 0xffff8d09d3ac2240 devm_kzalloc_release (20 bytes)
[  218.233008] ccp 0000:a2:00.1: DEVRES REL 00000000ae90f983 0xffff8d09dc25a800 devm_kzalloc_release (184 bytes)
[  218.245251] ccp 0000:23:00.1: DEVRES REL 00000000a2ec0085 0xffff8cfa86bee700 fw_name_devm_release (16 bytes)
[  218.256748] ccp 0000:23:00.1: DEVRES REL 0000000021bccd98 0xffff8cfaa528d5c0 devm_pages_release (16 bytes)
[  218.268044] ccp 0000:23:00.1: DEVRES REL 000000003ef7cbc7 0xffff8cfaa1b5ec00 devm_kzalloc_release (104 bytes)
[  218.279631] ccp 0000:23:00.1: DEVRES REL 00000000619322e1 0xffff8cfaa1b5e480 devm_kzalloc_release (152 bytes)
[  218.300438] ccp 0000:23:00.1: DEVRES REL 00000000c261523b 0xffff8cfaad88b000 devm_kzalloc_release (3072 bytes)
[  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
[  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
[  218.391226] ccp 0000:23:00.1: DEVRES REL 0000000058c9dce1 0xffff8cfaa528d880 pcim_msi_release (16 bytes)
[  218.421340] ccp 0000:23:00.1: DEVRES REL 00000000c8ab08a7 0xffff8cfa9e617300 msi_device_data_release (80 bytes)
[  218.452357] ccp 0000:23:00.1: DEVRES REL 00000000cf5baccb 0xffff8cfaa528d8c0 devm_attr_group_remove (8 bytes)
[  218.483011] ccp 0000:23:00.1: DEVRES REL 00000000b8cbbadd 0xffff8cfa9c596060 pcim_addr_resource_release (40 bytes)
[  218.514343] ccp 0000:23:00.1: DEVRES REL 00000000920f9607 0xffff8cfa9c596c60 pcim_iomap_release (48 bytes)
[  218.544659] ccp 0000:23:00.1: DEVRES REL 00000000d401a708 0xffff8cfa9c596840 pcim_addr_resource_release (40 bytes)
[  218.575774] ccp 0000:23:00.1: DEVRES REL 00000000865d2fa2 0xffff8cfaa528d940 pcim_release (0 bytes)
[  218.605758] ccp 0000:23:00.1: DEVRES REL 00000000f5b79222 0xffff8cfaa528d080 devm_kzalloc_release (20 bytes)
[  218.636260] ccp 0000:23:00.1: DEVRES REL 0000000037ef240a 0xffff8cfa9eeb3f00 devm_kzalloc_release (184 bytes)

and the CCP driver reload issue during driver probe:

[  226.552684] pci 0000:23:00.1: Resources present before probing
[  226.568846] pci 0000:a2:00.1: Resources present before probing

>From the above DEVRES logging, it looks like pcim_intx_restore associated resource is being released but then
being re-added during detach/unload, which causes really_probe() to fail at probe time, as dev->devres_head is
not empty due to this added resource:
...
[  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
[  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
...

Going more deep into this: 

This is the initial pcim_intx_resoure associated resource being added during first (CCP) driver load:

[   40.418933]  pcim_intx+0x3a/0x120
[   40.418936]  pci_intx+0x8b/0xa0
[   40.418939]  __pci_enable_msix_range+0x369/0x530
[   40.418943]  pci_enable_msix_range+0x18/0x20
[   40.418946]  sp_pci_probe+0x106/0x310 [ccp]
[   40.418965] ipmi device interface
[   40.418960]  ? srso_alias_return_thunk+0x5/0xfbef5
[   40.418969]  local_pci_probe+0x4f/0xb0
[   40.418973]  work_for_cpu_fn+0x1e/0x30
[   40.418976]  process_one_work+0x183/0x350
[   40.418980]  worker_thread+0x2df/0x3f0
[   40.418982]  ? __pfx_worker_thread+0x10/0x10
[   40.418985]  kthread+0xd0/0x100
[   40.418987]  ? __pfx_kthread+0x10/0x10
[   40.418990]  ret_from_fork+0x40/0x60
[   40.418993]  ? __pfx_kthread+0x10/0x10
[   40.418996]  ret_from_fork_asm+0x1a/0x30
[   40.419001]  </TASK>
..
..
[   40.419012] ccp 0000:23:00.1: DEVRES ADD 00000000fbd19618 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)

Now, at driver unload: 
devres_release_all() -> remove_nodes() -> release_nodes() ...

remove_nodes() moves normal devres entries to the todo list, as can be seen with the following log:
...
[  218.245241] moving node 00000000fbd19618 0xffff8cfaa528d140 from devres to todo list
...

So, now this pcim_intx_resource associated resource is no longer part of dev->devres_head list and has been
moved to the todo list.

Later, when release_nodes() is invoked, it calls the associated release() callback associated with this devres:
...
[  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
...

The call flow for that is:
pcim_intx_restore() -> pci_intx() -> pcim_intx() ...

Now, pcim_intx() calls get_or_create_intx_devres() which tries to find it's associated devres using devres_find(), but 
that fails to find the devres, as the devres is no longer on dev->devres_head and has been moved to todo list.

Therefore, get_or_create_intx_devres() adds a new devres at driver unload/detach time:
...
[  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
...

But, then this is an issue as pcim_intx() is supposed to restore the original PCI INTx state on driver detach, but it now
operating on a newly added devres and not the original devres (added at driver probe) which contains the original PCI INTx
state, so it will be restoring an incorrect PCI INTx state ?

Additionally, this newly added devres causes driver reload/probe failure as really_probe() now finds resources present
before probing.

Not sure, if this issue has been observed with other PCI device drivers.

Thanks,
Ashish

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ