linux-kernel - Re: [PATCH] tpm: Fix kexec crash due to access to ops NULL pointer (powerpc)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d9eafa8f-4006-4bc2-c09b-6b02a14c6ef3@linux.ibm.com>
Date:   Mon, 20 Dec 2021 20:05:58 -0500
From:   Stefan Berger <stefanb@...ux.ibm.com>
To:     Tyrel Datwyler <tyreld@...ux.ibm.com>, jarkko@...nel.org,
        peterhuewe@....de, linux-integrity@...r.kernel.org
Cc:     Korrapati.Likhitha@....com, pavrampu@...ibm.com,
        linux-kernel@...r.kernel.org, jgg@...pe.ca,
        linux-security-module@...r.kernel.org, gcwilson@...ibm.com,
        linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH] tpm: Fix kexec crash due to access to ops NULL pointer
 (powerpc)


On 12/20/21 19:39, Tyrel Datwyler wrote:
> On 12/11/21 5:28 PM, Stefan Berger wrote:
>> Fix the following crash on kexec by checking chip->ops for a NULL pointer
>> in tpm_chip_start() and returning an error code if this is the case.
>>
>> BUG: Kernel NULL pointer dereference on read at 0x00000060
>> Faulting instruction address: 0xc00000000099a06c
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> ...
>> NIP [c00000000099a06c] tpm_chip_start+0x2c/0x140
>>   LR [c00000000099a808] tpm_chip_unregister+0x108/0x170
>> Call Trace:
>> [c0000000188bfa00] [c000000002b03930] fw_devlink_strict+0x0/0x8 (unreliable)
>> [c0000000188bfa30] [c00000000099a808] tpm_chip_unregister+0x108/0x170
>> [c0000000188bfa70] [c0000000009a3874] tpm_ibmvtpm_remove+0x34/0x130
>> [c0000000188bfae0] [c000000000110dbc] vio_bus_remove+0x5c/0xb0
>> [c0000000188bfb20] [c0000000009bc154] device_shutdown+0x1d4/0x3a8
>> [c0000000188bfbc0] [c000000000196e14] kernel_restart_prepare+0x54/0x70
>>
>> The referenced patch below introduced a function to shut down the VIO bus.
>> The bus shutdown now calls tpm_del_char_device (via tpm_chip_unregister)
>> after a call to tpm_class_shutdown, which already set chip->ops to NULL.
>> The crash occurrs when tpm_del_char_device calls tpm_chip_start with the
>> chip->ops NULL pointer.
> It was unclear to me at first, but IIUC the problem is the ibmvtpm device
> receives two shutdown calls, the first is a class shutdown call for TPM devices,
> followed by a bus shutdown call for VIO devices.
>
> It appears that the class shutdown routines are meant to be a pre-shutdown
> routine as they are defined as class->shutdown_pre(), and it is clearly allowed
> to call class->shutdown_pre() followed by one of but not both of the following
> bus->shutdown() or driver->shutdown(). Even tpm_class_shutdown() mentions in the
> function comment that bus/device shutdown to follow.

I suppose you are referring to this here:

/**
  * tpm_class_shutdown() - prepare the TPM device for loss of power.
  * @dev: device to which the chip is associated.
  *
  * Issues a TPM2_Shutdown command prior to loss of power, as required 
by the
  * TPM 2.0 spec. Then, calls bus- and device- specific shutdown code.
  *
  * Return: always 0 (i.e. success)
  */

I think this comment still refers to the ancient code here:

https://elixir.bootlin.com/linux/v4.4.295/source/drivers/char/tpm/tpm-chip.c#L161

     if (dev->bus && dev->bus->shutdown)
         dev->bus->shutdown(dev);
     else if (dev->driver && dev->driver->shutdown)
         dev->driver->shutdown(dev);



>> Fixes: 39d0099f9439 ("powerpc/pseries: Add shutdown() to vio_driver and vio_bus")
> This patch left implementing each vio driver shutdown routine as an exercise for
> the respective maintainers, and instead just big hammers anything that doesn't
> have a shutdown routine by calling the driver->remove().
>
> If tpm_class_shutdown() quiecses ibmvtpm enough implementing a no-op
> ibmvtpm->shutdown() with a comment saying so suffices.
>
> However, the generic TPM code is still introducing a bug that an attempt to call
> tpm_chip_unregister() after tpm_class_shutdown() will crash as mentioned above.


An alternative solution may be this here:

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index ddaeceb7e109..4cb908349b31 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -473,15 +473,8 @@ static void tpm_del_char_device(struct tpm_chip *chip)
         mutex_unlock(&idr_lock);

         /* Make the driver uncallable. */
-       down_write(&chip->ops_sem);
-       if (chip->flags & TPM_CHIP_FLAG_TPM2) {
-               if (!tpm_chip_start(chip)) {
-                       tpm2_shutdown(chip, TPM2_SU_CLEAR);
-                       tpm_chip_stop(chip);
-               }
-       }
-       chip->ops = NULL;
-       up_write(&chip->ops_sem);
+       if (chip->ops)
+               tpm_class_shutdown(&chip->dev);
  }

  static void tpm_del_legacy_sysfs(struct tpm_chip *chip)


I could post this as v2 ?! Let me know...

    Stefan


>> Signed-off-by: Stefan Berger <stefanb@...ux.ibm.com>
>> ---
>>   drivers/char/tpm/tpm-chip.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
>> index ddaeceb7e109..cca1bde296ee 100644
>> --- a/drivers/char/tpm/tpm-chip.c
>> +++ b/drivers/char/tpm/tpm-chip.c
>> @@ -101,6 +101,9 @@ int tpm_chip_start(struct tpm_chip *chip)
>>   {
>>   	int ret;
>>
>> +	if (!chip->ops)
>> +		return -EINVAL;
>> +
> I wonder if a better fix would to have tpm_del_char_device() check for valid
> chip->ops and call tpm_class_shutdown() when the ops are still valid. Calling
> tpm_class_shutdown() allows for some code deduplication in tpm_del_char_device().
>
> -Tyrel
>
>>   	tpm_clk_enable(chip);
>>
>>   	if (chip->locality == -1) {
>>