linux-kernel - Re: [PATCH] crypto: padlock-sha - Add support for Zhaoxin processor

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0575ec9d-a6b8-4932-a1aa-9646813957a2@zhaoxin.com>
Date: Mon, 16 Jun 2025 20:23:36 +0800
From: AlanSong-oc <AlanSong-oc@...oxin.com>
To: Herbert Xu <herbert@...dor.apana.org.au>
CC: <davem@...emloft.net>, <linux-crypto@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <CobeChen@...oxin.com>,
	<TonyWWang-oc@...oxin.com>, <YunShen@...oxin.com>, <GeorgeXue@...oxin.com>,
	<LeoLiu-oc@...oxin.com>, <HansHu@...oxin.com>
Subject: Re: [PATCH] crypto: padlock-sha - Add support for Zhaoxin processor


On 6/12/2025 1:05 PM, Herbert Xu wrote:
> 
> On Wed, Jun 11, 2025 at 06:17:50PM +0800, AlanSong-oc wrote:
>>
>> +static int padlock_sha1_update_zhaoxin(struct shash_desc *desc,
>> +                                   const u8 *src, unsigned int len)
>> +{
>> +       struct sha1_state *state = padlock_shash_desc_ctx(desc);
>> +       int blocks = len / SHA1_BLOCK_SIZE;
>> +
>> +       /* The xsha1 instruction requires a 32-byte buffer for execution for Zhaoxin processors */
>> +       u8 buf[32 + PADLOCK_ALIGNMENT - 1];
>> +       u8 *dst = PTR_ALIGN(&buf[0], PADLOCK_ALIGNMENT);
> 
> The padlock has always had an alignment requirement.  We already
> deal with this by using PADLOCK_ALIGNMENT.  So rather than re-inventing
> it here, you should simply change PADLOCK_ALIGNMENT to 32 for Zhaoxin.

For the Zhaoxin processor, the XSHA1 instruction requires 16-byte
alignment, as specified by PADLOCK_ALIGNMENT, rather than 32 bytes,
which remains unchanged. Instead, it requires a 32-byte output buffer
for instruction execution. Before execution, the first 20 bytes of the
output buffer must be initialized with the SHA-1 initial hash constants.
After execution, the first 20 bytes will contain the computed hash
result, while the remaining 12 bytes will be zeroed out. Explain it
using a graph as shown below:

# Before XSHA1 Execution on Zhaoxin platform
Offset:     0                      19                       31
             +----------------------+------------------------+
Buffer:     | Initial Hash Values  |        xxxxxx          |
             +----------------------+------------------------+

# After XSHA1 Execution on Zhaoxin platform
Offset:     0                      19                       31
             +----------------------+------------------------+
Buffer:     |     Hash Result      |        Zeroed          |
             +----------------------+------------------------+

> You should also fix the comment to state that 32 is for alignment
> rather than the size.  The Nano already requires an 128-byte buffer
> and we cater for that so it can't be the size that's the problem
> here.

The 128-byte buffer requirement is already included in 'descsize',
as defined by PADLOCK_SHA_DESCSIZE. In the previous version of
the padlock-sha driver, the 'struct sha1_state' variable and the buffer
resided in separate memory regions. It allowed the driver to safely
write initial hash constants into the buffer and retrieve hash results
from buffer through memcpy() operations. Crucially, when the XSHA1
instruction zeroed out the tail bytes of the buffer, it cannot affect
the contents of 'struct sha1_state'. However, in the current driver
implementation, the 'struct sha1_state' shares memory space with the
buffer. Consequently, when the XSHA1 instruction executes, it
inadvertently clears other members of 'struct sha1_state'. Specifically,
when padlock_sha1_finup() is called, the 'count' member of
'struct sha1_state' no longer reflects the actual data length processed.
Explain it using a graph as shown below:

# Previous version of driver
Offset:         0               19     27                     91
                 +---------------+-------+---------------------+
sha1_state:     |     state     | count |      buffer         |
                 +---------------+-------+---------------------+
                    |         ^
           1. Write |         | 2. Retrieve
                    |         |
Offset:         0  v         |  19                                 127
                 +---------------+----------------------------------+
Buffer:         |               |                                  |
                 +---------------+----------------------------------+

# Current version of driver
Offset:         0                                                  127
                 +--------------------------------------------------+
                 |               19      27 31                 91   |
                 |+--------------+-------+---------------------+    |
Buffer:         ||     state    | count |      buffer         |    |
                 |+--------------+-------+---------------------+    |
                 |                **********                        |
                 +--------------------------------------------------+
                 *: will cleared by instruction

Best Regards
AlanSong-oc