linux-kernel - Re: [PATCH 08/11] x86: document X86_INTEL

Message-Id: <3b35b432-8e9e-4499-9beb-25f4f7821572@app.fastmail.com>
Date: Fri, 06 Dec 2024 15:27:10 +0100
From: "Arnd Bergmann" <arnd@...db.de>
To: "Ferry Toth" <fntoth@...il.com>,
 "Andy Shevchenko" <andy.shevchenko@...il.com>,
 "Arnd Bergmann" <arnd@...nel.org>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
 "Thomas Gleixner" <tglx@...utronix.de>, "Ingo Molnar" <mingo@...hat.com>,
 "Borislav Petkov" <bp@...en8.de>,
 "Dave Hansen" <dave.hansen@...ux.intel.com>,
 "H. Peter Anvin" <hpa@...or.com>,
 "Linus Torvalds" <torvalds@...ux-foundation.org>,
 "Andy Shevchenko" <andy@...nel.org>, "Matthew Wilcox" <willy@...radead.org>,
 "Sean Christopherson" <seanjc@...gle.com>,
 "Davide Ciminaghi" <ciminaghi@...dd.com>,
 "Paolo Bonzini" <pbonzini@...hat.com>, kvm@...r.kernel.org
Subject: Re: [PATCH 08/11] x86: document X86_INTEL_MID as 64-bit-only

On Fri, Dec 6, 2024, at 12:23, Ferry Toth wrote:
> Op 04-12-2024 om 19:55 schreef Andy Shevchenko:
>>
>> It's all the other way around (from SW point of view). For unknown reasons
>> Intel decided to release only 32-bit SW and it became the only thing
>> that was heavily tested (despite misunderstanding by some developers
>> that pointed finger to the HW without researching the issue that
>> appears to be purely software in a few cases) _that_ time.  Starting
>> ca. 2017 I enabled 64-bit for Merrifield and from then it's being used
>> by both 32- and 64-bit builds.
>>
>> I'm totally fine to drop 32-bit defaults for Merrifield/Moorefield,
>> but let's hear Ferry who might/may still have a use case for that.
>
> Due to the design of SLM, I found (and it is also documented in Intel's
> HW documentation) that there is a penalty introduced when executing
> certain instructions in 64b mode. The one I found is crc32di, running
> slower than 2 crc32si in series. Then there are other instructions that
> seem to run faster in 64b mode.
>
> And there is of course the usual limited memory space that could benefit
> from 32b mode. I never tried the mixed (x86_32?) mode. But I am building
> and testing both i686 and x86_64 for each Edison image.

Hi Ferry,

Thanks a lot for the detailed reply, this is exactly the kind of
information I was hoping to get out of my series, in particular
since we have a lot of the same tradeoffs on low-end 64-bit
Arm platforms, and I've been trying to push users toward running
64-bit kernels on those.

I generally think that it makes a lot of sense to run 32-bit
userspace on memory limited devices, in particular with less
than 512MB, but it's often still useful on devices with 1GB.

Running a 32-bit kernel is usually not worth it if you can
avoid it, and with 1GB of RAM you definitely run into limits
either from using HIGHMEM (with CONFIG_VMSPLIT_3G) or in
user addressing (with any other VMSPLIT_*), in addition to the
32-bit kernels just being less well maintained and missing
security features.

Using a 64-bit kernel with CONFIG_COMPAT for 32-bit userspace
tends to be the best combination for a large number of
embedded workloads. As a rough estimate on Arm hardware,
I found that a 64-bit kernel tends to use close to twice
the amount of RAM for itself (vmlinux, slab caches, page
tables, mem_map[]) compared to a 32-bit kernel, but this
should be no more than 10-20% of the total RAM for sensible
workloads as all the interesting bits happen in userland.
I expect the numbers to be similar for x86, but have not
looked in detail.

In userspace there is more variation depending on the type
of application: the base system has a similar 2x ratio, but
once you get into data intensive tasks (file server,
networking, image/video processing, ...) the overhead of
64-bit userspace is lower because the size of the actual
data is the same on both.

For the specific case of the crc32di instruction, I
suspect the in-kernel version of this can be trivially
changed like

diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c
index 52c5d47ef5a1..60b9b3cab679 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -60,10 +60,10 @@ static u32 __pure crc32c_intel_le_hw(u32 crc, unsigned char const *p, size_t len
 {
        unsigned int iquotient = len / SCALE_F;
        unsigned int iremainder = len % SCALE_F;
-       unsigned long *ptmp = (unsigned long *)p;
+       unsigned int *ptmp = (unsigned int *)p;
 
        while (iquotient--) {
-               asm(CRC32_INST
+               asm("crc32l %1, %0"
                    : "+r" (crc) : "rm" (*ptmp));
                ptmp++;
        }

to get you the faster version, plus some form of
configurability to make sure other CPUs still get the
crc32q version by default.
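
To illustrate why the change above is safe from a correctness point of
view (it says nothing about speed), here is a small user-space sketch.
It is a software model and not the kernel code: the function names
crc32c_step, crc32c_words32 and crc32c_words64 are made up for this
example, and the model assumes the usual reflected CRC32C polynomial
with operands consumed least-significant byte first, as on
little-endian x86. Under those assumptions, one 8-byte step should give
the same result as two 4-byte steps over the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load n bytes (n <= 8) as a little-endian integer. */
static uint64_t load_le(const unsigned char *p, unsigned n)
{
	uint64_t v = 0;

	for (unsigned i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/*
 * Software model of one CRC32C update over the low nbytes of val,
 * lowest byte first. 0x82F63B78 is the bit-reflected CRC32C polynomial.
 */
static uint32_t crc32c_step(uint32_t crc, uint64_t val, unsigned nbytes)
{
	for (unsigned i = 0; i < nbytes; i++) {
		crc ^= (uint8_t)(val >> (8 * i));
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Models a loop of 32-bit steps (the crc32l variant in the diff). */
static uint32_t crc32c_words32(uint32_t crc, const unsigned char *p, size_t len)
{
	assert(len % 4 == 0);
	for (size_t i = 0; i < len; i += 4)
		crc = crc32c_step(crc, load_le(p + i, 4), 4);
	return crc;
}

/* Models a loop of 64-bit steps (the crc32q variant). */
static uint32_t crc32c_words64(uint32_t crc, const unsigned char *p, size_t len)
{
	assert(len % 8 == 0);
	for (size_t i = 0; i < len; i += 8)
		crc = crc32c_step(crc, load_le(p + i, 8), 8);
	return crc;
}
```

Calling both crc32c_words32() and crc32c_words64() on the same buffer
(with a length that is a multiple of 8) and the same initial value
should return identical checksums, so picking one or the other is
purely a performance decision per CPU.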

> I think that should at minimum be useful to catch 32b errors in the 
> kernel in certain areas (shared with other 32b
> archs). So, I would prefer 32b support for this platform to continue.

I can certainly see this both ways, on the one hand I do
care a lot about 32-bit Arm platforms and appreciate the help
in finding issues on 32-bit kernels. On the other hand I
really don't want anyone to waste time testing something that
should never be used in practice and keeping a feature in
the kernel only for the purpose of regression testing that
feature.

The platform is also special enough that I don't see
testing it in 32-bit mode as particularly helpful to
others, and it's unlikely to catch bugs that testing in
KVM won't.

Testing your 32-bit userland with a 64-bit kernel would be
helpful of course to ensure it keeps working for anyone
that had been using 32-bit kernel+userspace if we drop
32-bit kernel support for it.

One related idea that I've discussed before is to have
32-bit kernels refuse to boot on 64-bit hardware and
instead print the URL of a wiki page to explain all of
the above. There would probably have to be a whitelist
of platforms that are buggy in 64-bit mode, and a command
line option to revert back to the previous behavior
to allow testing.
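
The decision described in the paragraph above could look roughly like
the following sketch. Everything in it is hypothetical: struct boot_env,
its fields and refuse_32bit_boot() do not exist in the kernel, and how
each input would actually be detected is left open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs; none of these names exist in the kernel. */
struct boot_env {
	bool cpu_has_long_mode;    /* CPU is able to run a 64-bit kernel */
	bool platform_whitelisted; /* platform known to be buggy in 64-bit mode */
	bool override_requested;   /* command line option to keep old behavior */
};

/*
 * Returns true if a 32-bit kernel should refuse to boot and point the
 * user at the explanatory wiki page instead.
 */
static bool refuse_32bit_boot(const struct boot_env *env)
{
	assert(env);

	if (!env->cpu_has_long_mode)
		return false;	/* genuine 32-bit hardware, nothing to do */
	if (env->platform_whitelisted || env->override_requested)
		return false;	/* explicit exceptions from the paragraph above */
	return true;
}
```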

       Arnd