linux-kernel - Re: [GIT PULL] alpha: cleanups and build fixes for 6.10

Open Source and information security mailing list archives



Message-ID: <99765904-3f35-4c78-998e-b444a6ab90e4@gmail.com>
Date: Mon, 13 May 2024 12:50:07 +0900
From: Akira Yokosawa <akiyks@...il.com>
To: paulmck@...nel.org
Cc: arnd@...db.de, glaubitz@...sik.fu-berlin.de, ink@...assic.park.msu.ru,
 linux-alpha@...r.kernel.org, linux-arch@...r.kernel.org,
 linux-kernel@...r.kernel.org, mattst88@...il.com,
 richard.henderson@...aro.org, torvalds@...ux-foundation.org,
 viro@...iv.linux.org.uk, Ulrich Teichert <krypton@...ich-teichert.org>,
 Akira Yokosawa <akiyks@...il.com>
Subject: Re: [GIT PULL] alpha: cleanups and build fixes for 6.10

On Sun, 12 May 2024 07:44:25 -0700, Paul E. McKenney wrote:
> On Sun, May 12, 2024 at 08:02:59AM +0200, John Paul Adrian Glaubitz wrote:
>> On Sat, 2024-05-11 at 18:26 -0700, Paul E. McKenney wrote:
>> > And that breaks things because it can clobber concurrent stores to
>> > other bytes in that enclosing machine word.
>> 
>> But pre-EV56 Alpha has always been like this. What makes it broken
>> all of a sudden?
> 
> I doubt if it was sudden.   Putting concurrently (but rarely) accessed
> small-value quantities into single bytes is a very natural thing to do,
> and I bet that there are quite a few places in the kernel where exactly
> this happens.  I happen to know of a specific instance that went into
> mainline about two years ago.
> 
> So why didn't the people running current mainline on pre-EV56 Alpha
> systems notice?  One possibility is that they are upgrading their
> kernels only occasionally.  Another possibility is that they are seeing
> the failures, but are not tracing the obtuse failure modes back to the
> change(s) in question.  Yet another possibility is that the resulting
> failures are very low probability, with mean times to failure that are
> so long that you won't notice anything on a single system.

Another possibility is that the Jensen system was booted in uniprocessor
mode.  Looking at the early boot log [1] that Ulrich (+CCed) provided back
in Sept. 2021, I see the following when running "grep -i cpu":

[1] https://marc.info/?l=linux-alpha&m=163265555616841&w=2

[    0.000000] Memory: 90256K/131072K available (8897K kernel code, 9499K rwdata, 2704K rodata, 312K init, 437K bss, 40816K reserved, 0K cma-reserved)
[    0.000000] random: get_random_u64 called from __kmem_cache_create+0x54/0x600 with crng_init=0
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
                                                          ^^^^^^

Without any concurrent atomic updates, the "broken" atomic accesses won't
matter, I guess.

        Thanks, Akira