lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wgvyahW4QemhmD_xD9DVTzkPnhTNid7m2jgwJvjKL+u5g@mail.gmail.com>
Date: Tue, 4 Mar 2025 08:28:35 -1000
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Mateusz Guzik <mjguzik@...il.com>, K Prateek Nayak <kprateek.nayak@....com>, 
	"Sapkal, Swapnil" <swapnil.sapkal@....com>, Manfred Spraul <manfred@...orfullife.com>, 
	Christian Brauner <brauner@...nel.org>, David Howells <dhowells@...hat.com>, 
	WangYuli <wangyuli@...ontech.com>, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, 
	"Shenoy, Gautham Ranjal" <gautham.shenoy@....com>, Neeraj.Upadhyay@....com, Ananth.narayan@....com
Subject: Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still full

On Tue, 4 Mar 2025 at 02:55, Oleg Nesterov <oleg@...hat.com> wrote:
>
> I thought this is wrong, but then I noticed that in your version
> ->head_tail is the 1st member in this union.

Yes. That was intentional, to make the code have much less extraneous noise.

The only reason to ever use that "union pipe_index" is for this whole
"one word for everything", so I feel that making it compact is
actually more legible than adding extra markers.

> > + * Really only alpha needs 32-bit fields, but
> > + * might as well do it for 64-bit architectures
> > + * since that's what we've historically done,
> > + * and it makes 'head_tail' always be a simple
> > + * 'unsigned long'.
> > + */
> > +#ifdef CONFIG_64BIT
> > +  typedef unsigned int pipe_index_t;
> > +#else
> > +  typedef unsigned short pipe_index_t;
> > +#endif
>
> I am just curious, why we can't use "unsigned short" unconditionally
> and avoid #ifdef ?
>
> Is "unsigned int" more efficient on 64-bit?

The main reason is that a "unsigned short" write on alpha isn't atomic
- it's a read-modify-write operation, and so it isn't safe to mix

        spin_lock_irq(&pipe->rd_wait.lock);
         ...
        pipe->tail = ++tail;
        ...
        spin_unlock_irq(&pipe->rd_wait.lock);

with

         mutex_lock(&pipe->mutex);
         ...
         pipe->head = head + 1;
         ...
         mutex_unlock(&pipe->mutex);

 because while they are two different fields using two different
locks, on alpha the above only works if they are in separate words
(because updating one will do a read-and-write-back of the other).

This is a fundamental alpha architecture bug. I was actually quite
ready to just kill off alpha support entirely, because it's a dead
architecture that is unfixably broken. But there's some crazy patches
to make gcc generate horrific atomic code to make this all work on
alpha by Maciej Rozycki, so one day we'll be in the situation that
alpha can be considered "fixed", but we're not there yet.

Do we really care? Maybe not. The race is probably very hard to hit,
so with the two remaining alpha users, we could just say "let's just
make it use 16-bit ops".

But even on x86, 32-bit ops potentially generate just slightly better
code due to lack of some prefix bytes.

And those fields *used* to be 32-bit, so my patch basically kept the
status quo on 64-bit machines (and just turned it into 16-bit fields
on 32-bit architectures).

Anyway, I wouldn't object to just unconditionally making it "two
16-bit indexes make a 32-bit head_tail" if it actually makes the
structure smaller. It might not even matter on 64-bit because of
alignment of fields around it - I didn't check. As mentioned, it was
more of a combination of "alpha" plus "no change to relevant other
architectures".

                Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ