Message-ID: <aV46vxPWEKs7_eW5@foz.lan>
Date: Wed, 7 Jan 2026 11:54:20 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
Cc: duchangbin <changbin.du@...wei.com>, Jonathan Corbet <corbet@....net>,
Mauro Carvalho Chehab <mchehab@...nel.org>, "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] tools: jobserver: Add validation for jobserver tokens to
ensure valid '+' characters
On Wed, Jan 07, 2026 at 11:42:38AM +0100, Mauro Carvalho Chehab wrote:
> On Wed, Jan 07, 2026 at 10:29:10AM +0100, Mauro Carvalho Chehab wrote:
> > On Wed, 7 Jan 2026 08:11:29 +0000
> > duchangbin <changbin.du@...wei.com> wrote:
> >
> > > On Tue, Jan 06, 2026 at 02:52:06PM -0700, Jonathan Corbet wrote:
> > > > Changbin Du <changbin.du@...wei.com> writes:
> > > >
> > > > > Add validation for jobserver tokens to prevent infinite loops on invalid fds
> > > > > When using GNU Make's jobserver feature in kernel builds, a bug in MAKEFLAGS
> > > > > propagation caused "--jobserver-auth=3,4" to reference an unintended file
> > > > > descriptor (Here, fd 3 was inherited from a shell command that opened
> > > > > "/etc/passwd" instead of a valid pipe). This led to infinite loops in
> > > > > jobserver-exec's os.read() calls due to empty or corrupted tokens. (The
> > > > > version of my make is 4.3)
> > > > >
> > > > > $ ls -l /proc/self/fd
> > > > > total 0
> > > > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 0 -> /dev/pts/1
> > > > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 1 -> /dev/pts/1
> > > > > lrwx------ 1 changbin changbin 64 Dec 25 13:03 2 -> /dev/pts/1
> > > > > lr-x------ 1 changbin changbin 64 Dec 25 13:03 3 -> /etc/passwd
> > > > > lr-x------ 1 changbin changbin 64 Dec 25 13:03 4 -> /proc/1421383/fd
> > > > >
> > > > > The modified code now explicitly validates tokens:
> > > > > 1. Rejects empty reads (prevents infinite loops on EOF)
> > > > > 2. Checks all bytes are '+' characters (catches fd reuse issues)
> > > > > 3. Raises ValueError with clear diagnostics for debugging
> > > > > This ensures robustness against invalid jobserver configurations, even when
> > > > > external tools (like make) incorrectly pass non-pipe file descriptors.
> > > > >
> > > > > Signed-off-by: Changbin Du <changbin.du@...wei.com>
> > > > > ---
> > > > > tools/lib/python/jobserver.py | 2 ++
> > > > > 1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/tools/lib/python/jobserver.py b/tools/lib/python/jobserver.py
> > > > > index a24f30ef4fa8..88d005f96bed 100755
> > > > > --- a/tools/lib/python/jobserver.py
> > > > > +++ b/tools/lib/python/jobserver.py
> > > > > @@ -91,6 +91,8 @@ class JobserverExec:
> > > > > >     while True:
> > > > > >         try:
> > > > > >             slot = os.read(self.reader, 8)
> > > > > > +           if not slot or any(c != b'+'[0] for c in slot):
> > > > > > +               raise ValueError("empty or unexpected token from jobserver")
> > > >
> > > > So I had to stare at this for a while to figure out what it was doing; a
> > > > comment might help.
> > > >
> > > > But if it finds something that's not b'+', it simply crashes the whole
> > > > thing? Is that really what we want to do? It would seem better to
> > > > proceed if we got any slots at all, and to emit a message telling the
> > > > poor user what they might want to do about the situation?
> > > >
> > > I suspect that in Make versions prior to 4.3, when generating the "--jobserver-auth=r,w"
> > > parameter, the implementation fails to properly handle situations where file descriptor 3
> > > is already occupied by the parent process (as I encountered where fd 3 was actually used to
> > > open /etc/passwd). This appears to force Make to always use fd3 regardless of its
> > > availability (I'm not sure how Make was written). In contrast, Make 4.4+ versions
> > > default to using named pipes, which avoids this issue entirely.
> >
> > It would be nice if you could provide more details about how to reproduce it.
> > Are you doing anything special? What distro are you using? what python version?
> >
> > > When this problem occurs, the current implementation deadlocks because for regular files,
> > > os.read() returns empty bytes after reaching EOF, creating an infinite loop. My workaround
> > > is to ignore this error condition to prevent deadlock, although this means the jobserver
> > > protocol will no longer be honored.
> >
> > Testing whether slot is empty makes sense, but why test whether it is "+"?
> >
> > >
> > > As you suggested above, we can output an error message to stderr to inform users, but
> > > must not use stdout, as it would corrupt the tool's normal output stream.
> >
>
> After thinking a little bit more about this, IMHO the best is to have
> two separate patches (assuming that there is a good reason why ensuring that the
> slot's character is "+"):
>
> > You could do something like (untested):
> >
> >     while True:
> >         try:
> >             slot = os.read(self.reader, 8)
> > +           if not slot:
> > +               # Stop at the end of the jobserver queue.
> > +               break
>
> This would be patch 1, to overcome some issue (probably due to Python
> version) where reading past EOF won't raise an exception. I would very much
> want to see what Python version you're using and whether some other
> exception arose (like EOFError), properly described in the patch description.
Answering myself: EOFError is only for the input() function:
https://docs.python.org/3/library/exceptions.html#EOFError
Reading past EOF with os.read() returns an empty bytes object, so the above
check is indeed needed to avoid an endless loop.
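
A minimal sketch of that behaviour, assuming a temporary file stands in for
the regular file that ended up behind fd 3 (read_until_empty is a
hypothetical helper, not something in jobserver.py):

```python
import os
import tempfile


def read_until_empty(fd, chunk=8):
    """Collect bytes from fd, stopping at the first empty read (EOF)."""
    data = b""
    while True:
        piece = os.read(fd, chunk)
        if not piece:
            # os.read() signals EOF with b"" and raises nothing,
            # so without this check the loop would never end.
            break
        data += piece
    return data


# Unbuffered temporary file, used here only as a stand-in regular file.
with tempfile.TemporaryFile(buffering=0) as tmp:
    tmp.write(b"root:x:0:0")
    tmp.seek(0)
    content = read_until_empty(tmp.fileno())
    # A further read at EOF still returns an empty bytes object.
    at_eof = os.read(tmp.fileno(), 8)
```

On a real jobserver pipe opened non-blocking, an empty pipe shows up as
EWOULDBLOCK instead, which the existing "except" branch already handles.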
>
> > +           # Why do we need this?
> > +           if any(c != b'+'[0] for c in slot):
> > +               print("Warning: invalid jobserver slots", file=sys.stderr)
> > +               break
>
> This seems to be a separate issue. Why do we need to enforce that the slot data
> is "+"? If it isn't, why would this be a problem?
>
> Btw, reading:
>
> https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html
>
> We have:
>
> "In both implementations of the jobserver, the pipe will be pre-loaded with
> one single-character token for each available job. To obtain an extra slot
> you must read a single character from the jobserver; to release a slot you
> must write a single character back into the jobserver.
>
> It’s important that when you release the job slot, you write back the same
> character you read. Don’t assume that all tokens are the same character;
> different characters may have different meanings to GNU make. The order is
> not important, since make has no idea in what order jobs will complete anyway."
>
> So, 100% compliant POSIX jobserver code should not test for "+" but should,
> instead, preserve whatever character is there.
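
A minimal sketch of such character-preserving handling, assuming a local
non-blocking pipe stands in for the jobserver (acquire_tokens,
release_tokens and the "a" token are made up for illustration):

```python
import os


def acquire_tokens(reader):
    """Read every token currently available, without blocking."""
    tokens = b""
    while True:
        try:
            chunk = os.read(reader, 8)
        except BlockingIOError:
            break  # the pipe is empty for now
        if not chunk:
            break  # EOF: write end closed, or fd is not a pipe
        tokens += chunk
    return tokens


def release_tokens(writer, tokens):
    """Give back exactly the bytes that were read, whatever they are."""
    if tokens:
        os.write(writer, tokens)


# Stand-in for the jobserver pipe; real code takes the fds from MAKEFLAGS.
reader, writer = os.pipe()
os.set_blocking(reader, False)
os.write(writer, b"+a+")  # pretend three tokens were pre-loaded

held = acquire_tokens(reader)
release_tokens(writer, held)
returned = os.read(reader, 8)

os.close(reader)
os.close(writer)
```

Nothing here inspects the token values; they are only stored and written
back unchanged.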
>
> Yet, if checking for "+" is really needed, please add a rationale to the patch
> description justifying why. In that case, we should still:
>
> - release the slot(s) we don't want by writing the character via
> os.write();
> - print a warning message about why we rejected the slot(s).
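
The two points above could be sketched roughly as follows, assuming for
illustration that only "+" tokens are wanted (filter_tokens is a
hypothetical helper, and a local pipe stands in for the jobserver):

```python
import os
import sys


def filter_tokens(chunk, writer, expected=b"+"):
    """Keep tokens equal to `expected`; hand the others back and warn."""
    kept = bytes(c for c in chunk if c == expected[0])
    rejected = bytes(c for c in chunk if c != expected[0])
    if rejected:
        # Release the slots we do not want, unchanged.
        os.write(writer, rejected)
        # Warn on stderr so the tool's stdout is not corrupted.
        print(f"Warning: returned {len(rejected)} unexpected jobserver "
              f"token(s): {rejected!r}", file=sys.stderr)
    return kept


# Stand-in pipe, only used to observe what gets written back.
reader, writer = os.pipe()
kept = filter_tokens(b"+x+", writer)
given_back = os.read(reader, 8)

os.close(reader)
os.close(writer)
```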
>
> >             self.jobs += slot
> >         except (OSError, IOError) as e:
> >             if e.errno == errno.EWOULDBLOCK:
> >                 # Stop at the end of the jobserver queue.
> >                 break
> >             # If something went wrong, give back the jobs.
> >             if self.jobs:
> >                 os.write(self.writer, self.jobs)
> >             raise e
> >
> > Yet, if os.read() fails or reaches EOF, I would expect the "except" block
> > to pick it up. It sounds to me like it could be some issue with the Python
> > version you're using.
> >
> > > For
> > > example, in scripts/Makefile.vmlinux_o we have:
> > >
> > > quiet_cmd_gen_initcalls_lds = GEN $@
> > > cmd_gen_initcalls_lds = \
> > > $(PYTHON3) $(srctree)/scripts/jobserver-exec \
> > > $(PERL) $(real-prereqs) > $@
> > >
> > >
> > > > > >             self.jobs += slot
> > > > > >         except (OSError, IOError) as e:
> > > > > >             if e.errno == errno.EWOULDBLOCK:
> > > >
> > > > Thanks,
> > > >
> > > > jon
> > >
> >
> >
> >
> > Thanks,
> > Mauro
>
> --
> Thanks,
> Mauro
--
Thanks,
Mauro