Message-ID: <202310230907.C39FED1BC@keescook>
Date: Mon, 23 Oct 2023 09:08:09 -0700
From: Kees Cook <keescook@...omium.org>
To: Jan Kara <jack@...e.cz>
Cc: Andy Shevchenko <andy.shevchenko@...il.com>,
Kees Cook <kees@...nel.org>, Baokun Li <libaokun1@...wei.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Ferry Toth <ftoth@...londelft.nl>,
linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [GIT PULL] ext2, quota, and udf fixes for 6.6-rc1
On Mon, Oct 23, 2023 at 02:15:01PM +0200, Jan Kara wrote:
> On Mon 23-10-23 14:45:05, Andy Shevchenko wrote:
> > On Sat, Oct 21, 2023 at 04:36:19PM -0700, Kees Cook wrote:
> > > On October 20, 2023 1:36:36 PM PDT, andy.shevchenko@...il.com wrote:
> > > >That said, if you or anyone has ideas how to debug further, I'm all ears!
> > >
> > > I don't think this has been tried yet:
> > >
> > > When I've had these kind of hard-to-find glitches I've used manual
> > > built-binary bisection. Assuming you have a source tree that works when built
> > > with Clang and not with GCC:
> > > - build the tree with Clang with, say, O=build-clang
> > > - build the tree with GCC, O=build-gcc
> > > - make a new tree for testing: cp -a build-clang build-test
> > > - pick a suspect .o file (or files) to copy from build-gcc into build-test
> > > - perform a relink: "make O=build-test" should DTRT (do the right thing)
> > > since the copied-in .o files should be newer than the .a and other targets
> > > - test for failure, repeat
> > >
> > > Once you've isolated it to (hopefully) a single .o file, then comes the
> > > byte-by-byte analysis or something similar...
> > >
> > > I hope that helps! These kinds of bugs are super frustrating.
> >
> > I'm sorry, but I can't see how this is not an error-prone approach.
> > If it's a timing issue, then changing an arbitrary object may make the problem
> > go away without proving anything. As I mentioned earlier, I tried commenting out
> > the error message, and then it worked with GCC as well. The difference is so little (according to Linus) that
> > it may not be susceptible. Maybe I am missing the point...
>
> Given how reliably you can hit the problem with some kernels while you
> cannot hit it with others (only slightly different in code that doesn't
> even get executed on your system), I suspect this is really more a code
> placement issue than a timing issue. Like if during the linking phase of
> vmlinux some code ends up at some position, the kernel fails, otherwise it
> boots fine. Not sure how to debug such a thing, though. Maybe some playing
> with the linker and the order of object files linked could reveal something
> but I'm just guessing.
Right -- in theory there will be some minimum subset of "from GCC"
objects that, when used together in the otherwise "known good" build, will
trip the failure.
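A minimal sketch of automating the search for that subset, assuming the failure reproduces deterministically for a given set of swapped-in objects. The loop is a simplified form of Zeller's delta-debugging "ddmin" algorithm, a swapped-in technique rather than anything proposed in this thread. The fails() callback is hypothetical: in practice it would copy the listed .o files from build-gcc into a fresh copy of build-clang, relink with "make O=build-test", boot, and report whether the failure shows up.

```python
from typing import Callable, List, Sequence


def split(items: List[str], n: int) -> List[List[str]]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def ddmin(objects: Sequence[str], fails: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `objects` to a 1-minimal subset for which fails(subset) is True.

    1-minimal: dropping any single object from the result makes the failure
    disappear. `fails` is a hypothetical, caller-supplied test harness.
    """
    current = list(objects)
    if not fails(current):
        raise ValueError("the full set of GCC objects must reproduce the failure")
    n = 2  # number of chunks to split the current candidate set into
    while len(current) >= 2:
        chunks = split(current, n)
        reduced = False
        # First, check whether one chunk on its own still trips the failure.
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break
        if not reduced:
            # Otherwise, check whether one chunk can be dropped entirely.
            for i in range(n):
                rest = [obj for j, c in enumerate(chunks) if j != i for obj in c]
                if fails(rest):
                    current, n, reduced = rest, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(current):
                break  # every remaining object is needed to trip the failure
            n = min(2 * n, len(current))  # retry with finer granularity
    return current


# Example with a fake harness where two objects must both come from GCC.
objs = [f"obj{i}.o" for i in range(16)]
print(ddmin(objs, lambda s: {"obj3.o", "obj11.o"} <= set(s)))  # ['obj3.o', 'obj11.o']
```

When a single object is to blame, this degenerates into plain halving, like the manual procedure above; it only falls back to finer-grained splits when the failure needs several GCC objects together. If the problem turns out to be timing-dependent rather than placement-dependent, a flaky fails() will mislead the search, so each probe may need several boots.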
--
Kees Cook