Message-ID: <20231023121501.ae3ig3hzxqycglyt@quack3>
Date: Mon, 23 Oct 2023 14:15:01 +0200
From: Jan Kara <jack@...e.cz>
To: Andy Shevchenko <andy.shevchenko@...il.com>
Cc: Kees Cook <kees@...nel.org>, Jan Kara <jack@...e.cz>,
Baokun Li <libaokun1@...wei.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Kees Cook <keescook@...omium.org>,
Ferry Toth <ftoth@...londelft.nl>,
linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [GIT PULL] ext2, quota, and udf fixes for 6.6-rc1
On Mon 23-10-23 14:45:05, Andy Shevchenko wrote:
> On Sat, Oct 21, 2023 at 04:36:19PM -0700, Kees Cook wrote:
> > On October 20, 2023 1:36:36 PM PDT, andy.shevchenko@...il.com wrote:
> > >That said, if you or anyone has ideas how to debug further, I'm all ears!
> >
> > I don't think this has been tried yet:
> >
> > When I've had these kinds of hard-to-find glitches I've used manual
> > built-binary bisection. Assuming you have a source tree that works when built
> > with Clang and not with GCC:
> > - build the tree with Clang with, say, O=build-clang
> > - build the tree with GCC, O=build-gcc
> > - make a new tree for testing: cp -a build-clang build-test
> > - pick a suspect .o file (or files) to copy from build-gcc into build-test
> > - perform a relink: "make O=build-test" should DTRT since the copied-in .o
> > files should be newer than the .a and other targets
> > - test for failure, repeat
> >
> > Once you've isolated it to (hopefully) a single .o file, then comes the
> > byte-by-byte analysis or something similar...
> >
> > I hope that helps! These kinds of bugs are super frustrating.
>
> I'm sorry, but I can't see how this is not an error-prone approach.
> If it's a timing issue then an arbitrary object change may help, and that
> doesn't prove anything. Earlier I tried commenting out the error message, and
> it then worked with GCC as well. The difference is so small (according to
> Linus) that it hardly looks suspicious. Maybe I am missing the point...
Given how reliably you can hit the problem with some kernels while you
cannot hit it with others (which differ only slightly, in code that doesn't
even get executed on your system), I suspect this is really more a code
placement issue than a timing issue. That is, if during the linking phase of
vmlinux some code ends up at a particular position, the kernel fails; otherwise
it boots fine. I'm not sure how to debug such a thing, though. Maybe playing
with the linker and the order in which object files are linked could reveal
something, but I'm just guessing.
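
One way to start looking at placement might be to compare the System.map
files of a booting and a failing build and list the symbols whose addresses
differ. The following is only a rough sketch, not something tried on the
affected system. It assumes the usual "address type name" System.map line
format; the helper names parse_system_map and moved_symbols are made up for
illustration.

```python
import re
import sys

# Assumed System.map line format: "ffffffff81000000 T _text"
# (hex address, one-character symbol type, symbol name).
LINE_RE = re.compile(r"^([0-9a-fA-F]+)\s+(\S)\s+(\S+)$")


def parse_system_map(text):
    """Return {symbol_name: address} for the contents of one System.map."""
    symbols = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            addr, _sym_type, name = m.groups()
            # Static symbols can share a name; keep only the first one seen.
            symbols.setdefault(name, int(addr, 16))
    return symbols


def moved_symbols(good_text, bad_text):
    """List (name, good_addr, bad_addr) for symbols placed differently.

    Only symbols present in both maps are compared; the result is sorted
    by the address in the good (booting) kernel.
    """
    good = parse_system_map(good_text)
    bad = parse_system_map(bad_text)
    moved = [(name, good[name], bad[name])
             for name in good
             if name in bad and good[name] != bad[name]]
    return sorted(moved, key=lambda entry: entry[1])


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: script.py good/System.map bad/System.map
    with open(sys.argv[1]) as g, open(sys.argv[2]) as b:
        for name, good_addr, bad_addr in moved_symbols(g.read(), b.read()):
            shift = bad_addr - good_addr
            print(f"{name}: {good_addr:#x} -> {bad_addr:#x} ({shift:+#x})")
```

The first symbol in the output is roughly where the two layouts start to
diverge, which could at least hint at which object files are worth
reordering. It shows only where things moved, not why the kernel fails.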
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR