lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 20 Oct 2023 20:05:47 +0200
From:   Jan Kara <jack@...e.cz>
To:     Andy Shevchenko <andriy.shevchenko@...el.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Josh Poimboeuf <jpoimboe@...nel.org>, Jan Kara <jack@...e.cz>,
        Nathan Chancellor <nathan@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Kees Cook <keescook@...omium.org>,
        Ferry Toth <ftoth@...londelft.nl>,
        linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [GIT PULL] ext2, quota, and udf fixes for 6.6-rc1

On Fri 20-10-23 17:51:59, Andy Shevchenko wrote:
> On Thu, Oct 19, 2023 at 11:43:47AM -0700, Linus Torvalds wrote:
> > On Thu, 19 Oct 2023 at 11:16, Andy Shevchenko
> > <andriy.shevchenko@...el.com> wrote:
> > >
> > > Meanwhile a wild idea, can it be some git (automatic) conflict resolution that
> > > makes that merge affect another (not related to the main contents of the merge)
> > > files? Like upstream has one base, the merge has another which is older/newer
> > > in the history?
> > 
> > I already looked at any obvious case of that.
> > 
> > The only quota-related issue on the other side is an obvious
> > one-liner: commit 86be6b8bd834 ("quota: Check presence of quota
> > operation structures instead of ->quota_read and ->quota_write
> > callbacks").
> > 
> > It didn't affect the merge, because it was not related to  any of the
> > changes that came in from the quota branch (it was physically close to
> > the change that used lockdep_assert_held_write() instead of a
> > WARN_ON_ONCE(down_read_trylock()) sequence, but it is unrelated to
> > it).
> > 
> > I guess you could try reverting that one-liner after the merge, but I
> > _really_ don't think it is at all relevant.
> > 
> > What *would* probably be interesting is to start at the pre-merge
> > state, and rebase the code that got merged in. And then bisect *that*
> > series.
> > 
> > IOW, with the merge that triggers your bisection being commit
> > 1500e7e0726e, do perhaps something like this:
> > 
> >   # Name the states before the merge
> >   git branch pre-merge 1500e7e0726e^
> >   git branch jan-state 1500e7e0726e^2
> > 
> >   # Now double-check that this works for you, of course.
> >   # Just to be safe...
> >   git checkout pre-merge
> >   .. test-build and test-boot this with the bad config ..
> 
> That's I have checked already [4], but okay, let me double check.
> [5] is the same as [4] according to `git diff`.
> 
> It boots.
> 
> >   # Then, let's create a new branch that is
> >   # the rebased version of Jan's state:
> >   git checkout -b jan-rebased jan-state
> >   git rebase pre-merge
> 
> [6] is created.
> 
> >   # Verify that the tree is the same as the merge
> >   git diff 1500e7e0726e
> 
> Yes, nothing in output.
> 
> And it does not boot.
> 
> >   # Ok, that was empty, so do a bisect on this
> >   # rebased history
> >   git bisect start
> >   git bisect bad
> >   git bisect good pre-merge
> > 
> > .. and see what commit it *now* claims is the bad commit.
> 
> git bisect start
> # status: waiting for both good and bad commits
> # good: [63580f669d7ff5aa5a1fa2e3994114770a491722] Merge tag 'ovl-update-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
> git bisect good 63580f669d7ff5aa5a1fa2e3994114770a491722
> # status: waiting for bad commit, 1 good commit known
> # bad: [2447ff4196091e41d385635f9b6d003119f24199] ext2: Fix kernel-doc warnings
> git bisect bad 2447ff4196091e41d385635f9b6d003119f24199
> # bad: [a7c4109a1fa7f9f8cfa9aa93e7aae52d0df820f6] MAINTAINERS: change reiserfs status to obsolete
> git bisect bad a7c4109a1fa7f9f8cfa9aa93e7aae52d0df820f6
> # bad: [74fdc82e4a4302bf8a519101a40691b78d9beb6c] quota: add new helper dquot_active()
> git bisect bad 74fdc82e4a4302bf8a519101a40691b78d9beb6c
> # bad: [e64db1c50eb5d3be2187b56d32ec39e56b739845] quota: factor out dquot_write_dquot()
> git bisect bad e64db1c50eb5d3be2187b56d32ec39e56b739845
> # good: [eea7e964642984753768ddbb710e2eefd32c7a89] ext2: remove redundant assignment to variable desc and variable best_desc
> git bisect good eea7e964642984753768ddbb710e2eefd32c7a89
> # first bad commit: [e64db1c50eb5d3be2187b56d32ec39e56b739845] quota: factor out dquot_write_dquot()

Interesting, but it's a data point :)

> > Would you be willing to do this? It should be only a few bisects,
> > since Jan's branch only brought in 17 commits that the above rebases
> > into this test branch. So four or five bisections should pinpoint the
> > exact point where it goes bad.
> 
> See above.
> 
> I even rebuilt again with just rebased on top of e64db1c50eb5 and it doesn't
> boot, so we found the culprit that triggers this issue.
> 
> > Of course, since this is apparently about some "random code generation
> > issue", that exact point still may not be very interesting.
> 
> On top of the above I have tried the following:
> 1) dropping inline, replacing it to __always_inline -- no help;
> 2) commenting out the error message -- helps!
> 
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -632,8 +632,10 @@ static inline int dquot_write_dquot(struct dquot *dquot)
>  {
>         int ret = dquot->dq_sb->dq_op->write_dquot(dquot);
>         if (ret < 0) {
> +#if 0
>                 quota_error(dquot->dq_sb, "Can't write quota structure "
>                             "(error %d). Quota may get out of sync!", ret);
> +#endif
>                 /* Clear dirty bit anyway to avoid infinite loop. */
>                 clear_dquot_dirty(dquot);
>         }
> 
> If it's a timing issue it's related to that error message, as the new helper is
> run outside of the spinlock.
> 
> What's is fishy there besides the error message being available only in one
> case, is the pointer that is used for dp_op. I'm not at all familiar with the
> code, but can it be that these superblocks are different for those two cases?

dquot->dq_sb could be different from the sb we have only if the dirty list
would get corrupted in some way. Not impossible but does not seem too
likely. Let's first check whether there are any quotas in the first place.

I've asked this already but I don't think I've got an answer: What
filesystem type is the root filesystem? Does it have any quotas (either
there would be files like aquota.user, quota.user, aquota.group,
quota.group in / or there would be quota feature enabled - how to find that
out depends on the filesystem so once I know the fs type I can give you
instructions).

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists