Message-ID: <20201125234654.GN643756@sasha-vm>
Date:   Wed, 25 Nov 2020 18:46:54 -0500
From:   Sasha Levin <sashal@...nel.org>
To:     Dave Chinner <david@...morbit.com>
Cc:     linux-kernel@...r.kernel.org, stable@...r.kernel.org,
        Dave Chinner <dchinner@...hat.com>,
        Jens Axboe <axboe@...nel.dk>,
        "Darrick J . Wong" <darrick.wong@...cle.com>,
        linux-xfs@...r.kernel.org
Subject: Re: [PATCH AUTOSEL 5.9 33/33] xfs: don't allow NOWAIT DIO across
 extent boundaries

On Thu, Nov 26, 2020 at 08:52:47AM +1100, Dave Chinner wrote:
>We've already had one XFS upstream kernel regression in this -rc
>cycle propagated to the stable kernels in 5.9.9 because the stable
>process picked up a bunch of random XFS fixes within hours of them
>being merged by Linus. One of those commits was a result of a
>thinko, and despite the fact we found it and reverted it within a
>few days, users of stable kernels have been exposed to it for a
>couple of weeks. That *should never have happened*.

No, what shouldn't have happened is that a commit which never went out for
review on the public mailing lists, nor spent any time in linux-next,
ended up in Linus's tree.

It's ridiculous that you see a failure in the maintainership workflow of
XFS and turn around to blame it somehow on the stable process.

>This has happened before, and *again* we were lucky this wasn't
>worse than it was. We were saved by the flaw being caught by our own
>internal pre-write corruption verifiers (which exist because we
>don't trust our code to be bug-free, let alone the collections of
>random, poorly tested backports) so that it only resulted in
>corruption shutdowns rather than permanent on-disk damage and data
>loss.
>
>Put simply: the stable process is flawed because it shortcuts the
>necessary stabilisation testing for new code. It doesn't matter if

The stable process assumes that commits that ended up upstream were
reviewed and tested; the stable process doesn't offer much in the way of
in-depth review of specific patches but mostly focuses on testing the
product of backporting hundreds of patches into each stable branch.

Release candidate cycles are there to squash the bugs that went in during
the merge window, not to introduce new "thinkos" by pulling patches out
of thin air in the middle of the release cycle.

>the merged commits have a "fixes" tag in them, that tag doesn't mean
>the change is ready to be exposed to production systems. We need the
>*-rc stabilisation process* to weed out thinkos, brown paper bag
>bugs, etc, because we all make mistakes, and bugs in filesystem code
>can *lose user data permanently*.

What needed to happen here is for XFS's internal testing to run *before*
this patch was merged anywhere, and catch this bug. Why didn't that
happen?

>Hence I ask that the stable maintainers only do automated pulls of
>iomap and XFS changes from upstream kernels when Linus officially
>releases them rather than at random points in time in the -rc cycle.
>If there is a critical fix we need to go back to stable kernels
>immediately, we will let stable@...nel.org know directly that we
>want this done.

I'll happily switch back to a model where we look only for stable tags
from XFS, but sadly such a tag was used only *once* in the past year. How
does that help prevent the dangerous bugs that may cause users to lose
their data permanently?

-- 
Thanks,
Sasha
