lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4ef8c3ae-b365-03ba-0e8e-9f4a2bf53748@netscape.net>
Date:   Mon, 18 Jul 2022 07:32:20 -0400
From:   Chuck Zmudzinski <brchuckz@...scape.net>
To:     Thorsten Leemhuis <regressions@...mhuis.info>,
        Juergen Gross <jgross@...e.com>,
        xen-devel@...ts.xenproject.org, x86@...nel.org,
        linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        Borislav Petkov <bp@...en8.de>
Cc:     jbeulich@...e.com, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        "# 5 . 17" <stable@...r.kernel.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Pavel Machek <pavel@....cz>, Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: [PATCH 0/3] x86: make pat and mtrr independent from each other

On 7/17/2022 3:55 AM, Thorsten Leemhuis wrote:
> Hi Juergen!
>
> On 15.07.22 16:25, Juergen Gross wrote:
> > Today PAT can't be used without MTRR being available, unless MTRR is at
> > least configured via CONFIG_MTRR and the system is running as Xen PV
> > guest. In this case PAT is automatically available via the hypervisor,
> > but the PAT MSR can't be modified by the kernel and MTRR is disabled.
> > 
> > As an additional complexity the availability of PAT can't be queried
> > via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
> > to be disabled. This leads to some drivers believing that not all cache
> > modes are available, resulting in failures or degraded functionality.
> > 
> > The same applies to a kernel built with no MTRR support: it won't
> > allow to use the PAT MSR, even if there is no technical reason for
> > that, other than setting up PAT on all cpus the same way (which is a
> > requirement of the processor's cache management) is relying on some
> > MTRR specific code.
> > 
> > Fix all of that by:
> > 
> > - moving the function needed by PAT from MTRR specific code one level
> >   up
> > - adding a PAT indirection layer supporting the 3 cases "no or disabled
> >   PAT", "PAT under kernel control", and "PAT under Xen control"
> > - removing the dependency of PAT on MTRR
>
> Thx for working on this. If you need to respin these patches for one
> reason or another, could you do me a favor and add proper 'Link:' tags
> pointing to all reports about this issue? e.g. like this:
>
>  Link: https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>
> These tags are considered important by Linus[1] and others, as they
> allow anyone to look into the backstory weeks or years from now. That is
> why they should be placed in cases like this, as
> Documentation/process/submitting-patches.rst and
> Documentation/process/5.Posting.rst explain in more detail. I care
> personally, because these tags make my regression tracking efforts a
> whole lot easier, as they allow my tracking bot 'regzbot' to
> automatically connect reports with patches posted or committed to fix
> tracked regressions.
>
> [1] see for example:
> https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/
> https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/
> https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>

I echo Thorsten's thx for starting on this now instead of waiting until
September which I think is when Juergen said he could start working
on this last week. I agree with Thorsten that Link tags are needed.
Since multiple patches have been proposed to fix this regression,
perhaps a Link to each proposed patch, and a note that
the original report identified a specific commit which when reverted
also fixes it. IMO, this is all part of the backstory Thorsten refers to.

It looks like with this approach, a fix will not be coming real soon,
and Borislav Petkov also discouraged me from testing this
patch set until I receive a ping telling me it is ready for testing,
which seems to confirm that this regression will not be fixed
very soon. Please correct me if I am wrong about how long
it will take to fix it with this approach.

Also, is there any guarantee this approach is endorsed by
all the maintainers who will need to sign-off, especially
Linus? I say this because some of the discussion on the
earlier proposed patches makes me doubt this. I am especially
referring to this discussion:

https://lore.kernel.org/lkml/4c8c9d4c-1c6b-8e9f-fa47-918a64898a28@leemhuis.info/

and also, here:

https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/

where Borislav Petkov argues that Linux should not be
patched at all to fix this regression but instead the fix
should come by patching the Xen hypervisor.

So I have several questions, presuming at least the fix is going
to be delayed for some time, and also presuming this approach
is not yet an approach that has the blessing of the maintainers
who will need to sign-off:

1. Can you estimate when the patch series will be ready for
testing and suitable for a prepatch or RC release?

2. Can you estimate when the patch series will be ready to be
merged into the mainline release? Is there any hope it will be
fixed before the next longterm release hosted on kernel.org?

3. Since a fix is likely not coming soon, can you explain
why the commit that was mentioned in the original
report cannot be reverted as a temporary solution while
we wait for the full fix to come later? I can say that
reverting that commit (It was a commit affecting
drm/i915) does fix the issue on my system with no
negative side effects at all. In such a case, it seems
contrary to Linus' regression rule to not revert the
offending commit, even if reverting the offending
commit is not going to be the final solution. IOW,
I am trying to argue that an important corollary to
the Linus regression rule is that we revert commits
that introduce regressions, especially when there
are no negative effects when reverting the offending
commit. Why are we not doing that in this case?

4. Can you explain why this patch series is superior
to the other proposed patches that are much more
simple and have been reported to fix the regression?

5. This approach seems way too aggressive for backporting
to the stable releases. Is that correct? Or, will the patches
be backported to the stable releases? I was told that
backports to the stable releases are needed to keep things
consistent across all the supported versions when I submitted
a patch to fix this regression that identified a specific five year
old commit that my proposed patch would fix.

Remember, this is a regression that is really bothering
people now. For example, I am now in a position where
I cannot install the updates of the Linux kernel that Debian
pushes out to me without patching the kernel with my
own private build that has one of the known fixes that
have already been identified as ways to workaround this
regression while we wait for the full solution that will
hopefully come later.

Chuck

> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.
>
> BTW, let me tell regzbot to monitor this thread:
>
> #regzbot ^backmonitor:
> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ