lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Ywf5ATPG7a/I0SLu@dev-arch.thelio-3990X>
Date:   Thu, 25 Aug 2022 15:34:41 -0700
From:   Nathan Chancellor <nathan@...nel.org>
To:     Harry Wentland <harry.wentland@....com>,
        "Siqueira, Rodrigo" <Rodrigo.Siqueira@....com>,
        "Pan, Xinhui" <Xinhui.Pan@....com>,
        Christian König <christian.koenig@....com>,
        Alex Deucher <alexander.deucher@....com>
Cc:     clang-built-linux <llvm@...ts.linux.dev>,
        David Airlie <airlied@...ux.ie>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        amd-gfx list <amd-gfx@...ts.freedesktop.org>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "Sudip Mukherjee (Codethink)" <sudipm.mukherjee@...il.com>,
        Arnd Bergmann <arnd@...nel.org>
Subject: Re: mainline build failure for x86_64 allmodconfig with clang

Hi AMD folks,

Top posting because it might not have been obvious but I was looking for
your feedback on this message (which can be viewed on lore.kernel.org if
you do not have the original [1]) so that we can try to get this fixed
in some way for 6.0/6.1. If my approach is not welcome, please consider
suggesting another one or looking to see if this is something you all
could look into.

[1]: https://lore.kernel.org/Yv5h0rb3AgTZLVJv@dev-arch.thelio-3990X/

Cheers,
Nathan

On Thu, Aug 18, 2022 at 08:59:14AM -0700, Nathan Chancellor wrote:
> Hi Arnd,
> 
> Doubling back around to this now since I think this is the only thing
> breaking x86_64 allmodconfig with clang 11 through 15.
> 
> On Fri, Aug 05, 2022 at 09:32:13PM +0200, Arnd Bergmann wrote:
> > On Fri, Aug 5, 2022 at 8:02 PM Nathan Chancellor <nathan@...nel.org> wrote:
> > > On Fri, Aug 05, 2022 at 06:16:45PM +0200, Arnd Bergmann wrote:
> > > > On Fri, Aug 5, 2022 at 5:32 PM Harry Wentland <harry.wentland@....com> wrote:
> > > > While splitting out sub-functions can help reduce the maximum stack
> > > > usage, it seems that in this case it makes the actual problem worse:
> > > > I see 2168 bytes for the combined
> > > > dml32_ModeSupportAndSystemConfigurationFull(), but marking
> > > > mode_support_configuration() as noinline gives me 1992 bytes
> > > > for the outer function plus 384 bytes for the inner one. So it does
> > > > avoid the warning (barely), but not the problem that the warning tries
> > > > to point out.
> > >
> > > I haven't had a chance to take a look at splitting things up yet, would
> > > you recommend a different approach?
> > 
> > Splitting up large functions can help when you have large local variables
> > that are used in different parts of the function, and the split gets the
> > compiler to reuse stack locations.
> > 
> > I think in this particular function, the problem isn't actually local variables
> > but either pushing variables on the stack for argument passing,
> > or something that causes the compiler to run out of registers so it
> > has to spill registers to the stack.
> > 
> > In either case, one has to actually look at the generated output
> > and then try to rearrange the codes so this does not happen.
> > 
> > One thing to try would be to condense a function call like
> > 
> >                 dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport(
> > 
> > &v->dummy_vars.dml32_CalculateWatermarksMALLUseAndDRAMSpeedChangeSupport,
> >                         mode_lib->vba.USRRetrainingRequiredFinal,
> >                         mode_lib->vba.UsesMALLForPStateChange,
> > 
> > mode_lib->vba.PrefetchModePerState[mode_lib->vba.VoltageLevel][mode_lib->vba.maxMpcComb],
> >                         mode_lib->vba.NumberOfActiveSurfaces,
> >                         mode_lib->vba.MaxLineBufferLines,
> >                         mode_lib->vba.LineBufferSizeFinal,
> >                         mode_lib->vba.WritebackInterfaceBufferSize,
> >                         mode_lib->vba.DCFCLK,
> >                         mode_lib->vba.ReturnBW,
> >                         mode_lib->vba.SynchronizeTimingsFinal,
> > 
> > mode_lib->vba.SynchronizeDRRDisplaysForUCLKPStateChangeFinal,
> >                         mode_lib->vba.DRRDisplay,
> >                         v->dpte_group_bytes,
> >                         v->meta_row_height,
> >                         v->meta_row_height_chroma,
> > 
> > v->dummy_vars.DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation.mmSOCParameters,
> >                         mode_lib->vba.WritebackChunkSize,
> >                         mode_lib->vba.SOCCLK,
> >                         v->DCFCLKDeepSleep,
> >                         mode_lib->vba.DETBufferSizeY,
> >                         mode_lib->vba.DETBufferSizeC,
> >                         mode_lib->vba.SwathHeightY,
> >                         mode_lib->vba.SwathHeightC,
> >                         mode_lib->vba.LBBitPerPixel,
> >                         v->SwathWidthY,
> >                         v->SwathWidthC,
> >                         mode_lib->vba.HRatio,
> >                         mode_lib->vba.HRatioChroma,
> >                         mode_lib->vba.vtaps,
> >                         mode_lib->vba.VTAPsChroma,
> >                         mode_lib->vba.VRatio,
> >                         mode_lib->vba.VRatioChroma,
> >                         mode_lib->vba.HTotal,
> >                         mode_lib->vba.VTotal,
> >                         mode_lib->vba.VActive,
> >                         mode_lib->vba.PixelClock,
> >                         mode_lib->vba.BlendingAndTiming,
> >                         .... /* more arguments */);
> > 
> > into calling conventions that take a pointer to 'mode_lib->vba' and another
> > one to 'v', so these are no longer passed on the stack individually.
> 
> So I took a whack at reducing this function's number of parameters and
> ended up with the attached patch. I basically just removed any
> parameters that were identical between the two call sites and access them
> through the vba pointer, as you suggested.
> 
> AMD folks, is this an acceptable approach? It didn't take a trivial
> amount of time so I want to make sure this is okay before I do it to
> more functions/files.
> 
> Due to the potential size of these changes, I am a little weary of them
> going into 6.0; even though they should be a simple search and replace
> for the most part, it might be nice for them to have some decent soak
> time in -next. One solution would be to raise the warning limit for
> these files on 6.0 so that allmodconfig does not ship broken then reduce
> the limit for 6.1 once these patches have been applied.
> 
> Additionally, I took a look at the stack usage across all compilers that
> the kernel supports and I thought it was kind of interesting that the
> usage really jumps from GCC 7 to 8, which I am guessing is a result of
> commit e8a170ff9a35 ("drm/amdgpu: enable -msse2 for GCC 7.1+ users").
> GCC 8 allmodconfig actually errors now too:
> 
> https://lore.kernel.org/alpine.DEB.2.22.394.2208152006320.289321@ramsan.of.borg/
> 
>           |-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
>           | dml30_ModeSupportAndSystemConfigurationFull() | dml31_ModeSupportAndSystemConfigurationFull() | dml32_ModeSupportAndSystemConfigurationFull() |
> |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> | GCC 5   |                  1056 bytes                   |                   656 bytes                   |                  1040 bytes                   |
> | GCC 6   |                  1024 bytes                   |                   672 bytes                   |                  1056 bytes                   |
> | GCC 7   |                  1040 bytes                   |                   664 bytes                   |                  1056 bytes                   |
> | GCC 8   |                  1760 bytes                   |                  1608 bytes                   |                  2144 bytes                   |
> | GCC 9   |                  1664 bytes                   |                  1392 bytes                   |                  1960 bytes                   |
> | GCC 10  |                  1648 bytes                   |                  1368 bytes                   |                  1952 bytes                   |
> | GCC 11  |                  1680 bytes                   |                  1400 bytes                   |                  1952 bytes                   |
> | GCC 12  |                  1680 bytes                   |                  1400 bytes                   |                  1984 bytes                   |
> |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> | LLVM 11 |                  2104 bytes                   |                  2056 bytes                   |                  2120 bytes                   |
> | LLVM 12 |                  2152 bytes                   |                  2200 bytes                   |                  2152 bytes                   |
> | LLVM 13 |                  2216 bytes                   |                  2248 bytes                   |                  2168 bytes                   |
> | LLVM 14 |                  2168 bytes                   |                  2184 bytes                   |                  2160 bytes                   |
> | LLVM 15 |                  2216 bytes                   |                  2184 bytes                   |                  2160 bytes                   |
> | LLVM 16 |                  2232 bytes                   |                  2216 bytes                   |                  2176 bytes                   |
> |---------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
> 
> With the patch I have attached,
> dml32_ModeSupportAndSystemConfigurationFull() drops from 2176 to 1944
> for LLVM 16, which is obviously still not great but it at least avoids
> the warning.
> 
> Cheers,
> Nathan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ