Message-ID: <0a65de6c-74d5-4d3e-be75-0aa9ecc82da1@roeck-us.net>
Date: Mon, 20 Jan 2025 06:15:30 -0800
From: Guenter Roeck <linux@...ck-us.net>
To: Jani Nikula <jani.nikula@...ux.intel.com>,
David Laight <david.laight.linux@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
David Laight <David.Laight@...lab.com>, Arnd Bergmann <arnd@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>, Matthew Wilcox <willy@...radead.org>,
Christoph Hellwig <hch@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Dan Carpenter <dan.carpenter@...aro.org>,
"Jason A . Donenfeld" <Jason@...c4.com>,
"pedro.falcato@...il.com" <pedro.falcato@...il.com>,
Mateusz Guzik <mjguzik@...il.com>, "linux-mm@...ck.org"
<linux-mm@...ck.org>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
intel-xe@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Rodrigo Vivi <rodrigo.vivi@...el.com>
Subject: Re: Buiild error in i915/xe
On 1/20/25 03:21, Jani Nikula wrote:
> On Mon, 20 Jan 2025, David Laight <david.laight.linux@...il.com> wrote:
>> On Mon, 20 Jan 2025 12:48:11 +0200
>> Jani Nikula <jani.nikula@...ux.intel.com> wrote:
>>
>>> On Sun, 19 Jan 2025, David Laight <david.laight.linux@...il.com> wrote:
>>>> On Sat, 18 Jan 2025 14:58:48 -0800
>>>> Guenter Roeck <linux@...ck-us.net> wrote:
>>>>
>>>>> On 1/18/25 14:11, David Laight wrote:
>>>>>> On Sat, 18 Jan 2025 13:21:39 -0800
>>>>>> Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>>>>>>
>>>>>>> On Sat, 18 Jan 2025 at 09:49, Guenter Roeck <linux@...ck-us.net> wrote:
>>>>>>>>
>>>>>>>> No idea why the compiler would know that the values are invalid.
>>>>>>>
>>>>>>> It's not that the compiler knows that they are invalid, but I bet what
>>>>>>> happens is in scale() (and possibly other places that do similar
>>>>>>> checks), which does this:
>>>>>>>
>>>>>>> WARN_ON(source_min > source_max);
>>>>>>> ...
>>>>>>> source_val = clamp(source_val, source_min, source_max);
>>>>>>>
>>>>>>> and the compiler notices that the ordering comparison in the first
>>>>>>> WARN_ON() is the same as the one in clamp(), so it basically converts
>>>>>>> the logic to
>>>>>>>
>>>>>>>     if (source_min > source_max) {
>>>>>>>         WARN(..);
>>>>>>>         /* Do the clamp() knowing that source_min > source_max */
>>>>>>>         source_val = clamp(source_val, source_min, source_max);
>>>>>>>     } else {
>>>>>>>         /* Do the clamp knowing that source_min <= source_max */
>>>>>>>         source_val = clamp(source_val, source_min, source_max);
>>>>>>>     }
>>>>>>>
>>>>>>> (obviously I dropped the other WARN_ON in the conversion, it wasn't
>>>>>>> relevant for this case).
>>>>>>>
>>>>>>> And now that first clamp() case is done with source_min > source_max,
>>>>>>> and it triggers that build error because that's invalid.
>>>>>>>
>>>>>>> So the condition is not statically true in the *source* code, but in
>>>>>>> the "I have moved code around to combine tests" case it now *is*
>>>>>>> statically true as far as the compiler is concerned.
>>>>>>
>>>>>> Well spotted :-)
>>>>>>
>>>>>> One option would be to move the WARN_ON() below the clamp() and
>>>>>> add an OPTIMISER_HIDE_VAR(source_max) between them.
>>>>>>
>>>>>> Or do something more sensible than the WARN().
>>>>>> Perhaps return target_min on any such errors?
>>>>>>
>>>>>
>>>>> This helps:
>>>>>
>>>>> - WARN_ON(source_min > source_max);
>>>>> - WARN_ON(target_min > target_max);
>>>>> -
>>>>> /* defensive */
>>>>> source_val = clamp(source_val, source_min, source_max);
>>>>>
>>>>> + WARN_ON(source_min > source_max);
>>>>> + WARN_ON(target_min > target_max);
>>>>
>>>> That is a 'quick fix' ...
>>>>
>>>> Much better would be to replace the WARN() with (say):
>>>>     if (target_min >= target_max)
>>>>         return target_min;
>>>>     if (source_min >= source_max)
>>>>         return target_min + (target_max - target_min)/2;
>>>> So that the return values are actually in range (in as much as one is defined).
>>>> Note that the >= comparisons also remove a divide by zero.
>>>
>>> I want the loud and early warnings for clear bugs instead of
>>> "gracefully" silencing the errors only to be found through debugging
>>> user reports.
>>
>> A user isn't going to notice a WARN() - not until you tell them to look for it.
>> In any case even if you output a message you really want to return a 'sane'
>> value, who knows what effect a very out of range value is going to have.
>
> The point is, we'll catch the WARN in CI before it goes out to users.
>
It isn't going to catch the divide by 0 error, and it obviously doesn't
catch the build problem on parisc with gcc 13.x because the CI isn't
testing it.

How about disabling DRM_XE on architectures where it isn't supported,
matching DRM_I915?

Thanks,
Guenter
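
For illustration only, here is a minimal userspace sketch of the early-return
variant David Laight suggests in the quoted text, showing how the >=
comparisons also rule out the zero divisor. It is not the driver code: the name
scale_sketch(), the open-coded clamp, and the round-to-nearest step are all
assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the driver's scale() helper.
 * The name, the open-coded clamp and the rounding are assumptions.
 */
static uint32_t scale_sketch(uint32_t source_val,
			     uint32_t source_min, uint32_t source_max,
			     uint32_t target_min, uint32_t target_max)
{
	uint64_t span, val;

	/* Degenerate target range: nothing to scale into. */
	if (target_min >= target_max)
		return target_min;

	/*
	 * Degenerate source range: return the middle of the target range.
	 * The >= also covers source_min == source_max, i.e. a zero divisor.
	 */
	if (source_min >= source_max)
		return target_min + (target_max - target_min) / 2;

	/* Open-coded clamp; the range is known to be ordered here. */
	if (source_val < source_min)
		source_val = source_min;
	else if (source_val > source_max)
		source_val = source_max;

	span = (uint64_t)source_max - source_min;
	assert(span != 0);	/* guaranteed by the check above */

	/* 64-bit multiply to avoid overflow, then round to nearest. */
	val = (uint64_t)(source_val - source_min) * (target_max - target_min);
	return target_min + (uint32_t)((val + span / 2) / span);
}
```

With these checks, a call such as scale_sketch(5, 3, 3, 0, 100) returns the
midpoint of the target range instead of dividing by zero. Whether silently
returning an in-range value is preferable to a loud WARN is exactly the point
being argued in the thread.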