lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 13 Feb 2024 23:38:35 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Pranav Prasad <pranavpp@...gle.com>, jstultz@...gle.com, sboyd@...nel.org
Cc: linux-kernel@...r.kernel.org, krossmo@...gle.com
Subject: Re: [PATCH v2 2/2] alarmtimer: Modify alarmtimer suspend callback
 to check for imminent alarm using PM notifier

Pranav!

On Tue, Feb 13 2024 at 20:30, Pranav Prasad wrote:
>> How is this supposed to work? rtc is NULL.
>
> alarmtimer_get_soonest() has the following:
> 	rtc = alarmtimer_get_rtcdev();
>         if (!rtc)
> 		return 0;
>
> rtc is set in alarmtimer_get_soonest(). If rtc is NULL, the notifier
> returns NOTIFY_DONE before even reaching pm_wakeup_event(), hence there is
> no NULL pointer dereference expected.

      struct rtc_device *rtc = NULL;

      if (!alarmtimer_get_soonest(rtc, ....)
      	  return 0 ;

      pm_wakeup_event(rtc->dev);

static bool alarmtimer_get_soonest(struct rtc_device *rtc, ....)
{
 	rtc = alarmtimer_get_rtcdev();
        if (!rtc)
 		return false;
        ...
        return true;
}

How is the assignment in alarmtimer_get_soonest() causing 'rtc' at the
callsite to become magically non NULL unless you have this shiny new AI
enhanced compiler with the long awaited 'do what I mean' optimization
enabled by default.

My old school brain based compiler is absolutely sure that both places
which I pointed out are straight forward unconditional NULL pointer
dereferences. See below.

>> How was this ever tested?
>
> I tested it by forcing kernel suspend writing 'mem' to /sys/power/state and
> using a large window (120s instead of the current 2s) so that a pending
> alarm is likely. The debug print is logged as expected without any kernel
> crash, and the suspend gets aborted.

I have no idea what you tested, but definitely not the complete
submitted code.

The only reason why this did not explode in your face in
pm_wakeup_event() is that this function has a NULL pointer check and
struct rtc_device has @dev as first member, which means that

       rtc == &rtc->dev

resulting in the NULL pointer check to be effective because &rtc->dev
evaluates to NULL.

But that does not make the code more correct. It's still am
unconditional NULL pointer dereference by definition, no?

Just flip the ordering of @dev in @owner in struct rtc_device and watch
the show.

alarmtimer_suspend() did not explode in your face because either the
notifier aborted suspend, which means the function was never reached, or
there was no timer armed. Why?

Because rtc_cancel_timer() has no NULL pointer check and unconditionally
dereferences @rtc.

Just for the record. I missed to spot this gem in your patch:

> +	/* Find the soonest timer to expire */
> +	if (!alarmtimer_get_soonest(rtc, &min, &expires, &type))
>  		return 0;
>  
> -	if (ktime_to_ns(min) < suspend_check_duration_ms * NSEC_PER_MSEC) {
> -		pm_wakeup_event(dev, suspend_check_duration_ms);
> -		return -EBUSY;
> -	}
> -

So you delete the threshold check. What makes sure that a timer which
got armed _after_ the notifier ran or one of the nanosleep timers which
are only accounted for after freezing are checked for expiring before
the threshold?

Nothing, right?

But sure, the patch did what you wanted to demonstrate and that's why it
is correct and perfect, right?

I conceed that your patch works perfectly correct under the following
condition:

    Either alarmtimer_pm_callback() aborts the suspend or
    alarmtimer_get_soonest() does not find an armed timer when called in
    alarmtimer_suspend()

Unfortunately that's not reflecting reality in production systems unless
I'm missing something important here.

Thanks,

        tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ