Message-ID: <d29d6893-1313-4e6e-a3a9-3aeabaa270dd@oracle.com>
Date: Thu, 16 Jan 2025 04:00:55 +1100
From: imran.f.khan@...cle.com
To: Thomas Gleixner <tglx@...utronix.de>, anna-maria@...utronix.de,
frederic@...nel.org
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] timers: introduce timer_try_add_on_cpu.
Hello Thomas,
Thanks for taking a look and for your feedback.
On 16/1/2025 3:04 am, Thomas Gleixner wrote:
> On Thu, Jan 16 2025 at 00:41, Imran Khan wrote:
>> + * Return:
>> + * * %true - If timer was started on an online cpu
>> + * * %false - If the specified cpu was offline or if its online status
>> + * could not be ensured due to unavailability of hotplug lock.
>> + */
>> +bool timer_try_add_on_cpu(struct timer_list *timer, int cpu)
>> +{
>> + bool ret = true;
>> +
>> + if (unlikely(!cpu_online(cpu)))
>> + ret = false;
>> + else if (cpus_read_trylock()) {
>> + if (likely(cpu_online(cpu)))
>> + add_timer_on(timer, cpu);
>> + else
>> + ret = false;
>> + cpus_read_unlock();
>> + } else
>> + ret = false;
>> +
>> + return ret;
>
> Aside of the horrible coding style, that cpus_read_trylock() part does
> not make any sense.
>
> It's perfectly valid to queue a timer on a online CPU when the CPU
> hotplug lock is held write, which can have tons of reasons even
> unrelated to an actual CPU hotplug operation.
>
> Even during a hotplug operation adding a timer on a particular CPU is
> valid, whether that's the CPU which is actually plugged or not is
> irrelevant.
>
> So if we add such a function, then it needs to have very precisely
> defined semantics, which have to be independent of the CPU hotplug lock.
>
The hotplug lock is being used to avoid the scenario where cpu_online()
reports @cpu as online but @cpu goes offline before add_timer_on() can
actually add the timer to that CPU's timer base.
Are you saying that this can't happen, or do you mean by "defined
semantics" that a @cpu reported as online by cpu_online() must not go
offline in the middle of this function?
> The only way I can imagine is that the state is part of the per CPU
> timer base, but then I have to ask the question what is actually tried
> to solve here.
>
> As far as I understood that there is an issue in the RDS code, queueing
> a delayed work on a offline CPU, but that should have triggered at least
> the warning in __queue_delayed_work(), right?
>
I guess you are referring to the warning of [1]. It was added only a few
days back, but the timer of a delayed_work can still end up on an offlined CPU.
> So the question is whether this try() interface is solving any of this
> and not papering over the CPU hotplug related issues in the RDS code in
> some way.
>
The RDS code that I referred to in my query is an in-house change, and there
may be some scope for updating the cached-CPU information there via CPU
hotplug callbacks. But we also wanted to see if something could be done on
the timer side to address the possibility of a timer ending up on an
offlined CPU. That's why I asked earlier if you see any merit in having a
try() interface.
As of now I don't have any more cases running into this problem (putting
timer-wheel timers on an offlined CPU). Maybe with the warning in
__queue_delayed_work() and (if it gets added) in add_timer_on() we will see
more such cases.
But if you agree, the try() interface could still be added, albeit without
the hotplug lock.
Thanks,
Imran
[1]: https://github.com/torvalds/linux/blob/master/kernel/workqueue.c#L2511
> Thanks,
>
> tglx
>
>