Date:   Fri, 26 Jul 2019 12:37:11 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        linux-acpi@...r.kernel.org,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH v1] ACPI / scan: Acquire device_hotplug_lock in
 acpi_scan_init()

On 26.07.19 12:31, Michal Hocko wrote:
> On Fri 26-07-19 10:57:52, David Hildenbrand wrote:
>> On 26.07.19 10:44, Michal Hocko wrote:
>>> On Fri 26-07-19 10:36:42, David Hildenbrand wrote:
>>>> On 26.07.19 10:31, Michal Hocko wrote:
>>> [...]
>>>>> Anyway, my dislike of the device_hotplug_lock persists. I would really
>>>>> love to see it go rather than grow even further into the hotplug code.
>>>>> We should really be striving for mem-hotplug-internal and ideally
>>>>> range-defined locking long term.
>>>>
>>>> Yes, and that is a different story, because it will require major
>>>> changes to all add_memory() users (esp. due to the documented race
>>>> conditions). That said, memory hotplug locking is not ideal yet.
>>>
>>> I am really happy to hear that we are on the same page here. Do we have
>>> any document (I am sorry but I am lagging behind recent development in
>>> this area) that describes roadblocks to remove device_hotplug_lock?
>>
>> Only the core-api document I mentioned (there I documented quite a few
>> of the current conditions I identified back then).
> 
> That document doesn't describe which _data structures_ are protected by
> the lock though. It documents only the current state of locking.

Yeah, I also think we should find out more and document it.
Unfortunately, optimizing the locking is not very high on my priority list
(there are more critical things to figure out than optimizing locking
that at least seems to work :) ). It is on my list, though.

> 
>> I am not sure if we can remove it completely from
>> add_memory()/remove_memory(): We actually create/delete devices which
>> can otherwise create races with user space.
> 
> More details would be really appreciated.
> 
>> Besides that:
>> - try_offline_node() needs the lock to synchronize against cpu hotplug
>> - I *assume* try_online_node() needs it as well
> 
> more details on why would be great.
> 
>> Then, there is the possible race condition with user space onlining
>> memory avoided by the lock. Also, currently the lock protects the
>> "online_type" when onlining memory.
> 
> I do not see the race if the user-API-triggered online/offline takes a
> range lock on the affected physical memory range.

Yeah, and that's still future work. Another item on the list.
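
For illustration only, the range-lock idea could be sketched in userspace C
roughly as follows. This is not kernel code and not an existing API:
range_lock, range_trylock() and range_unlock() are hypothetical names, and
the fixed-size table with try-lock semantics is an assumption made to keep
the sketch short (a real implementation would more likely block and use an
interval tree).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HELD 16  /* hypothetical limit on concurrently held ranges */

/* A held half-open physical range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
    bool used;
};

struct range_lock {
    pthread_mutex_t mu;            /* protects the table below */
    struct range held[MAX_HELD];
};

static void range_lock_init(struct range_lock *rl)
{
    pthread_mutex_init(&rl->mu, NULL);
    for (size_t i = 0; i < MAX_HELD; i++)
        rl->held[i].used = false;
}

/* Try to take [start, end); fails if it overlaps any held range
 * or if the table is full. */
static bool range_trylock(struct range_lock *rl, uint64_t start, uint64_t end)
{
    bool ok = false;
    int free_slot = -1;

    assert(start < end);
    pthread_mutex_lock(&rl->mu);
    for (int i = 0; i < MAX_HELD; i++) {
        struct range *r = &rl->held[i];

        if (!r->used) {
            if (free_slot < 0)
                free_slot = i;
            continue;
        }
        /* Two half-open ranges overlap iff each starts before the other ends. */
        if (start < r->end && r->start < end)
            goto out;
    }
    if (free_slot >= 0) {
        rl->held[free_slot] = (struct range){ start, end, true };
        ok = true;
    }
out:
    pthread_mutex_unlock(&rl->mu);
    return ok;
}

/* Release a range previously taken with exactly the same bounds. */
static void range_unlock(struct range_lock *rl, uint64_t start, uint64_t end)
{
    pthread_mutex_lock(&rl->mu);
    for (int i = 0; i < MAX_HELD; i++) {
        struct range *r = &rl->held[i];

        if (r->used && r->start == start && r->end == end) {
            r->used = false;
            break;
        }
    }
    pthread_mutex_unlock(&rl->mu);
}
```

With something like this, onlining one memory block and removing a
different, non-overlapping one would not serialize on a single global lock,
while two operations on the same range would still exclude each other.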

> 
>> Then, there might be other global variables (eventually
>> zone/node/section related) that might need this lock right now - no
>> details known.
> 
> zones/nodes have their own locking for spans. Sections should be using
> a low level locking but I am not really sure this is needed if there is
> a mem hotplug lock in place (range or global)
> 
>> IOW, we have to be very careful and it is more involved than it might
>> seem.
> 
> I am not questioning that. And that is why I am asking about a todo list
> for that transition.

I think somebody will have to invest quite some effort to create that
todo list first :) (I'd love to provide more information right now, but
I don't really have more)

> 
>> Locking is definitely better (and more reliable!) than one year ago, but
>> there is definitely a lot to do. (unfortunately, just like in many areas
>> in memory hotplug code :( - say zone handling when offlining/failing to
>> online memory).
> 
> Yeah, the code is shaping up. And I am happy to see that happening. But
> please try to understand that I really do not like to see some ad-hoc
> locking enforcement without a clear locking model in place. This patch
> is an example of it. Whoever would like to rationalize locking further
> will have to stumble over this and scratch their head over why the heck
> the locking is there, and my experience tells me that people usually go
> along with existing code and make further assumptions based on that, so
> we are unlikely to get rid of the locking...

I do understand, but we really have to rethink locking in a broader
sense and document it. Here, I am going to add a comment as requested by
Rafael.

-- 

Thanks,

David / dhildenb
