Date:   Tue, 4 Feb 2020 08:43:57 +0100
From:   Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@...ia.com>
To:     Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Cc:     "Sverdlin, Alexander (Nokia - DE/Ulm)" <alexander.sverdlin@...ia.com>
Subject: Re: [PATCH RESEND] cpu/hotplug: Wait for cpu_hotplug to be enabled in
 cpu_up/down

Hello Thomas,

On 02/03/2020 07:08 PM, Thomas Gleixner wrote:
> So what? User space has to handle -EBUSY properly and it was possible
> even before that PCI commit that the online/offline operation request
> returned -EBUSY.

> What's confusing about a EBUSY return code? It's pretty universally used
> in situations where a facility is temporarily busy. If it's not
> sufficiently documented, why EBUSY can be returned and what that means,
> then this needs to be improved.

It is true this was happening before your work in the PCI subsystem; I
should have referenced the original commit which made cpu_up/down return
-EBUSY. I agree there is nothing to fix in your patch.

The fact that EBUSY exists and is commonly used doesn't justify it in
every situation. The problem is not only in user space but in the kernel
as well: no in-kernel user of cpu_up/down accounts for possible temporary
unavailability. Taken to the extreme, we could start returning EBUSY
whenever a resource or facility is taken, which would make every
interface a candidate for returning it. As I see it, EBUSY has its place
in non-blocking APIs; others should try (hard) not to return it.
Handling it is a further topic of its own: how large should the timeout
be before giving up? Let's say we know that for CPU hotplug it is the
10 seconds I proposed. Passing the responsibility of selecting a timeout
to the users will spread that policy out to each subsystem, yielding
situations where it will work for some and not for others, depending on
the timeout chosen.
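To illustrate the policy problem described above, here is a minimal
user-space sketch of what every caller currently has to invent for
itself: a retry-until-deadline wrapper around the standard hotplug sysfs
interface. The helper name and the 10-second budget are illustrative
(the budget mirrors the timeout proposed in this thread); nothing here
is from the patch itself.

```shell
#!/bin/sh
# retry_until SECONDS CMD...: run CMD until it succeeds or SECONDS
# elapse. Models "tolerate a transient -EBUSY" in user space.
retry_until() {
    secs=$1; shift
    deadline=$(( $(date +%s) + secs ))
    until "$@" 2>/dev/null; do
        # Give up once the caller-chosen deadline has passed.
        [ "$(date +%s)" -ge "$deadline" ] && return 1
        sleep 1
    done
    return 0
}

# Example (requires root): bring cpu2 online, tolerating a temporarily
# disabled hotplug facility:
#   retry_until 10 sh -c 'echo 1 > /sys/devices/system/cpu/cpu2/online'
```

Each consumer picking its own deadline this way is exactly the
spread-out policy the paragraph above argues against.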

I do not like these kinds of waits either, but I was not able to think
of anything better to improve this situation. I still believe it should
be improved, and once/if cpu hotplug is able to remove
cpu_hotplug_enable/disable, the wait can be removed as well.

> I have no idea why you need to offline/online CPUs to partition a
> system. There are surely more sensible ways to do that, but that's not
> part of this discussion.

I'd be happy to make it part.

We are using partrt from
https://github.com/OpenEneaLinux/rt-tools/tree/master/partrt;
cpu_up/down is part of it. AFAIK, it is there to force timer migration
and has no other usage known to me. Since we started with core
isolation, we have changed how we treat isolated cores. We now boot
with isolcpus=cpu-list nohz_full=cpu-list rcu_nocbs=cpu-list, and we
are currently on Linux 4.19. Earlier we had a different setup where we
wanted to use the cores during startup and partition them later, but
that turned out to be problematic and not in line with where things
are going in this area.
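For illustration, a concrete instance of that boot setup, isolating two
cores (the CPU numbers are hypothetical; the parameter names are the
standard kernel command-line ones):

```
isolcpus=2-3 nohz_full=2-3 rcu_nocbs=2-3
```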

Do you think we do not need to toggle them under these conditions?

Thanks,

Matija
