lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <20210810183746.GA32986@fuller.cnet> Date: Tue, 10 Aug 2021 15:37:46 -0300 From: Marcelo Tosatti <mtosatti@...hat.com> To: Thomas Gleixner <tglx@...utronix.de> Cc: linux-kernel@...r.kernel.org, Nitesh Lal <nilal@...hat.com>, Nicolas Saenz Julienne <nsaenzju@...hat.com>, Frederic Weisbecker <frederic@...nel.org>, Christoph Lameter <cl@...ux.com>, Juri Lelli <juri.lelli@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Alex Belits <abelits@...its.com>, Peter Xu <peterx@...hat.com> Subject: Re: [patch 0/4] extensible prctl task isolation interface and vmstat sync (v2) On Tue, Aug 10, 2021 at 06:40:48PM +0200, Thomas Gleixner wrote: > Marcelo, > > On Fri, Jul 30 2021 at 17:18, Marcelo Tosatti wrote: > > can you pretty please: > > 1) Add a version number to your patch series right where it belongs: > > [patch V2 N/M] > > Just having a (v2) at the end of the subject line of 0/M is sloppy > at best. > > 2) Provide a lore link to the previous version > > Thanks, > > tglx Thomas, Sure, will resend -v3 once done with the following: 1) Adding support for KVM. 2) Adding a tool called "chisol" to util-linux, similar to chrt/taskset, to prctl+exec (for unmodified applications). This raises the question whether or not to add an option to preserve the task parameters across fork (i think the answer is yes). -- But the following points are unclear to me (in quotes are earlier comments you made): 1) "It's about silencing different and largely independent parts of the OS on a particular CPU. Just defining upfront that there is only the choice of all or nothing _is_ policy. There is a very wide range of use case scenarios out there and just because the ones which you care about needs X does not mean that X is the right thing for everybody else. You still can have X and let other people define their own set of things they want to be protected against. Aside of that having it selectively is a plus for debugability, testing etc." So for the ability to individually select what parts of the OS on a particular CPU are quiesced, there is: + defmask = defmask | ISOL_F_QUIESCE_VMSTATS; + + ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, defmask, + 0, 0); + if (ret == -1) { + perror("prctl PR_ISOL_SET"); + return EXIT_FAILURE; + } However there is a feeling that implementation details are being exposed to userspace... However that seems to be alright: what could happen is that the feature ceases to exist (say vmstat sync), in kernel, and the bit is kept for compability (but the kernel does nothing about it). That of course means whatever "vmstat sync" replacement comes up, it should avoid IPIs as well. Any thoughts on this? 2) "Again: I fundamentaly disagree with the proposed task isolation patches approach as they leave no choice at all. There is a reasonable middle ground where an application is willing to pay the price (delay) until the reqested quiescing has taken place in order to run undisturbed (hint: cache ...) and also is willing to take the addtional overhead of an occacional syscall in the slow path without tripping some OS imposed isolation safe guard. Aside of that such a granular approach does not necessarily require the application to be aware of it. If the admin knows the computational pattern of the application, e.g. 1 read_data_set() <- involving syscalls/OS obviously 2 compute_set() <- let me alone 3 save_data_set() <- involving syscalls/OS obviously repeat the above... then it's at his discretion to decide to inflict a particular isolation set on the task which is obviously ineffective while doing #1 and #3 but might provide the so desired 0.9% boost for compute_set() which dominates the judgement. That's what we need to think about and once we figured out how to do that it gives Marcelo the mechanism to solve his 'run virt undisturbed by vmstat or whatever' problem and it allows Alex to build his stuff on it. Summary: The problem to be solved cannot be restricted to self_defined_important_task(OWN_WORLD); Policy is not a binary on/off problem. It's manifold across all levels of the stack and only a kernel problem when it comes down to the last line of defence. Up to the point where the kernel puts the line of last defence, policy is defined by the user/admin via mechanims provided by the kernel. Emphasis on "mechanims provided by the kernel", aka. user API. Just in case, I hope that I don't have to explain what level of scrunity and thought this requires." OK, so perhaps a handful of use-cases can clarify whether the proposed interface requires changes? The example on samples/task_isolation/ is focused on "enter task isolation and very rarely exit". There are two other cases i am aware of: A) Christoph's use-case: 1) Enter task-isolation. 2) Latency sensitive loop begins. 3) Some event interrupts latency sensitive section. 4) Handling of the event requires N syscalls, which the programmer would be interested in happening without quiescing at every return to system call. The current scheme would be: /* * Application can either set the value from ISOL_F_QUIESCE_DEFMASK, * which is configurable through * /sys/kernel/task_isolation/default_quiesce_activities, * or specific values. * * Using ISOL_F_QUIESCE_DEFMASK allows for the application to * take advantage of future quiescing capabilities without * modification (provided default_quiesce_activities is * configured accordingly). */ defmask = defmask | ISOL_F_QUIESCE_VMSTATS; ret = prctl(PR_ISOL_SET, ISOL_F_QUIESCE, defmask, 0, 0); if (ret == -1) { perror("prctl PR_ISOL_SET"); return EXIT_FAILURE; } lat_loop: ret = prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE, 0, 0, 0); if (ret == -1) { perror("prctl PR_ISOL_CTRL_SET (ISOL_F_QUIESCE)"); return EXIT_FAILURE; } latency sensitive loop if (event == 1) { /* disables quiescing of all features, while maintaining * other features such as logging and avoidance of * interruptions enabled. */ ret = prctl(PR_ISOL_CTRL_SET, 0, 0, 0, 0); syscall1 syscall2 ... syscallN /* reenter isolated mode with quiescing */ goto lat_loop; } ... Should it be possible to modify individual quiescing parts individually while maintaining isolated mode? Yes, that seems to be desired. The other use-case (from you) seems to be: 1 read_data_set() <- involving syscalls/OS obviously 2 compute_set() <- let me alone 3 save_data_set() <- involving syscalls/OS obviously repeat the above... Well, the implementation of Christoph's use above seems not to be that bad as well: 1 read_data_set() <- involving syscalls/OS obviously /* disables quiescing of all (or some, if desired) * features, while maintaining other features such * as logging and avoidance of interruptions enabled. */ ret = prctl(PR_ISOL_CTRL_SET, ISOL_F_QUIESCE, 0, 0, 0); 2 compute_set() <- let me alone ret = prctl(PR_ISOL_CTRL_SET, 0, 0, 0, 0); 3 save_data_set() <- involving syscalls/OS obviously repeat the above... What kind of different behaviour, other than enabling/disabling quiescing, would be desired in this use-case?
Powered by blists - more mailing lists