linux-kernel - Re: [PATCH 1/2] Allow a kthread to declare that it calls task_work

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c24af958-b933-42dd-9806-9d288463547b@kernel.dk>
Date:   Tue, 5 Dec 2023 16:31:51 -0700
From:   Jens Axboe <axboe@...nel.dk>
To:     NeilBrown <neilb@...e.de>
Cc:     Christian Brauner <brauner@...nel.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Oleg Nesterov <oleg@...hat.com>,
        Chuck Lever <chuck.lever@...cle.com>,
        Jeff Layton <jlayton@...nel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-nfs@...r.kernel.org
Subject: Re: [PATCH 1/2] Allow a kthread to declare that it calls
 task_work_run()

On 12/5/23 4:23 PM, NeilBrown wrote:
> On Wed, 06 Dec 2023, NeilBrown wrote:
>> On Wed, 06 Dec 2023, Jens Axboe wrote:
>>> On 12/5/23 2:58 PM, Jens Axboe wrote:
>>>> On 12/5/23 2:28 PM, NeilBrown wrote:
>>>>> On Tue, 05 Dec 2023, Christian Brauner wrote:
>>>>>> On Mon, Dec 04, 2023 at 03:09:44PM -0700, Jens Axboe wrote:
>>>>>>> On 12/4/23 2:02 PM, NeilBrown wrote:
>>>>>>>> It isn't clear to me what _GPL is appropriate, but maybe the rules
>>>>>>>> changed since last I looked..... are there rules?
>>>>>>>>
>>>>>>>> My reasoning was that the call is effectively part of the user-space
>>>>>>>> ABI.  A user-space process can call this trivially by invoking any
>>>>>>>> system call.  The user-space ABI is explicitly a boundary which the GPL
>>>>>>>> does not cross.  So it doesn't seem appropriate to prevent non-GPL
>>>>>>>> kernel code from doing something that non-GPL user-space code can
>>>>>>>> trivially do.
>>>>>>>
>>>>>>> By that reasoning, basically everything in the kernel should be non-GPL
>>>>>>> marked. And while task_work can get used by the application, it happens
>>>>>>> only indirectly or implicitly. So I don't think this reasoning is sound
>>>>>>> at all, it's not an exported ABI or API by itself.
>>>>>>>
>>>>>>> For me, the more core of an export it is, the stronger the reason it
>>>>>>> should be GPL. FWIW, I don't think exporting task_work functionality is
>>>>
>>>>>>
>>>>>> Yeah, I'm not too fond of that part as well. I don't think we want to
>>>>>> give modules the ability to mess with task work. This is just asking for
>>>>>> trouble.
>>>>>>
>>>>>
>>>>> Ok, maybe we need to reframe the problem then.
>>>>>
>>>>> Currently fput(), and hence filp_close(), take control away from kernel
>>>>> threads in that they cannot be sure that a "close" has actually
>>>>> completed.
>>>>>
>>>>> This is already a problem for nfsd.  When renaming a file, nfsd needs to
>>>>> ensure any cached "open" that it has on the file is closed (else when
>>>>> re-exporting an NFS filesystem it can result in a silly-rename).
>>>>>
>>>>> nfsd currently handles this case by calling flush_delayed_fput().  I
>>>>> suspect you are no more happy about exporting that than you are about
>>>>> exporting task_work_run(), but this solution isn't actually 100%
>>>>> reliable.  If some other thread calls flush_delayed_fput() between nfsd
>>>>> calling filp_close() and that same nfsd calling flush_delayed_fput(),
>>>>> then the second flush can return before the first flush (in the other
>>>>> thread) completes all the work it took on.
>>>>>
>>>>> What we really need - both for handling renames and for avoiding
>>>>> possible memory exhaustion - is for nfsd to be able to reliably wait for
>>>>> any fput() that it initiated to complete.
>>>>>
>>>>> How would you like the VFS to provide that service?
>>>>
>>>> Since task_work happens in the context of your task already, why not
>>>> just have a way to get it stashed into a list when final fput is done?
>>>> This avoids all of this "let's expose task_work" and using the task list
>>>> for that, which seems kind of pointless as you're just going to run it
>>>> later on manually anyway.
>>>>
>>>> In semi pseudo code:
>>>>
>>>> bool fput_put_ref(struct file *file)
>>>> {
>>>> 	return atomic_dec_and_test(&file->f_count);
>>>> }
>>>>
>>>> void fput(struct file *file)
>>>> {
>>>> 	if (fput_put_ref(file)) {
>>>> 		...
>>>> 	}
>>>> }
>>>>
>>>> and then your nfsd_file_free() could do:
>>>>
>>>> ret = filp_flush(file, id);
>>>> if (fput_put_ref(file))
>>>> 	llist_add(&file->f_llist, &l->to_free_llist);
>>>>
>>>> or something like that, where l->to_free_llist is where ever you'd
>>>> otherwise punt the actual freeing to.
>>>
>>> Should probably have the put_ref or whatever helper also init the
>>> task_work, and then reuse the list in the callback_head there. Then
>>> whoever flushes it has to call ->func() and avoid exposing ____fput() to
>>> random users. But you get the idea.
>>
>> Interesting ideas - thanks.
>>
>> So maybe the new API would be
>>
>>  fput_queued(struct file *f, struct llist_head *q)
>> and
>>  flush_fput_queue(struct llist_head *q)
>>
>> with the meaning being that fput_queued() is just like fput() except
>> that any file needing __fput() is added to the 'q'; and that
>> flush_fput_queue() calls __fput() on any files in 'q'.
>>
>> So to close a file nfsd would:
>>
>>   fget(f);
>>   flip_close(f);
>>   fput_queued(f, &my_queue);
>>
>> though possibly we could have a
>>   filp_close_queued(f, q)
>> as well.
>>
>> I'll try that out - but am happy to hear alternate suggestions for names :-)
>>
> 
> Actually ....  I'm beginning to wonder if we should just use
> __fput_sync() in nfsd.
> It has a big warning about not doing that blindly, but the detail in the
> warning doesn't seem to apply to nfsd...

If you can do it from the context where you do the filp_close() right
now, then yeah there's no reason to over-complicate this at all... FWIW,
the reason task_work exists is just to ensure a clean context to perform
these operations from the task itself. The more I think about it, it
doesn't make a lot of sense to utilize it for this purpose, which is
where my alternate suggestion came from. But if you can just call it
directly, then that makes everything much easier.

-- 
Jens Axboe