[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2vxzfr78u3ne.fsf@kernel.org>
Date: Tue, 10 Feb 2026 14:30:45 +0100
From: Pratyush Yadav <pratyush@...nel.org>
To: Mike Rapoport <rppt@...nel.org>
Cc: Pratyush Yadav <pratyush@...nel.org>, Pasha Tatashin
<pasha.tatashin@...een.com>, Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 2/2] liveupdate: luo_file: remember retrieve() status
Hi Mike,
Thanks for the review.
On Wed, Jan 28 2026, Mike Rapoport wrote:
> Hi Pratyush,
>
> On Tue, Jan 27, 2026 at 12:02:53AM +0100, Pratyush Yadav wrote:
>> From: "Pratyush Yadav (Google)" <pratyush@...nel.org>
>>
>> LUO keeps track of successful retrieve attempts on a LUO file. It does
>> so to avoid multiple retrievals of the same file. Doing so will cause
>
> ^ Multiple retrievals
Will fix.
>
>> problems because once the file is retrieved, the serialized data
>> structures are likely freed and the file is likely in a very different
>> state from what the code expects.
>>
>> This is kept track of by the retrieved boolean in struct luo_file, and
>
> The 'retrieve' boolean in struct luo_file keeps track of this,
ACK.
>
>> is passed to the finish callback so it knows what work was already done
>> and what it has left to do.
>>
>> All this works well when retrieve succeeds. When it fails,
>> luo_retrieve_file() returns the error immediately, without ever storing
>> anywhere that a retrieve was attempted or what its error code was. This
>> results in an errored LIVEUPDATE_SESSION_RETRIEVE_FD ioctl to userspace,
>> but nothing prevents it from trying this again.
>>
>> The retry is problematic for much of the same reasons listed above. The
>> file is likely in a very different state than what the retrieve logic
>> normally expects, and it might even have freed some serialization data
>> structures. Attempting to access them or free them again is going to
>> break things.
>>
>> For example, if memfd managed to restore 8 of its 10 folios, but fails
>> on the 9th, a subsequent retrieve attempt will try to call
>> kho_restore_folio() on the first folio again, and that will fail with a
>> warning since it is an invalid operation.
>>
>> Apart from the retry, finish() also breaks. Since on failure the
>> retrieved bool in luo_file is never touched, the finish() call on
>> session close will tell the file handler that retrieve was never
>> attempted, and it will try to access or free the data structures that
>> might not exist, much in the same way as the retry attempt.
>>
>> There is no sane way of attempting the retrieve again. Remember the
>> error retrieve returned and directly return it on a retry. Also pass
>> this status code to finish() so it can make the right decision on the
>> work it needs to do.
>>
>> This is done by changing the bool to an integer. A value of 0 means
>> retrieve was never attempted, a positive value means it succeeded, and a
>> negative value means it failed and the error code is the value.
>>
>> Fixes: 7c722a7f44e0 ("liveupdate: luo_file: implement file systems callbacks")
>> Signed-off-by: Pratyush Yadav (Google) <pratyush@...nel.org>
>> ---
>> include/linux/liveupdate.h | 7 ++++--
>> kernel/liveupdate/luo_file.c | 41 ++++++++++++++++++++++--------------
>> mm/memfd_luo.c | 7 +++++-
>> 3 files changed, 36 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/liveupdate.h b/include/linux/liveupdate.h
>> index a7f6ee5b6771..a543a3a8e837 100644
>> --- a/include/linux/liveupdate.h
>> +++ b/include/linux/liveupdate.h
>> @@ -21,7 +21,10 @@ struct file;
>> * struct liveupdate_file_op_args - Arguments for file operation callbacks.
>> * @handler: The file handler being called.
>> * @retrieved: The retrieve status for the 'can_finish / finish'
>> - * operation.
>> + * operation. A value of 0 means the retrieve has not been
>> + * attempted, a positive value means the retrieve was
>> + * successful, and a negative value means the retrieve failed,
>> + * and the value is the error code of the call.
>> * @file: The file object. For retrieve: [OUT] The callback sets
>> * this to the new file. For other ops: [IN] The caller sets
>> * this to the file being operated on.
>> @@ -37,7 +40,7 @@ struct file;
>> */
>> struct liveupdate_file_op_args {
>> struct liveupdate_file_handler *handler;
>> - bool retrieved;
>> + bool retrieve_sts;
>
> int retrieve_sts?
Ugh, stupid mistake... Thanks for catching.
>
> and maybe spell out _status rather than _sts?
Will do.
>
>> struct file *file;
>> u64 serialized_data;
>> void *private_data;
>> diff --git a/kernel/liveupdate/luo_file.c b/kernel/liveupdate/luo_file.c
>> index 9f7283379ebc..82577b4cca2b 100644
>> --- a/kernel/liveupdate/luo_file.c
>> +++ b/kernel/liveupdate/luo_file.c
>> @@ -133,9 +133,12 @@ static LIST_HEAD(luo_file_handler_list);
>> * state that is not preserved. Set by the handler's .preserve()
>> * callback, and must be freed in the handler's .unpreserve()
>> * callback.
>> - * @retrieved: A flag indicating whether a user/kernel in the new kernel has
>> + * @retrieve_sts: Status code indicating whether a user/kernel in the new kernel has
>> * successfully called retrieve() on this file. This prevents
>> - * multiple retrieval attempts.
>> + * multiple retrieval attempts. A value of 0 means a retrieve()
>> + * has not been attempted, a positive value means the retrieve()
>> + * was successful, and a negative value means the retrieve()
>> + * failed, and the value is the error code of the call.
>> * @mutex: A mutex that protects the fields of this specific instance
>> * (e.g., @retrieved, @file), ensuring that operations like
>> * retrieving or finishing a file are atomic.
>> @@ -160,7 +163,7 @@ struct luo_file {
>> struct file *file;
>> u64 serialized_data;
>> void *private_data;
>> - bool retrieved;
>> + int retrieve_sts;
>> struct mutex mutex;
>> struct list_head list;
>> u64 token;
>> @@ -293,7 +296,7 @@ int luo_preserve_file(struct luo_file_set *file_set, u64 token, int fd)
>> luo_file->file = file;
>> luo_file->fh = fh;
>> luo_file->token = token;
>> - luo_file->retrieved = false;
>> + luo_file->retrieve_sts = 0;
>
> We kzalloc() luo_file, so this is not strictly required.
Okay, will drop.
>
>> mutex_init(&luo_file->mutex);
>>
>> args.handler = fh;
>> @@ -569,7 +572,7 @@ int luo_retrieve_file(struct luo_file_set *file_set, u64 token,
>> return -ENOENT;
>>
>> guard(mutex)(&luo_file->mutex);
>> - if (luo_file->retrieved) {
>> + if (luo_file->retrieve_sts > 0) {
>> /*
>> * Someone is asking for this file again, so get a reference
>> * for them.
>> @@ -577,21 +580,27 @@ int luo_retrieve_file(struct luo_file_set *file_set, u64 token,
>> get_file(luo_file->file);
>> *filep = luo_file->file;
>> return 0;
>> + } else if (luo_file->retrieve_sts < 0) {
>> + /* Retrieve was attempted and it failed. Return the error code. */
>> + return luo_file->retrieve_sts;
>> }
>
> I'd put it before the check for > 0, i.e
>
> if (luo_file->retrieve_sts < 0)
> return luo_file->retrieve_sts;
>
> if (luo_file->retrieve_sts > 0)
> ...
ACK. Will do.
>
>
>> args.handler = luo_file->fh;
>> args.serialized_data = luo_file->serialized_data;
>> err = luo_file->fh->ops->retrieve(&args);
>> - if (!err) {
>> - luo_file->file = args.file;
>> -
>> - /* Get reference so we can keep this file in LUO until finish */
>> - get_file(luo_file->file);
>> - *filep = luo_file->file;
>> - luo_file->retrieved = true;
>> + if (err) {
>> + /* Keep the error code for later use. */
>> + luo_file->retrieve_sts = err;
>> + return err;
>> }
>>
>> - return err;
>> + luo_file->file = args.file;
>> + /* Get reference so we can keep this file in LUO until finish */
>> + get_file(luo_file->file);
>> + *filep = luo_file->file;
>> + luo_file->retrieve_sts = 1;
>> +
>> + return 0;
>> }
>>
>> static int luo_file_can_finish_one(struct luo_file_set *file_set,
>> @@ -607,7 +616,7 @@ static int luo_file_can_finish_one(struct luo_file_set *file_set,
>> args.handler = luo_file->fh;
>> args.file = luo_file->file;
>> args.serialized_data = luo_file->serialized_data;
>> - args.retrieved = luo_file->retrieved;
>> + args.retrieve_sts = luo_file->retrieve_sts;
>> can_finish = luo_file->fh->ops->can_finish(&args);
>> }
>>
>> @@ -624,7 +633,7 @@ static void luo_file_finish_one(struct luo_file_set *file_set,
>> args.handler = luo_file->fh;
>> args.file = luo_file->file;
>> args.serialized_data = luo_file->serialized_data;
>> - args.retrieved = luo_file->retrieved;
>> + args.retrieve_sts = luo_file->retrieve_sts;
>>
>> luo_file->fh->ops->finish(&args);
>> }
>> @@ -779,7 +788,7 @@ int luo_file_deserialize(struct luo_file_set *file_set,
>> luo_file->file = NULL;
>> luo_file->serialized_data = file_ser[i].data;
>> luo_file->token = file_ser[i].token;
>> - luo_file->retrieved = false;
>> + luo_file->retrieve_sts = 0;
>
> Here as well, we kzalloc() luo_file, so zeroing out of the fields is not
> strictly required.
ACK.
>
>> mutex_init(&luo_file->mutex);
>> list_add_tail(&luo_file->list, &file_set->files_list);
>> }
>> diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c
>> index a34fccc23b6a..ffc9f879833b 100644
>> --- a/mm/memfd_luo.c
>> +++ b/mm/memfd_luo.c
>> @@ -326,7 +326,12 @@ static void memfd_luo_finish(struct liveupdate_file_op_args *args)
>> struct memfd_luo_folio_ser *folios_ser;
>> struct memfd_luo_ser *ser;
>>
>> - if (args->retrieved)
>> + /*
>> + * If retrieve was successful, nothing to do. If it failed, retrieve()
>> + * already cleaned up everything it could. So nothing to do there
>> + * either. Only need to clean up when retrieve was not called.
>> + */
>> + if (args->retrieve_sts)
>> return;
>>
>> ser = phys_to_virt(args->serialized_data);
>> --
>> 2.52.0.457.g6b5491de43-goog
>>
--
Regards,
Pratyush Yadav
Powered by blists - more mailing lists