lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <925fe0fe-9303-4f49-b473-c3a3ecc5e2e6@proton.me>
Date: Mon, 03 Jun 2024 18:26:08 +0000
From: Benno Lossin <benno.lossin@...ton.me>
To: Andreas Hindborg <nmi@...aspace.dk>
Cc: Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>, Keith Busch <kbusch@...nel.org>, Damien Le Moal <dlemoal@...nel.org>, Bart Van Assche <bvanassche@....org>, Hannes Reinecke <hare@...e.de>, Ming Lei <ming.lei@...hat.com>, "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>, Andreas Hindborg <a.hindborg@...sung.com>, Wedson Almeida Filho <wedsonaf@...il.com>, Greg KH <gregkh@...uxfoundation.org>, Matthew Wilcox <willy@...radead.org>, Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>, Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>, Björn Roy Baron <bjorn3_gh@...tonmail.com>, Alice Ryhl <aliceryhl@...gle.com>, Chaitanya Kulkarni <chaitanyak@...dia.com>, Luis Chamberlain <mcgrof@...nel.org>, Yexuan Yang <1182282462@...t.edu.cn>, Sergio González Collado <sergio.collado@...il.com>, Joel Granados <j.granados@...sung.com>, "Pankaj Raghav (Samsung)" <kernel@...kajraghav.com>, Daniel Gomez
	<da.gomez@...sung.com>, Niklas Cassel <Niklas.Cassel@....com>, Philipp Stanner <pstanner@...hat.com>, Conor Dooley <conor@...nel.org>, Johannes Thumshirn <Johannes.Thumshirn@....com>, Matias Bjørling <m@...rling.me>, open list <linux-kernel@...r.kernel.org>, "rust-for-linux@...r.kernel.org" <rust-for-linux@...r.kernel.org>, "lsf-pc@...ts.linux-foundation.org" <lsf-pc@...ts.linux-foundation.org>, "gost.dev@...sung.com" <gost.dev@...sung.com>
Subject: Re: [PATCH v4 1/3] rust: block: introduce `kernel::block::mq` module

On 03.06.24 14:01, Andreas Hindborg wrote:
> Benno Lossin <benno.lossin@...ton.me> writes:
>> On 01.06.24 15:40, Andreas Hindborg wrote:
>>> +impl seal::Sealed for Initialized {}
>>> +impl GenDiskState for Initialized {
>>> +    const DELETE_ON_DROP: bool = false;
>>> +}
>>> +impl seal::Sealed for Added {}
>>> +impl GenDiskState for Added {
>>> +    const DELETE_ON_DROP: bool = true;
>>> +}
>>> +
>>> +impl<T: Operations> GenDisk<T, Initialized> {
>>> +    /// Try to create a new `GenDisk`.
>>> +    pub fn try_new(tagset: Arc<TagSet<T>>) -> Result<Self> {
>>
>> Since there is no non-try `new` function, I think we should name this
>> function just `new`.
> 
> Right, I am still getting used to the new naming scheme. Do you know if
> it is documented anywhere?

I don't think it is documented, it might only be a verbal convention at
the moment. Although [1] is suggesting `new` for general constructors.
Since this is the only constructor, one could argue that the
recommendation is to use `new` (which I personally find a good idea).

[1]: https://rust-lang.github.io/api-guidelines/naming.html

[...]

>>> +impl<T: Operations> OperationsVTable<T> {
>>> +    /// This function is called by the C kernel. A pointer to this function is
>>> +    /// installed in the `blk_mq_ops` vtable for the driver.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// - The caller of this function must ensure `bd` is valid
>>> +    ///   and initialized. The pointees must outlive this function.
>>
>> Until when do the pointees have to be alive? "must outlive this
>> function" could also be the case if the pointees die immediately after
>> this function returns.
> 
> It should not be plural. What I intended to communicate is that what
> `bd` points to must be valid for read for the duration of the function
> call. I think that is what "The pointee must outlive this function"
> states? Although when we talk about lifetime of an object pointed to by
> a pointer, I am not sure about the correct way to word this. Do we talk
> about the lifetime of the pointer or the lifetime of the pointed to
> object (the pointee). We should not use the same wording for the pointer
> and the pointee.
> 
> How about:
> 
>     /// - The caller of this function must ensure that the pointee of `bd` is
>     ///   valid for read for the duration of this function.

But this is not enough for it to be sound, right? You create an `ARef`
from `bd.rq`, which potentially lives forever. You somehow need to
require that the pointer `bd` stays valid for reads and (synchronized)
writes until the request is ended (probably via `blk_mq_end_request`).

>>> +    /// - This function must not be called with a `hctx` for which
>>> +    ///   `Self::exit_hctx_callback()` has been called.
>>> +    /// - (*bd).rq must point to a valid `bindings:request` for which
>>> +    ///   `OperationsVTable<T>::init_request_callback` was called
>>
>> Missing `.` at the end.
> 
> Thanks.
> 
>>
>>> +    unsafe extern "C" fn queue_rq_callback(
>>> +        _hctx: *mut bindings::blk_mq_hw_ctx,
>>> +        bd: *const bindings::blk_mq_queue_data,
>>> +    ) -> bindings::blk_status_t {
>>> +        // SAFETY: `bd.rq` is valid as required by the safety requirement for
>>> +        // this function.
>>> +        let request = unsafe { &*(*bd).rq.cast::<Request<T>>() };
>>> +
>>> +        // One refcount for the ARef, one for being in flight
>>> +        request.wrapper_ref().refcount().store(2, Ordering::Relaxed);
>>> +
>>> +        // SAFETY: We own a refcount that we took above. We pass that to `ARef`.
>>> +        // By the safety requirements of this function, `request` is a valid
>>> +        // `struct request` and the private data is properly initialized.
>>> +        let rq = unsafe { Request::aref_from_raw((*bd).rq) };
>>
>> I think that you need to require that the request is alive at least
>> until `blk_mq_end_request` is called for the request (since at that
>> point all `ARef`s will be gone).
>> Also if this is not guaranteed, the safety requirements of
>> `AlwaysRefCounted` are violated (since the object can just disappear
>> even if it has refcount > 0 [the refcount refers to the Rust refcount in
>> the `RequestDataWrapper`, not the one in C]).
> 
> Yea, for the last invariant of `Request`:
> 
>   /// * `self` is reference counted by atomic modification of
>   ///   self.wrapper_ref().refcount().
> 
> I will add this to the safety comment at the call site:
> 
>   //  - `rq` will be alive until `blk_mq_end_request` is called and is
>   //    reference counted by `ARef` until then.

Seems like you already want to use this here :)

[...]

>>> +    /// This function is called by the C kernel. A pointer to this function is
>>> +    /// installed in the `blk_mq_ops` vtable for the driver.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// This function may only be called by blk-mq C infrastructure. `set` must

`set` doesn't exist (`_set` does), you are also not using this
requirement.

>>> +    /// point to an initialized `TagSet<T>`.
>>> +    unsafe extern "C" fn init_request_callback(
>>> +        _set: *mut bindings::blk_mq_tag_set,
>>> +        rq: *mut bindings::request,
>>> +        _hctx_idx: core::ffi::c_uint,
>>> +        _numa_node: core::ffi::c_uint,
>>> +    ) -> core::ffi::c_int {
>>> +        from_result(|| {
>>> +            // SAFETY: The `blk_mq_tag_set` invariants guarantee that all
>>> +            // requests are allocated with extra memory for the request data.
>>
>> What guarantees that the right amount of memory has been allocated?
>> AFAIU that is guaranteed by the `TagSet` (but there is no invariant).
> 
> It is by C API contract. `TagSet`::try_new` (now `new`) writes
> `cmd_size` into the `struct blk_mq_tag_set`. That is picked up by
> `blk_mq_alloc_tag_set` to allocate the right amount of space for each request.
> 
> The invariant here is on the C type. Perhaps the wording is wrong. I am
> not exactly sure how to express this. How about this:
> 
>             // SAFETY: We instructed `blk_mq_alloc_tag_set` to allocate requests
>             // with extra memory for the request data when we called it in
>             // `TagSet::new`.

I think you need a safety requirement on the function: `rq` points to a
valid `Request`. Then you could just use `Request::wrapper_ptr` instead
of the line below.

>>> +            let pdu = unsafe { bindings::blk_mq_rq_to_pdu(rq) }.cast::<RequestDataWrapper>();
>>> +
>>> +            // SAFETY: The refcount field is allocated but not initialized, this
>>> +            // valid for write.
>>> +            unsafe { RequestDataWrapper::refcount_ptr(pdu).write(AtomicU64::new(0)) };
>>> +
>>> +            Ok(0)
>>> +        })
>>> +    }
>>
>> [...]
>>
>>> +    /// Notify the block layer that a request is going to be processed now.
>>> +    ///
>>> +    /// The block layer uses this hook to do proper initializations such as
>>> +    /// starting the timeout timer. It is a requirement that block device
>>> +    /// drivers call this function when starting to process a request.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// The caller must have exclusive ownership of `self`, that is
>>> +    /// `self.wrapper_ref().refcount() == 2`.
>>> +    pub(crate) unsafe fn start_unchecked(this: &ARef<Self>) {
>>> +        // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
>>> +        // existence of `&mut self` we have exclusive access.
>>
>> We don't have a `&mut self`. But the safety requirements ask for a
>> unique `ARef`.
> 
> Thanks, I'll rephrase to:
> 
>         // SAFETY: By type invariant, `self.0` is a valid `struct request` and
>         // we have exclusive access.
> 
>>
>>> +        unsafe { bindings::blk_mq_start_request(this.0.get()) };
>>> +    }
>>> +
>>> +    fn try_set_end(this: ARef<Self>) -> Result<ARef<Self>, ARef<Self>> {
>>> +        // We can race with `TagSet::tag_to_rq`
>>> +        match this.wrapper_ref().refcount().compare_exchange(
>>> +            2,
>>> +            0,
>>> +            Ordering::Relaxed,
>>> +            Ordering::Relaxed,
>>> +        ) {
>>> +            Err(_old) => Err(this),
>>> +            Ok(_) => Ok(this),
>>> +        }
>>> +    }
>>> +
>>> +    /// Notify the block layer that the request has been completed without errors.
>>> +    ///
>>> +    /// This function will return `Err` if `this` is not the only `ARef`
>>> +    /// referencing the request.
>>> +    pub fn end_ok(this: ARef<Self>) -> Result<(), ARef<Self>> {
>>> +        let this = Self::try_set_end(this)?;
>>> +        let request_ptr = this.0.get();
>>> +        core::mem::forget(this);
>>> +
>>> +        // SAFETY: By type invariant, `self.0` is a valid `struct request`. By
>>> +        // existence of `&mut self` we have exclusive access.
>>
>> Same here, but in this case, the `ARef` is unique, since you called
>> `try_set_end`. You could make it a `# Guarantee` of `try_set_end`: "If
>> `Ok(aref)` is returned, then the `aref` is unique."
> 
> Makes sense. I have not seen `# Guarantee` used anywhere. Do you have a link for that use?

Alice used it a couple of times, eg in [2]. I plan on putting it in the
safety standard.

[2]: https://lore.kernel.org/rust-for-linux/20230601134946.3887870-2-aliceryhl@google.com/

---
Cheers,
Benno


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ