lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 01 Jun 2024 08:35:30 +0200
From: Andreas Hindborg <nmi@...aspace.dk>
To: Benno Lossin <benno.lossin@...ton.me>
Cc: Jens Axboe <axboe@...nel.dk>,  Christoph Hellwig <hch@....de>,  Keith
 Busch <kbusch@...nel.org>,  Damien Le Moal <dlemoal@...nel.org>,  Bart Van
 Assche <bvanassche@....org>,  Hannes Reinecke <hare@...e.de>,  Ming Lei
 <ming.lei@...hat.com>,  "linux-block@...r.kernel.org"
 <linux-block@...r.kernel.org>,  Andreas Hindborg <a.hindborg@...sung.com>,
  Wedson Almeida Filho <wedsonaf@...il.com>,  Greg KH
 <gregkh@...uxfoundation.org>,  Matthew Wilcox <willy@...radead.org>,
  Miguel Ojeda <ojeda@...nel.org>,  Alex Gaynor <alex.gaynor@...il.com>,
  Boqun Feng <boqun.feng@...il.com>,  Gary Guo <gary@...yguo.net>,
  Björn
 Roy Baron <bjorn3_gh@...tonmail.com>,  Alice Ryhl <aliceryhl@...gle.com>,
  Chaitanya Kulkarni <chaitanyak@...dia.com>,  Luis Chamberlain
 <mcgrof@...nel.org>,  Yexuan Yang <1182282462@...t.edu.cn>,  Sergio
 González Collado <sergio.collado@...il.com>,  Joel
 Granados
 <j.granados@...sung.com>,  "Pankaj Raghav (Samsung)"
 <kernel@...kajraghav.com>,  Daniel Gomez <da.gomez@...sung.com>,  Niklas
 Cassel <Niklas.Cassel@....com>,  Philipp Stanner <pstanner@...hat.com>,
  Conor Dooley <conor@...nel.org>,  Johannes Thumshirn
 <Johannes.Thumshirn@....com>,  Matias Bjørling
 <m@...rling.me>,  open list
 <linux-kernel@...r.kernel.org>,  "rust-for-linux@...r.kernel.org"
 <rust-for-linux@...r.kernel.org>,  "lsf-pc@...ts.linux-foundation.org"
 <lsf-pc@...ts.linux-foundation.org>,  "gost.dev@...sung.com"
 <gost.dev@...sung.com>
Subject: Re: [PATCH v2 1/3] rust: block: introduce `kernel::block::mq` module

Benno Lossin <benno.lossin@...ton.me> writes:

[...]

>>>> +    /// Notify the block layer that the request has been completed without errors.
>>>> +    ///
>>>> +    /// This function will return `Err` if `this` is not the only `ARef`
>>>> +    /// referencing the request.
>>>> +    pub fn end_ok(this: ARef<Self>) -> Result<(), ARef<Self>> {
>>>
>>> I am not yet fully convinced that this is the way we should go. I think
>>> I would have to see a more complex usage of `Request` with that id <->
>>> request mapping that you mentioned. Because with how rnull uses this
>>> API, it could also have a `URef<Self>` parameter (URef := unique ARef).
>> 
>> I considered a `UniqueARef` but it would just move the error handing to
>> `ARef::into_unique` and then make `end_ok` infallible.
>> 
>> There are four states for a request that we need to track:
>> 
>> A) Request is owned by block layer (refcount 0)
>> B) Request is owned by driver but with zero `ARef`s in existence
>>    (refcount 1)
>> C) Request is owned by driver with exactly one `ARef` in existence
>>    (refcount 2)
>> D) Request is owned by driver with more than one `ARef` in existence
>>    (refcount > 2)
>> 
>> It is in the doc comments for `RequestDataWrapper` as well.
>> 
>> We need A and B to ensure we fail tag to request conversions for
>> requests that are not owned by the driver.
>> 
>> We need C and D to ensure that it is safe to end the request and hand back
>> ownership to the block layer.
>> 
>> I will ping you when I hook up the NVMe driver with this.
>
> Thanks. I think that since the C side doesn't use ref-counting, the
> lifecycle of a request is probably rather simple. Therefore we should
> try to also avoid refcounting in Rust and see if we can eg tie ending
> requests to the associated `TagSet` (ie require `&mut` on the tagset)
> and tie accessing requests to shared access to the `TagSet`. Then we
> would be able to avoid the refcount. But I will first have to take a
> look at the nvme driver to gauge the plausibility.

C side _does_ use ref-counting in the `ref` field of the C `struct
request`. I am not able to reuse that field for the state tracking I
need. Other users such as iostat will take references on the request and
we will not be able to tell if there are no more Rust refs to the
request from that field. We need a separate one.

Anyways I think we should go with the current implementation for now. We
can always change it, nothing is locked in stone.

[...]

>>>> +/// Store the result of `op(target.load())` in target, returning new value of
>>>> +/// taret.
>>>> +fn atomic_relaxed_op_return(target: &AtomicU64, op: impl Fn(u64) -> u64) -> u64 {
>>>> +    let mut old = target.load(Ordering::Relaxed);
>>>> +    loop {
>>>> +        match target.compare_exchange_weak(old, op(old), Ordering::Relaxed, Ordering::Relaxed) {
>>>> +            Ok(_) => break,
>>>> +            Err(x) => {
>>>> +                old = x;
>>>> +            }
>>>> +        }
>>>> +    }
>>>
>>> This seems like a reimplementation of `AtomicU64::fetch_update` to me.
>> 
>> It looks like it! Except this function is returning the updated value,
>> `fetch_update` is returning the old value.
>> 
>> Would you rather that I rewrite in terms of the library function?
>
> If you can just use the fetch_update function, then that would be better
> than (almost) reimplementing it. But if you really need to get the new
> value, then I guess it can't really be helped. (or do you think you can
> just apply `op` to the old value returned by `fetch_update`?)

I can implement `atomic_relaxed_op_return` in terms of `fetch_update` 👍

[...]

>>>> +                let place = place.cast::<bindings::blk_mq_tag_set>();
>>>> +
>>>> +                // SAFETY: try_ffi_init promises that `place` is writable, and
>>>> +                // zeroes is a valid bit pattern for this structure.
>>>> +                core::ptr::write_bytes(place, 0, 1);
>>>> +
>>>> +                /// For a raw pointer to a struct, write a struct field without
>>>> +                /// creating a reference to the field
>>>> +                macro_rules! write_ptr_field {
>>>> +                    ($target:ident, $field:ident, $value:expr) => {
>>>> +                        ::core::ptr::write(::core::ptr::addr_of_mut!((*$target).$field), $value)
>>>> +                    };
>>>> +                }
>>>> +
>>>> +                // SAFETY: try_ffi_init promises that `place` is writable
>>>> +                    write_ptr_field!(place, ops, OperationsVTable::<T>::build());
>>>> +                    write_ptr_field!(place, nr_hw_queues , nr_hw_queues);
>>>> +                    write_ptr_field!(place, timeout , 0); // 0 means default which is 30 * HZ in C
>>>> +                    write_ptr_field!(place, numa_node , bindings::NUMA_NO_NODE);
>>>> +                    write_ptr_field!(place, queue_depth , num_tags);
>>>> +                    write_ptr_field!(place, cmd_size , core::mem::size_of::<RequestDataWrapper>().try_into()?);
>>>> +                    write_ptr_field!(place, flags , bindings::BLK_MQ_F_SHOULD_MERGE);
>>>> +                    write_ptr_field!(place, driver_data , core::ptr::null_mut::<core::ffi::c_void>());
>>>> +                    write_ptr_field!(place, nr_maps , num_maps);
>>>
>>> I think that there is some way for pinned-init to do a better job here.
>>> I feel like we ought to be able to just write:
>>>
>>>     Opaque::init(
>>>         try_init!(bindings::blk_mq_tag_set {
>>>             ops: OperationsVTable::<T>::build(),
>>>             nr_hw_queues,
>>>             timeout: 0, // 0 means default, which is 30Hz
>>>             numa_node: bindings::NUMA_NO_NODE,
>>>             queue_depth: num_tags,
>>>             cmd_size: size_of::<RequestDataWrapper>().try_into()?,
>>>             flags: bindings::BLK_MQ_F_SHOULD_MERGE,
>>>             driver_data: null_mut(),
>>>             nr_maps: num_maps,
>>>             ..Zeroable::zeroed()
>>>         }? Error)
>>>         .chain(|tag_set| to_result(bindings::blk_mq_alloc_tag_set(tag_set)))
>>>     )
>>>
>>> But we don't have `Opaque::init` (shouldn't be difficult) and
>>> `bindings::blk_mq_tag_set` doesn't implement `Zeroable`. We would need
>>> bindgen to put `derive(Zeroable)` on certain structs...
>>>
>>> Another option would be to just list the fields explicitly, since there
>>> aren't that many. What do you think?
>> 
>> Both options sound good. Ofc the first one sounds more user friendly
>> while the latter one sounds easier to implement. Getting rid of the
>> unsafe blocks here would be really nice.
>
> I think since it is not that expensive in this case, you should go for
> the second approach.
> We can fix it later, when we get the proper bindgen support.

Cool, I will send a follow up patch with this 👍

Best regards,
Andreas


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ