Message-ID: <20250221220201.7068dfa3@pumpkin>
Date: Fri, 21 Feb 2025 22:02:01 +0000
From: David Laight <david.laight.linux@...il.com>
To: Laurent Pinchart <laurent.pinchart@...asonboard.com>
Cc: Jan Engelhardt <ej@...i.de>, "H. Peter Anvin" <hpa@...or.com>, Greg KH
<gregkh@...uxfoundation.org>, Boqun Feng <boqun.feng@...il.com>, Miguel
Ojeda <miguel.ojeda.sandonis@...il.com>, Christoph Hellwig
<hch@...radead.org>, rust-for-linux <rust-for-linux@...r.kernel.org>, Linus
Torvalds <torvalds@...ux-foundation.org>, David Airlie <airlied@...il.com>,
linux-kernel@...r.kernel.org, ksummit@...ts.linux.dev
Subject: Re: C aggregate passing (Rust kernel policy)
On Fri, 21 Feb 2025 22:23:32 +0200
Laurent Pinchart <laurent.pinchart@...asonboard.com> wrote:
> On Fri, Feb 21, 2025 at 09:06:14PM +0100, Jan Engelhardt wrote:
> > On Friday 2025-02-21 19:34, David Laight wrote:
> > >>
> > >> Returning aggregates in C++ is often implemented with a secret extra
> > >> pointer argument passed to the function. The C backend does not
> > >> perform that kind of transformation automatically. I surmise ABI reasons.
> > >
> > > Have you really looked at the generated code?
> > > For anything non-trivial it gets truly horrid.
> > >
> > > To pass a class by value the compiler has to call the C++ copy constructor to
> > > generate a deep copy prior to the call, and then call the destructor after
> > > the function returns - compare against passing a pointer to an existing
> > > item (and not letting it be written to).
> >
> > And that is why people generally don't pass aggregates by value,
> > irrespective of the programming language.
>
> It's actually sometimes more efficient to pass aggregates by value.
> Considering std::string for instance,
>
> std::string global;
>
> void setSomething(std::string s)
> {
>     global = std::move(s);
> }
>
> void foo(int x)
> {
>     std::string s = std::to_string(x);
>
>     setSomething(std::move(s));
> }
>
> Passing by value is the most efficient option. The backing storage for
> the string is allocated once in foo(). If you instead did
>
> std::string global;
>
> void setSomething(const std::string &s)
> {
>     global = s;
> }
>
> void foo(int x)
> {
>     std::string s = std::to_string(x);
>
>     setSomething(s);
> }
>
> then the data would have to be copied when assigned global.
>
> The std::string object itself needs to be copied in the first case of
> course, but that doesn't require heap allocation.

It is still a copy though.
And there is nothing to stop an implementation (I think even of std::string)
using ref-counted buffers for large malloc()ed strings.
And, even without that, you just need access to the operator that 'moves'
the actual char data from one std::string to another,
since that is all you are relying on.
You can then pass the std::string objects themselves by reference.
Although I can't remember if you can assign different allocators to
different std::strings - I'm not really a C++ expert.
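
Something like the following is what I have in mind. It is only a rough sketch, not code from this thread: setSomethingByRef(), fooByRef() and demo() are made-up names, and I'm assuming the default allocator, so that move assignment can hand the character buffer over rather than copy it.

```cpp
#include <cassert>
#include <string>
#include <utility>

std::string global;

// Hypothetical variant of setSomething(): the caller's std::string is passed
// by (rvalue) reference, so no std::string object is built for the parameter.
void setSomethingByRef(std::string &&s)
{
    // Move assignment takes over the source's character data instead of
    // duplicating it; 's' is left in a valid but unspecified state.
    global = std::move(s);
}

void fooByRef(int x)
{
    std::string s = std::to_string(x);

    setSomethingByRef(std::move(s));
}

// Small self-check, meant to be called from main().
void demo()
{
    fooByRef(42);
    assert(global == "42");
}
```

The caller still has to write std::move(), so the hand-over is just as visible at the call site as in the by-value version.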
> The best solution
> depends on the type of aggregates you need to pass. It's one of the
> reasons string handling is messy in C++, due to the need to interoperate
> with zero-terminated strings, the optimal API convention depends on the
> expected usage pattern in both callers and callees. std::string_view is
> no silver bullet :-(

The only thing the zero-termination stops is generating sub-strings by
reference.
The bigger problem is that a C function is allowed to advance a pointer
along the array, so str.c_str() has to be just &str[0].
That stops any form of fragmented strings - which might be useful for
large ones, even though the cost of the accesses may well balloon.
The same is true for std::vector - the storage has to be contiguous, so
growing it means a realloc()-style reallocation that relocates every element.
So lots of push_back() of non-trivial classes can get very, very slow,
and that is what people tend to write.
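
A rough way to see the relocation cost, as a sketch only: Tracked, countRelocations() and demoVector() are made-up names, and I use emplace_back() rather than push_back() so that only the moves caused by growing the vector get counted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that counts every copy or move construction.
struct Tracked {
    static std::size_t relocations;
    int value;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked &o) : value(o.value) { ++relocations; }
    Tracked(Tracked &&o) noexcept : value(o.value) { ++relocations; }
};
std::size_t Tracked::relocations = 0;

// Appends n elements and returns how many copy/move constructions that took.
std::size_t countRelocations(std::size_t n, bool reserveFirst)
{
    Tracked::relocations = 0;
    std::vector<Tracked> v;

    if (reserveFirst)
        v.reserve(n);   // one allocation up front, no regrowth afterwards
    for (std::size_t i = 0; i < n; ++i)
        v.emplace_back(static_cast<int>(i));   // constructed in place
    return Tracked::relocations;
}

// Small self-check, meant to be called from main().
void demoVector()
{
    assert(countRelocations(1000, true) == 0);
    assert(countRelocations(1000, false) > 0);
}
```

Each append is amortised constant time, so the total stays proportional to n, but every regrowth runs the move (or copy) constructor of every existing element, and calling reserve() first avoids that entirely.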
David