
std::string is a zero-cost abstraction joined at the hip to a non-zero-cost abstraction. At heart, it’s an abstraction over the pattern of having a uniquely-owned string pointer which is freed when the owner (object or stack frame) is freed. If you use it when you would otherwise use that pattern, the overhead should be effectively zero, at least if your STL implementation is high-quality.

However, as you note, the API encourages you to use that ownership pattern everywhere, often by accident, in a way you likely wouldn’t if you were doing allocation by hand. The worst offender is the copy constructor: it allocates a new buffer and copies the whole string, yet it’s called implicitly whenever you simply assign one std::string variable to another, or pass a std::string to a std::string-typed function parameter. It’s hard to avoid copies even if you know what you’re doing, and easy to misunderstand the costs if you don’t. In contrast, if you were doing allocation by hand, most of the time you’d try to just pass around pointers to existing buffers, with various ownership semantics: temporary access (borrowed), transferring ownership (move), or even using reference counting.

- For the unowned case, you could always use `const std::string &`, but that only works when referencing an entire existing std::string; it can’t represent a substring of one, or character data that isn’t held in a std::string in the first place (a string literal, say). C++17 improved the situation by adding std::string_view, but you have to know to use it, especially since the API design is limited by backwards compatibility constraints: for example, std::string’s substr method still returns a std::string, not a string_view. And some people say that in ‘modern C++’ you should avoid raw unowned types to ensure safety, so if you follow their advice you would have to avoid string_view as well.

- For the transferring-ownership case, C++11 added essentially zero-cost move construction that just moves the pointer from one std::string to another. But again, you have to know to use it, and the default is an expensive copy.

- For reference counting, you could use `std::shared_ptr<std::string>`, but it’s suboptimal: it always uses atomic reference counting even if you don’t need it, and reaching the characters takes a double heap indirection (one to the std::string object, another to its buffer).

Rust’s standard library has a type with similar semantics, String. But with the benefit of hindsight, it improves upon std::string in each of those areas:

- For the unowned case, there’s &str, which is like C++ string_view, and it’s idiomatic to use &str everywhere instead of &String (the equivalent of C++ `const std::string &`). Slicing a String (i.e. the equivalent of substr) gives you &str; if you want to copy that into its own owned buffer, you have to do it explicitly via `.to_string()`. Also, Rust’s borrow checker verifies that your use of unowned pointers is safe, so there’s no safety/performance tradeoff.

- Regarding the transferring-ownership case, Rust makes move semantics the default. Assigning one String-typed variable to another, or passing a String to a String-typed function parameter, will do a move; if you try to use the source afterwards, or if you don’t uniquely own the source, it will complain at compile time (there are workarounds depending on what you want to do). If you want to copy the data to a new heap allocation, you have to explicitly write `.clone()`.

- For reference counting… well, it’s arguably less than optimal in Rust too, but better than C++. For example, the standard library provides both atomic (Arc) and non-atomic (Rc) reference-counting wrappers. You can use the faster non-atomic version if you don’t need to share the value across threads (and the compiler prevents you from doing so by accident). Also, you can avoid the double indirection by using Rc<str> instead of Rc<String>.

- Oh, and there is also Cow, the copy-on-write type, which represents a pointer that may or may not be owned. I don’t think the C++ STL has any equivalent; some STL implementations used to make all std::strings copy-on-write, before C++11’s requirements effectively ruled that out, but that adds overhead for all users rather than just those who need it.
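
To make the Rust points above a bit more concrete, here is a rough sketch rather than anything definitive. The helper names (`first_word`, `shout`, `strip_tabs`) are made up for illustration; everything they call is from the standard library.

```rust
use std::borrow::Cow;
use std::rc::Rc;

// Hypothetical helper. Borrowed access: taking &str means it accepts a
// String, a slice of one, or a literal, with no copy in any case.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Hypothetical helper. Ownership transfer: the caller's String is moved in,
// so the heap buffer is handed over rather than duplicated.
fn shout(mut s: String) -> String {
    s.push('!');
    s
}

// Hypothetical helper. Cow borrows the input when nothing needs to change
// and allocates only when it has to.
fn strip_tabs(s: &str) -> Cow<'_, str> {
    if s.contains('\t') {
        Cow::Owned(s.replace('\t', " "))
    } else {
        Cow::Borrowed(s)
    }
}

fn main() {
    let owned = String::from("hello world");

    // Slicing gives a &str that points into `owned`'s buffer.
    let slice: &str = &owned[0..5];
    assert_eq!(slice, "hello");
    assert_eq!(first_word(&owned), "hello");

    // Getting an owned copy of the slice takes an explicit call.
    let copy: String = slice.to_string();

    // An explicit deep copy, followed by a move of the original.
    let backup = owned.clone();
    let moved = shout(owned);
    // `owned` is unusable from here on; the next line would not compile.
    // println!("{}", owned);
    assert_eq!(moved, "hello world!");
    assert_eq!(backup, "hello world");

    // Non-atomic reference counting with a single indirection.
    let shared: Rc<str> = Rc::from("shared text");
    let other = Rc::clone(&shared); // bumps a counter, no string copy
    assert_eq!(Rc::strong_count(&shared), 2);

    assert!(matches!(strip_tabs("no tabs"), Cow::Borrowed(_)));
    assert!(matches!(strip_tabs("a\tb"), Cow::Owned(_)));

    println!("{} {} {} {}", copy, moved, backup, other);
}
```

The thing to notice is that every allocation in `main` is spelled out (`String::from`, `.to_string()`, `.clone()`, `Rc::from`, and the `Cow::Owned` branch); passing `owned` into `shout` involves no copy of the buffer.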

Thus, I’d say Rust’s String has a better claim to being a true zero-cost abstraction.


