StringView

A StringView references an immutable range of memory that is generally intended (but not required) to contain UTF-8-encoded text. It consists of a pointer bytes and an integer numBytes. Many StringView functions also work with text in any 8-bit format compatible with ASCII, such as ISO 8859-1 or Windows-1252. A StringView does not own the memory it points to, and no heap memory is freed when the StringView is destroyed.

The StringView class is similar to ConstBufferView except that it provides member functions suitable for string manipulation, such as trim(), split() and concatenation using the + operator.

StringView objects are not implicitly convertible to or from ConstBufferView, but they can be explicitly converted using bufferView() or fromBufferView().

StringView objects are not guaranteed to be null-terminated. If you need a null-terminated string (for example, to pass to a third-party library), you must construct a string that includes the null terminator byte yourself. The null terminator then counts towards the total number of bytes in numBytes. A convenience function withNullTerminator() is provided for this.

Strictly speaking, StringView objects are not required to contain text, though many member functions expect it. Internally, a StringView is simply a reference to an immutable sequence of bytes. The main reason to prefer a variable of type StringView over ConstBufferView is to express the intention for the memory range to contain text and/or for the convenience of having the StringView member functions available.

Several StringView functions work directly with byte offsets, such as subStr(), left() and right(). Be aware that when a string is encoded in UTF-8, byte offsets are not necessarily the same as the number of characters (Unicode code points) encoded by the string. If the string contains multibyte characters, it is the caller's responsibility to pass byte offsets that begin and end at character boundaries.
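
As a rough illustration of the boundary rule, the following sketch uses std::string_view as a stand-in for StringView. The helper names isContinuationByte, isCharBoundary and countCodePoints are hypothetical and not part of Plywood, and the code assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool isContinuationByte(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// True if `offset` is a safe place to cut `s`, i.e. it does not fall
// inside a multibyte character. The end of the string is always safe.
inline bool isCharBoundary(std::string_view s, std::size_t offset) {
    assert(offset <= s.size());
    return offset == s.size() ||
           !isContinuationByte(static_cast<unsigned char>(s[offset]));
}

// Counts code points by counting every byte that starts a character.
inline std::size_t countCodePoints(std::string_view s) {
    std::size_t count = 0;
    for (char c : s) {
        if (!isContinuationByte(static_cast<unsigned char>(c)))
            ++count;
    }
    return count;
}
```

For the 6-byte string "héllo", countCodePoints returns 5, and byte offset 2 is not a valid place to cut because it falls inside the two-byte encoding of é.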

For more information, see Unicode Support.

The caller is responsible for ensuring that the underlying memory remains valid for the lifetime of the StringView.

Header File

#include <ply-runtime/string/StringView.h>

Also included from <ply-runtime/Base.h>.

Data Members

const char* bytes [code]

The first byte in the immutable memory range.

u32 numBytes [code]

The number of bytes in the immutable memory range.

Member Functions

StringView() [code]

Constructs an empty StringView.

StringView(const char* s) [code]

Constructs a StringView from a null-terminated string. The string memory is expected to remain valid for the lifetime of the StringView. Note that the null terminator character does not count towards numBytes. For example, StringView{"hello"} results in numBytes equal to 5.

When the argument is a C-style string literal, compilers are able to compute numBytes at compile time if optimization is enabled.

If you have a C-style string literal, and you want the null terminator character included in the StringView, use StringView::fromBufferView("hello") instead. In that case, the null terminator character counts towards the number of bytes in numBytes, making numBytes equal to 6 in this example.

StringView(const char& c) [code]

Constructs a StringView from a single byte. c is expected to remain valid for the lifetime of the StringView.

StringView(const char* bytes, u32 numBytes) [code]

Constructs a StringView explicitly from the arguments. The string memory is expected to remain valid for the lifetime of the StringView.

ConstBufferView& bufferView() [code]
const ConstBufferView& bufferView() const [code]

Explicitly converts the StringView to a ConstBufferView.

static const StringView& fromBufferView(const ConstBufferView& binView) [code]

Explicitly converts a ConstBufferView to a StringView.

static StringView fromRange(const char* startByte, const char* endByte) [code]

Returns a StringView referencing an immutable range of memory between two pointers. The number of bytes in the memory range is given by endByte - startByte, and endByte is considered a pointer to the first byte after the memory range.

const char& operator[](u32 index) const [code]

Subscript operator with runtime bounds checking.

const char& back(s32 ofs = -1) const [code]

Reverse subscript operator with runtime bounds checking. Expects a negative index. -1 returns the last byte of the given string; -2 returns the second-last byte, etc.
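
The negative-index convention can be sketched as follows. This is not Plywood's implementation: backByte is a hypothetical helper, std::string_view stands in for StringView, and assert stands in for whatever runtime check the library performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Returns the byte located `-ofs` positions before the end of `s`.
// ofs must lie in the range [-size, -1].
inline char backByte(std::string_view s, std::int32_t ofs = -1) {
    // Negate in 64 bits so that the most negative 32-bit value cannot overflow.
    const std::int64_t fromEnd = -static_cast<std::int64_t>(ofs);
    assert(fromEnd >= 1 && static_cast<std::uint64_t>(fromEnd) <= s.size());
    return s[s.size() - static_cast<std::size_t>(fromEnd)];
}
```

With the input "hello", an offset of -1 selects 'o' and an offset of -5 selects 'h'.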

const char* end() const [code]

Returns bytes + numBytes. This pointer is considered to point to the first byte after the memory range.

void offsetHead(u32 numBytes) [code]

Moves bytes forward and subtracts the given number of bytes from numBytes.

template <typename T>
T to(const T& defaultValue = subst::createDefault<T>()) const [code]

Parses the given string directly as type T. Whitespace is trimmed from the beginning and end of the string before parsing occurs. If the string cannot be parsed, or if the string is not completely consumed by the parse operation, defaultValue is returned.

StringView{"123"}.to<u32>();    // returns 123
StringView{" 123 "}.to<u32>();  // returns 123
StringView{"abc"}.to<u32>();    // returns 0
StringView{"123a"}.to<u32>();   // returns 0
StringView{""}.to<u32>();       // returns 0
StringView{""}.to<s32>(-1);     // returns -1

This function uses StringViewReader internally. If you need to distinguish between a successful parse and an unsuccessful one, create and use a StringViewReader object directly.
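
The rules above (trim, parse, require full consumption, otherwise fall back to the default) can be approximated with the standard library. This is only a sketch for integer types: parseOr and trimSpaces are hypothetical names, std::from_chars stands in for StringViewReader, and the set of bytes treated as whitespace is an assumption.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <system_error>
#include <type_traits>

// Removes leading and trailing ASCII whitespace (assumed set: space, tab, CR, LF).
inline std::string_view trimSpaces(std::string_view s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string_view::npos)
        return {};
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Parses `s` as an integer of type T. Returns defaultValue if the text is
// empty, malformed, out of range, or followed by unparsed bytes.
template <typename T>
T parseOr(std::string_view s, T defaultValue = T{}) {
    static_assert(std::is_integral_v<T>, "this sketch only handles integers");
    s = trimSpaces(s);
    if (s.empty())
        return defaultValue;
    T value{};
    const char* begin = s.data();
    const char* end = begin + s.size();
    const auto [ptr, ec] = std::from_chars(begin, end, value);
    if (ec != std::errc{} || ptr != end)
        return defaultValue;
    return value;
}
```

Because the failure case and a legitimately parsed default value look identical to the caller, this shape has the same limitation as to(): code that must tell the two apart needs access to the parser's status.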

explicit operator bool() const [code]

Explicit conversion to bool. Returns true if the length of the string is greater than 0. Allows you to use a StringView object inside an if condition.

if (str) {
    ...
}

bool isEmpty() const [code]

Returns true if the length of the string is 0.

StringView subStr(u32 start) const [code]
StringView subStr(u32 start, u32 numBytes) const [code]

Returns a substring that starts at the offset given by start. The optional numBytes argument determines the length of the substring in bytes. If numBytes is not specified, the substring continues to the end of the string.

StringView left(u32 numBytes) const [code]

Returns a substring that contains only the first numBytes bytes of the string.

StringView shortenedBy(u32 numBytes) const [code]

Returns a substring with the last numBytes bytes of the string omitted.

StringView right(u32 numBytes) const [code]

Returns a substring that contains only the last numBytes bytes of the string.
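
The byte-offset accessors map closely onto std::string_view operations, which the sketch below uses as a stand-in. The function names are hypothetical, and the asserts reflect an assumption of this sketch: how StringView itself treats an out-of-range numBytes is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// First n bytes of s.
inline std::string_view leftBytes(std::string_view s, std::size_t n) {
    assert(n <= s.size());
    return s.substr(0, n);
}

// Last n bytes of s.
inline std::string_view rightBytes(std::string_view s, std::size_t n) {
    assert(n <= s.size());
    return s.substr(s.size() - n);
}

// Everything except the last n bytes of s.
inline std::string_view shortenedByBytes(std::string_view s, std::size_t n) {
    assert(n <= s.size());
    return s.substr(0, s.size() - n);
}
```

For the input "hello.txt", leftBytes with 5 yields "hello", rightBytes with 3 yields "txt", and shortenedByBytes with 4 yields "hello". None of these copy any bytes; each result is a view into the original memory.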

friend s32 compare(StringView str0, StringView str1) [code]

Returns:

  • -1 if str0 precedes str1 in sorted order
  • 0 if the strings are equal
  • 1 if str0 follows str1 in sorted order

Strings are sorted by comparing the unsigned value of each byte. If one of the strings contains the other as a prefix, the shorter string comes first in sorted order.
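
A minimal sketch of this ordering, using std::string_view as a stand-in and a hypothetical function name compareBytes. It relies on std::memcmp, which compares bytes as unsigned values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Three-way comparison by unsigned byte value. When one string is a prefix
// of the other, the shorter one sorts first.
inline int compareBytes(std::string_view a, std::string_view b) {
    const std::size_t common = std::min(a.size(), b.size());
    const int r = common ? std::memcmp(a.data(), b.data(), common) : 0;
    if (r != 0)
        return r < 0 ? -1 : 1;
    if (a.size() == b.size())
        return 0;
    return a.size() < b.size() ? -1 : 1;
}

// Small self-check illustrating the ordering rules.
inline void compareBytesExample() {
    assert(compareBytes("abc", "abd") == -1);
    assert(compareBytes("abc", "abc") == 0);
    assert(compareBytes("abcd", "abc") == 1);
    // 'z' is 0x7A and the first byte of UTF-8 "é" is 0xC3, so "z" sorts first
    // under unsigned comparison. A signed char comparison would say otherwise.
    assert(compareBytes("z", "\xC3\xA9") == -1);
}
```

One consequence of unsigned comparison is that every pure-ASCII string sorts before any string that begins with a multibyte UTF-8 character.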

bool operator<(StringView other) const [code]

Returns true if the string precedes other in sorted order. Equivalent to compare(*this, other) < 0.

bool operator==(StringView src) const [code]
bool operator!=(StringView src) const [code]

Returns true if the string contents are identical (or not identical) when compared byte-for-byte.

String operator+(StringView other) const [code]

Returns a new String containing the concatenation of two StringViews.

String operator*(u32 count) const [code]

Returns a new String containing the contents of the StringView repeated count times.

StringView{'*'} * 10;    // returns "**********"

s32 findByte(char matchByte, u32 startPos = 0) const [code]

Returns the offset of the first occurrence of matchByte in the string, or -1 if not found. The search begins at the offset specified by startPos. This function can find ASCII codes in UTF-8 encoded strings, since ASCII codes are always encoded as a single byte in UTF-8.

template <typename MatchFunc>
s32 findByte(const MatchFunc& matchFunc, u32 startPos = 0) const [code]

A template function that returns the offset of the first byte for which matchFunc returns true, or -1 if none. The search begins at the offset specified by startPos. This function can be used to find a whitespace character by calling findByte(isWhite).
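
The predicate form can be sketched like this. findByteIf and isSpaceByte are hypothetical names (isSpaceByte is only an assumed stand-in for isWhite), and std::string_view stands in for StringView.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Assumed whitespace predicate: space, tab, carriage return, line feed.
inline bool isSpaceByte(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Returns the offset of the first byte at or after startPos for which
// match(byte) is true, or -1 if there is none.
template <typename MatchFunc>
std::int32_t findByteIf(std::string_view s, const MatchFunc& match,
                        std::size_t startPos = 0) {
    assert(startPos <= s.size());
    for (std::size_t i = startPos; i < s.size(); ++i) {
        if (match(s[i]))
            return static_cast<std::int32_t>(i);
    }
    return -1;
}
```

Searching "hello world" with isSpaceByte gives 5; repeating the search from offset 6 gives -1 because no further whitespace follows.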

s32 rfindByte(char matchByte, u32 startPos) const [code]
template <typename MatchFunc>
s32 rfindByte(const MatchFunc& matchFunc, u32 startPos) const [code]

Reverse findByte functions. Returns the offset of the last byte in the string that matches the first argument (if passed a char) or for which the first argument returns true (if passed a function). The optional startPos argument specifies an offset at which to begin the search.

bool startsWith(StringView arg) const [code]

Returns true if the string starts with arg.

bool endsWith(StringView arg) const [code]

Returns true if the string ends with arg.

StringView trim(bool (*matchFunc)(char), bool left = true, bool right = true) const [code]
StringView ltrim(bool (*matchFunc)(char)) const [code]
StringView rtrim(bool (*matchFunc)(char)) const [code]

Returns a substring with leading and/or trailing bytes removed. Bytes are removed if true is returned when passed to matchFunc. These functions can be used to trim whitespace characters from a UTF-8 string, for example by calling trim(isWhite), since whitespace characters are each encoded as a single byte in UTF-8.

ltrim() trims leading bytes only, rtrim() trims trailing bytes only, and trim() trims both leading and trailing bytes.
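
A minimal sketch of predicate-based trimming, with std::string_view standing in for StringView. The names trimIf and isSpaceByte are hypothetical, and isSpaceByte is only an assumed approximation of isWhite.

```cpp
#include <cassert>
#include <string_view>

// Assumed whitespace predicate: space, tab, carriage return, line feed.
inline bool isSpaceByte(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Drops leading and/or trailing bytes for which match(byte) is true.
// The result is a view into the same memory; nothing is copied.
inline std::string_view trimIf(std::string_view s, bool (*match)(char),
                               bool left = true, bool right = true) {
    assert(match != nullptr);
    if (left) {
        while (!s.empty() && match(s.front()))
            s.remove_prefix(1);
    }
    if (right) {
        while (!s.empty() && match(s.back()))
            s.remove_suffix(1);
    }
    return s;
}
```

Passing left = true, right = false gives the ltrim() behavior, and the reverse gives rtrim(). A string made up entirely of matching bytes trims down to an empty view.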

String join(ArrayView<const StringView> comps) const [code]

Returns a new String containing every item in comps concatenated together, with the given string used as a separator. For example:

StringView{", "}.join({"a", "b", "c"}); // returns "a, b, c"
StringView{""}.join({"a", "b", "c"});   // returns "abc"

Array<StringView> splitByte(char sep) const [code]

Returns a list of the words in the given string using sep as a delimiter byte.
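
Splitting on a delimiter byte and joining with a separator are inverse operations, which the sketch below illustrates with standard containers. splitOnByte and joinWith are hypothetical names, std::vector and std::string stand in for Array and String, and keeping empty items between adjacent delimiters is an assumption of this sketch rather than documented splitByte() behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Splits s at every occurrence of sep. The returned views point into s.
inline std::vector<std::string_view> splitOnByte(std::string_view s, char sep) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = s.find(sep, start);
        if (pos == std::string_view::npos) {
            parts.push_back(s.substr(start));  // final item, possibly empty
            break;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}

// Concatenates comps, placing sep between consecutive items.
inline std::string joinWith(std::string_view sep,
                            const std::vector<std::string_view>& comps) {
    std::string out;
    for (std::size_t i = 0; i < comps.size(); ++i) {
        if (i > 0)
            out.append(sep.data(), sep.size());
        out.append(comps[i].data(), comps[i].size());
    }
    return out;
}

// Round trip: splitting and re-joining with the same separator restores the input.
inline void splitJoinExample() {
    const std::vector<std::string_view> parts = splitOnByte("a,b,c", ',');
    assert(parts.size() == 3);
    assert(joinWith(",", parts) == "a,b,c");
}
```

Because the split items are views, they remain valid only as long as the original string memory does; the joined result owns its own copy.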

String upperAsc() const [code]

Returns a new String with all lowercase ASCII characters converted to uppercase. This function works with UTF-8 strings and with any 8-bit text encoding compatible with ASCII.

String lowerAsc() const [code]

Returns a new String with all uppercase ASCII characters converted to lowercase. This function works with UTF-8 strings and with any 8-bit text encoding compatible with ASCII.

String reversedBytes() const [code]

Returns a new String with the bytes reversed. This function is really only suitable when you know that all characters contained in the string are encoded in a single byte.

String reversedUTF8() const [code]

Returns a new String with UTF-8 characters reversed. For example, StringView{"😋🍺🍕"}.reversedUTF8() returns "🍕🍺😋".
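
One way to reverse by character rather than by byte is to walk backward and copy each character's bytes in their original order. The sketch below is an illustration under assumptions, not the library's code: reverseCodePoints is a hypothetical name, std::string stands in for String, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Reverses the order of UTF-8 encoded characters while keeping the bytes
// of each individual character in their original order.
inline std::string reverseCodePoints(std::string_view s) {
    std::string out;
    out.reserve(s.size());
    std::size_t end = s.size();
    while (end > 0) {
        std::size_t start = end - 1;
        // Step back over continuation bytes (10xxxxxx) to reach the lead byte.
        while (start > 0 &&
               (static_cast<unsigned char>(s[start]) & 0xC0) == 0x80) {
            --start;
        }
        out.append(s.data() + start, end - start);
        end = start;
    }
    assert(out.size() == s.size());  // reversal never changes the byte count
    return out;
}
```

A plain byte reversal of the same input would scramble the bytes inside each multibyte character, which is why it is only suitable for single-byte text.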

String filterBytes(char (*filterFunc)(char)) const [code]

Returns a new String with each byte passed through the provided filterFunc. It's safe to call this function on UTF-8 encoded strings as long as filterFunc leaves byte values greater than or equal to 128 unchanged. Therefore, this function is mainly suitable for filtering ASCII codes.
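
The following sketch shows a per-byte filter that is safe for UTF-8 because it only touches ASCII letters. mapBytes and upperAscii are hypothetical names and std::string stands in for String.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Uppercases ASCII letters and leaves every other byte, including all
// bytes >= 128, unchanged.
inline char upperAscii(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// Builds a new string by passing each byte of s through filter.
inline std::string mapBytes(std::string_view s, char (*filter)(char)) {
    assert(filter != nullptr);
    std::string out;
    out.reserve(s.size());
    for (char c : s)
        out.push_back(filter(c));
    return out;
}
```

Applied to the UTF-8 text "héllo", this produces "HéLLO": the two bytes that encode é pass through untouched, so the output is still valid UTF-8.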

bool includesNullTerminator() const [code]

Returns true if the last byte in the string is a zero byte.

HybridString withNullTerminator() const [code]

If the last byte of the given string is not a zero byte, this function allocates memory for a new string, copies the contents of the given string to it, appends a zero byte and returns the new string. In that case, the new string's numBytes will be one greater than the numBytes of the original string. If the last byte of the given string is already a zero byte, a view of the given string is returned and no new memory is allocated.
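
The copy-only-when-needed behavior can be sketched as below. This is an illustration under assumptions rather than the actual HybridString design: TerminatedText and withTerminator are hypothetical names, and std::string_view and std::string stand in for the borrowed and owned cases.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Either borrows the input (when it already ends in a zero byte) or owns
// a fresh copy with a zero byte appended.
struct TerminatedText {
    std::string_view borrowed;  // used when no copy was needed
    std::string owned;          // used when a copy had to be made
    bool ownsCopy = false;

    // View of the text including its trailing zero byte.
    std::string_view view() const {
        return ownsCopy ? std::string_view(owned) : borrowed;
    }
};

inline TerminatedText withTerminator(std::string_view s) {
    TerminatedText result;
    if (!s.empty() && s.back() == '\0') {
        result.borrowed = s;  // already terminated: no allocation
    } else {
        result.owned = std::string(s);
        result.owned.push_back('\0');  // the terminator counts as a byte
        result.ownsCopy = true;
    }
    assert(result.view().back() == '\0');
    return result;
}
```

In the copying case the returned view is one byte longer than the input, matching the numBytes rule described above. In the borrowing case the result is only valid for as long as the original memory is.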

StringView withoutNullTerminator() const [code]

If the last byte of the given string is not a zero byte, returns a view of the given string. If the last byte of the given string is a zero byte, returns a substring with the last byte omitted.

template <typename Hasher>
void appendTo(Hasher& hasher) const [code]

Feeds the contents of the given string to a hash function.