ITNEXT

ITNEXT is a platform for IT developers & software engineers to share knowledge, connect, collaborate, learn and experience next-gen technologies.


Daily bit(e) of C++ | Learn Modern C++ 4/N

Šimon Tóth
Published in ITNEXT
14 min read, Apr 29, 2023

Daily bit(e) of C++ #118, A Modern-only C++ course (including C++23), part 4 of N: Indirection

Welcome to the fourth lesson of the Learn Modern-only C++ course, which I’m running as a sub-series of my Daily bit(e) of C++.

In today’s lesson, we will take a deep dive into indirection. First, we will cover the language level facilities: pointers and references. Then, we will also review the most frequently used standard library types that utilize indirection: iterators, std::string_view and std::span.

If you missed the previous lesson, you can find it here:

Until now, we have been sticking with value semantics. However, you might have noticed something peculiar if you were paying attention in the previous lesson.

std::vector<int64_t> data; // empty

data.push_back(10);
// data == {10}

Open the example in Compiler Explorer.

This doesn’t quite fit in with value semantics. So how exactly is push_back able to modify the object “data”? The answer is indirection, and today, we will take a deep dive into the various ways you can achieve indirection in C++.

Pointers and member functions

C++ pointers serve as typed memory addresses. The key feature we get from pointers is indirection. By storing the memory address of a variable in a pointer, we can easily modify the value located at that specific address without direct access to the original variable.

int64_t x = 10;
// obtain address of variable x and store it in y
int64_t *y = &x;
// access x by dereferencing the address stored in y
*y = 20;
// x == 20

Open the example in Compiler Explorer.

Since a pointer itself is merely a value, it adheres to value semantics.

int64_t x = 10;
// obtain address of variable x and store it in y
int64_t *y = &x;
// copy the address stored in y to z
int64_t *z = y;
// access x by dereferencing the address stored in z
*z = 42;

Open the example in Compiler Explorer.

We typically group the pointer with the variable name, not the type. This might seem counterintuitive; however, it follows the C++ parsing rules. You might come across a piece of code like this: int *a, b; In this code, a is a pointer to an int; b is simply an int. This complexity is why it is preferable to declare each variable separately. That way, we avoid any confusion.

When working with compound types, the dereferencing syntax can become unwieldy. To alleviate this, C++ introduces a more concise shorthand.

struct X { 
int64_t x;
};

X x{0};

X *y = &x;
y->x = 42; // same as (*y).x = 42;
// x.x == 42

Open the example in Compiler Explorer.

Reference semantics

The indirection unlocked by pointers is more appropriately referred to as reference semantics. With reference semantics, copying a pointer creates another reference to the same underlying variable; the variable itself is not copied.

void change(int64_t *value) {
*value = 42;
}

int64_t x = 10;
int64_t *y = &x;
// passing y to change() creates another reference to x
change(y);
// x == 42

Open the example in Compiler Explorer.

Member functions

We can finally cycle back to our original question. How can the member function push_back modify the object it is invoked on?

In C++, member functions get access to a pointer to the object they were invoked on. This pointer, named “this”, is passed as a hidden first argument of the member function.

struct Holder {
int64_t value;

void set_value(int64_t v) {
this->value = v;
}

int64_t get_value() const {
return this->value;
}
};


Holder v{42};
// v.get_value() == 42

v.set_value(3);
// v.get_value() == 3

Open example in Compiler Explorer.

If there isn’t a name collision, the name resolution will find the members even without the explicit “this->” prefix.

struct Point {
int64_t x;
int64_t y;
void demo(int x) {
// The argument x hides the member x

// set member x to the value of the argument x
this->x = x;
// the member y is visible, no need for "this"
y = x;
}
};

int main() {
Point p{10,15};
p.demo(1);
}

Open the example in Compiler Explorer.

Avoid shadowing existing names in your code because it significantly increases code complexity.

Immutability

Indirection is a powerful tool; however, we pay for that power with clarity. Once a function can access remote data, it becomes much harder to reason about its behaviour. One way to bring some semblance of order back into the mix is through immutability.

We have already talked about the tool to achieve immutability, the const keyword. In the context of pointers, we care about the referenced type being const. This effectively creates an immutable view of the original variable.

The standard semantics apply. If we have a pointer to a const type, we can only read from the referenced variable and cannot upgrade to a pointer to a mutable type. On the other hand, a pointer to a mutable type can be “downgraded” to a const type.

void read(const int64_t *p) {
int64_t a = *p; // OK
// Wouldn't compile, cannot mutate const variable
// *p = a;
}

void read_write(int64_t *p) {
int64_t a = *p; // OK
*p = a; // OK
}

int64_t x = 20;
read(&x); // OK, int64_t* -> const int64_t*
read_write(&x); // OK, passing int64_t* to read_write()

constexpr int64_t y = 20;
read(&y); // OK, passing const int64_t* to read()
// Wouldn't compile, cannot convert const int64_t* -> int64_t*
// read_write(&y);

Open the example in Compiler Explorer.

Note that, as with any variable, we can also mark the pointer itself as const (const int64_t * const x). However, as with regular variables, marking the variable itself as immutable has little use.

In the previous section, we talked about member functions, and the const qualifier did appear as a qualifier for a member function. Remember that the member functions come with a hidden argument called “this”, which refers to the object the function was invoked on. Because of the hidden nature, we cannot change the type of this pointer directly, so instead, we annotate the method.

struct Demo {
void read() const {}
void read_write() {}
};

Demo x;
x.read(); // OK
x.read_write(); // OK

constexpr Demo y;
y.read(); // OK
// Wouldn't compile, cannot convert const Demo* to Demo*
// y.read_write();

Open the example in Compiler Explorer: https://compiler-explorer.com/z/jETP5Taz8

Nullptr

While we typically do not care about the actual value a pointer is storing, one unique value is very significant: the “nullptr”. This represents the situation where the pointer isn’t pointing to anything.

One of the typical use cases is to represent optional arguments or values:

void print(const int64_t *value) {
if (value == nullptr) {
std::cout << "empty\n";
} else {
std::cout << *value << "\n";
}
}

int64_t x = 20;

int64_t *y = nullptr;
print(y);
// prints: empty

y = &x;
print(y);
// prints: 20

Open the example in Compiler Explorer.

Safety, lifetime, ownership

An important aspect of reference types in C++ is that they are typically weak references, meaning they do not imply ownership over the original data.

This avoids any overhead; however, as a consequence, it is entirely up to the developer to ensure that the reference type will not outlive its source variable.

#include <cstdint>

int main() {
int64_t *ptr = nullptr;
{
int64_t value = 10;
ptr = &value;
} // value is destroyed here
*ptr = 20; // BOOM
// accessing a variable after its lifetime
}
/* Address sanitizer output:
ERROR: AddressSanitizer: stack-use-after-scope
WRITE of size 8 at 0x7fb069a00020 thread T0
#0 0x401228 in main /app/example.cpp:9
*/

Open the example in Compiler Explorer.

Typically, in Modern C++, you won’t be using raw pointers; however, the above problem can be replicated with any weak reference type. Keep your lifetimes hierarchical.

The one case where we still rely on a raw pointer is passing a non-owning pointer to an immutable type to a function.

struct SomeType {};

void some_func(const SomeType *ptr) {
if (ptr == nullptr) return;
*ptr; // process the data in SomeType
}

Open the example in Compiler Explorer.

If you see (or write) this type of function, the intention is:

  1. it is the responsibility of the caller that the pointer is valid and pointing to an object that will stay alive during the function call (or nullptr)
  2. the function is only expected to read the state of the object
  3. the function should not store this pointer for later use (since that invalidates the contract in 1.)

References

We will revisit pointers later in the course. For now, we will stick with another built-in tool for reference semantics: references.

You might have noticed that using pointers is relatively heavy in syntax. For example, we must use the address-of operator to obtain an address. To access the original value, we must dereference and check for nullptr while at it.

A reference is typically implemented as a pointer internally. Unlike a pointer, a reference cannot be re-pointed and behaves as the referenced variable syntactically. This means we do not have to use either of the address-of/dereference operators.

int64_t x = 42;
int64_t &i = x; // reference to int64_t, referencing x
i = 20;
// i == x == 20

const int64_t &j = x; // immutable reference
// j == i == x == 20

Open the example in Compiler Explorer.

Because a reference behaves as the original variable, creating a reference to a reference creates a reference to the original variable (unlike with pointers).

int64_t x = 10;
int64_t &y = x;
int64_t &z = y;
z = 42;
// x == y == z == 42

Open the example in Compiler Explorer.

One of the use cases for references is operator overloading. We will go over this topic in detail later in the course. For now, let’s look at stream insertion and extraction. This is how you can enable stream input and output for custom types.

#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>

struct Person {
int64_t id;
std::string name;
};

// Stream insertion operator
// - left argument ostream, we will mutate
// - right argument Person, we will only read
// - result, the ostream passed in as left argument
std::ostream& operator <<(std::ostream& out, const Person& p) {
// remember operator evaluation rules: left-to-right
// return (((out << p.id) << " ") << std::quoted(p.name))
// this is why we return the stream, so we can chain
return out << p.id << " " << std::quoted(p.name);
}

// Stream extraction operator
// - left argument istream, we will mutate
// - right argument Person, we will mutate
// - result, the istream passed in as left argument
std::istream& operator >>(std::istream& in, Person& p) {
return in >> p.id >> std::quoted(p.name);
}

int main() {
Person p1{1,"John Doe"};
Person p2{2,"Jane Doe"};

std::cout << p1 << "\n" << p2 << "\n";
// prints:
// 1 "John Doe"
// 2 "Jane Doe"

Person p3;
// For input: 3 "Taddeo Hilda"
std::cin >> p3;
std::cout << p3;
// prints:
// 3 "Taddeo Hilda"
}

Open the example in Compiler Explorer.

Safety

Because references are essentially pointers with a friendlier syntax, most of the same safety rules apply.

On top of that, we have one more case to consider. In Modern C++, a function would rarely return a raw pointer. However, the same cannot be said about references (we did it in the previous example).

When returning a reference, it is essential to take care of the lifetime of the object to which we are returning a reference. You should never return a reference to a local variable. The two valid cases are pass-through, where we return one of the arguments we took by reference, and getters on compound objects.

struct Compound {
int64_t x;
int64_t& get_x() { return x; }
};

int64_t& passthrough(int64_t& v) {
v += 1;
return v;
}


Compound c{5};
c.get_x() = 20;
// c.x == 20

int64_t p = 42;
passthrough(p) += 10;
// p == 53

Open the example in Compiler Explorer.

Revisiting the range-for-loop

References unlock the full potential of the range-for-loop. Not only can we avoid copying each element (which is wasteful), but if we use a mutable reference, we can modify the original elements of the container as we iterate over them.

std::vector<int64_t> data{1,2,3,4,5};

// Iterate over all elements of data using a mutable reference
for (int64_t &v : data) {
v *= 2;
}

// Iterate over all elements of data using an immutable reference
for (const int64_t &v : data) {
// iterate over {2, 4, 6, 8, 10}
}

Open the example in Compiler Explorer.

Iterators

Iterators are one of the most important abstractions in the standard library. So far, we have only talked about std::vector, std::array and std::string. However, these three containers are similar, storing their elements in a contiguous memory block.

However, the standard library also offers other containers, for example, std::list (a doubly-linked list). This creates a problem. How do you write code that works for both a std::vector and a std::list?

At a minimum, we want iteration over all elements to work the same, no matter the underlying storage structure. This is where iterators come in. They are grouped into the following categories based on their capabilities, that is, the operations they can complete in constant time.

  • input and output iterators: only forward iteration, each element can be read once
    data streams: e.g. reading from a network socket
  • forward iterators: only forward iteration, each element can be read multiple times
    singly-linked lists: e.g. std::forward_list
  • bidirectional iterators: as above + backward iteration
    doubly-linked lists and tree-based containers: std::list, std::map, std::set
  • random access iterators: as above + move by integer offset and calculate the distance between two iterators
    multi-array structures: std::deque
  • contiguous iterators: as above + the storage is contiguous
    arrays: e.g. std::vector

For now, we will remain on the user side of the code, and the above serves mainly as a reference of what operations you can expect to be available and fast when working with various data structures.

Importantly, each data structure provides access to begin and end iterators, representing the iterator to the first element and an iterator to one past the last element. This creates a half-open interval [begin, end).

std::vector<int64_t> data{1,2,3,4,5};
for (std::vector<int64_t>::iterator it = data.begin();
it != data.end(); ++it) {
// the provided interface behaves like a pointer
*it = 7;
// iterator provides mutable access
// cbegin() and cend() return const_iterator for immutable access
}
// data == {7, 7, 7, 7, 7}

// For an empty container begin() == end()
std::vector<int64_t> empty;
// empty.begin() == empty.end(), so a loop over it never runs

Open the example in Compiler Explorer.

As noted in the previous lesson, std::vector might need to reallocate its internal storage to grow its capacity. When it does, all iterators are invalidated.

struct Point {
int64_t x;
int64_t y;
};


std::vector<Point> data{{0,1}, {1,1}, {2,4}, {-2,0}};

std::vector<Point>::iterator i = data.begin();
std::vector<Point>::iterator j = i;
// i and j now both point to the same element
int64_t x = i->x; // pointer-like interface

// Careful with pushing data into a vector, while holding iterators
while (data.size() != data.capacity())
data.push_back(*i); // OK, we check for capacity

data.push_back(*i); // at capacity, needs reallocation to grow
x = i->x; // i is no longer valid

/*
AddressSanitizer: heap-use-after-free
READ of size 8 at 0x606000000020 thread T0
#0 0x401634 in main /app/example.cpp:23
*/

Open the example in Compiler Explorer.

auto

This creates an excellent point to sneak in the basics of auto. You might have noticed that using iterators without auto is a bit cumbersome. To get the iterator type, we must go through the parent container, e.g. std::vector<int>::iterator.

Auto circumvents that by providing type deduction. In simplest terms, we don’t spell out the actual type; instead, the type is deduced from whatever the variable is initialized with.

std::vector<int64_t> data{1,2,3,4,5};
// same as: std::vector<int64_t>::iterator it
for (auto it = data.begin(); it != data.end(); ++it) {
}

// same as: int64_t v
for (auto v : data) {}
// same as: int64_t &v
for (auto &v : data) {}
// same as: const int64_t &v
for (const auto &v : data) {}

Open the example in Compiler Explorer.

std::string_view and std::span

Besides single-element reference types, the standard library also offers two types that refer to a sequence of elements.

The std::string_view is a reference type for strings and can be constructed from std::string and string literals.

#include <iostream>
#include <string>
#include <string_view>

void print(std::string_view message) {
std::cerr << message << "\n";
}

int main() {
std::string greeting = "Hello World!";
print(greeting); // std::string -> std::string_view
print("Bye, bye."); // string literals -> std::string_view

std::string_view ok;
std::string_view bad;
{
std::string str = "This is a bad idea.";
ok = "This is an OK idea.";
bad = str;
} // str goes out of scope, and is destroyed
print(ok);
print(bad); // BOOM
}
/* Address sanitizer:
ERROR: AddressSanitizer: heap-use-after-free
*/

Open the example in Compiler Explorer.

Taking a std::string_view of a string literal is OK because string literals have static storage duration, so they live for the entire run of the program (they are the only kind of literal with this property). However, all previous rules about lifetimes still apply; holding a reference to an object outside of its lifetime is a problem.

The std::span is a similar reference type; however, unlike std::string_view, which is specialized for strings and offers appropriate manipulation options, std::span can reference any contiguous sequence of elements.

On top of that, std::span is a mutable reference type, meaning we can still modify the underlying data through a std::span (unlike std::string_view).

void fill(std::span<int64_t> arr, int64_t value) {
for (int64_t &v : arr)
v = value;
}

std::vector<int64_t> vec{1,2,3,4,5};
fill(vec, 10);
// vec == {10, 10, 10, 10, 10}

std::array<int64_t,3> arr{1,1,1};
fill(arr, 42);
// arr == {42, 42, 42}

Open the example in Compiler Explorer.

Commented example

In today’s commented example, we will simulate a feeding routine for a group of animals.

#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

struct Animal {
std::string name;
int64_t feeding_period;
std::string food_type;
int64_t last_fed;

void feed(int64_t timestamp) {
std::cout << "Feeding " << std::quoted(name)
<< " some lovely "
<< std::quoted(food_type) << ".\n";
last_fed = timestamp;
}

int64_t next_feeding() const {
return last_fed + feeding_period;
}
};

// Stream insertion, simple space delimited format
std::ostream& operator<<(std::ostream& out, const Animal& animal) {
return out << std::quoted(animal.name) << " "
<< animal.feeding_period << " "
<< std::quoted(animal.food_type) << " "
<< animal.last_fed << "\n";
}

// Stream extraction, we can read space delimited format directly
std::istream& operator>>(std::istream& in, Animal& animal) {
return in >> std::quoted(animal.name)
>> animal.feeding_period
>> std::quoted(animal.food_type)
>> animal.last_fed;
}

int main() {
std::vector<Animal> animals;

// Read the animals from the standard input
Animal next;
// While we succeed in reading, repeat
// same as while (!(std::cin >> next).fail()) {}
while (std::cin >> next)
animals.push_back(next); // add to vector

// Iterate over time
for (int64_t time = 0; time < 30; ++time)
// For each animal check if it needs feeding
// we need a mutable reference for feed()
for (auto &animal : animals)
if (animal.next_feeding() == time)
animal.feed(time);

std::cout << "\n";

// Read-only iteration
// next_feeding() can operate on an immutable object
for (const auto &animal : animals)
std::cout << std::quoted(animal.name)
<< " will be next fed at time "
<< animal.next_feeding() << ".\n";
}
/* For input:
"Spot" 10 "raw salmon" 0
"Fluffington" 4 "carrot" 0

The output will be:
Feeding "Fluffington" some lovely "carrot".
Feeding "Fluffington" some lovely "carrot".
Feeding "Spot" some lovely "raw salmon".
Feeding "Fluffington" some lovely "carrot".
Feeding "Fluffington" some lovely "carrot".
Feeding "Spot" some lovely "raw salmon".
Feeding "Fluffington" some lovely "carrot".
Feeding "Fluffington" some lovely "carrot".
Feeding "Fluffington" some lovely "carrot".

"Spot" will be next fed at time 30.
"Fluffington" will be next fed at time 32.
*/

Open the example in Compiler Explorer.

Homework

The template repository with homework for this lesson is here: https://github.com/HappyCerberus/daily-bite-course-04.

As with all homework, you will need VSCode and Docker installed on your machine, and you will need to follow the instructions from the first lesson.

The goal is to make all tests pass as described in the readme file.


