I often read papers, books, posts, etc. related to Computer Science.
Sometimes I like to try (some of) the things described and keep a
personal record of what I have found and done.
Now, with this kind of personal on-line record, I think it might be a good
idea to make these notes digital (I mostly take my notes on paper).
So, this could become a series, I do not know.
Anyway, this would be the first entry.
Disclaimer: I tend to write the obvious a lot, be warned.
I saw a paper written by Bjarne Stroustrup at BLOG@CACM [1],
which presents recent advances in the design of C++ that should allow
developers to write simpler software while maintaining backwards compatibility.
Stroustrup's paper is interesting because it gives us a glimpse into the
current state of mind of the creator of C++: a language used in a plethora
of domains and often synonymous with speed, efficiency and, of course,
difficulty.
Right from the start, one of the first statements caught my eye:
(C++) evolved to meet challenges, but many developers
use C++ as if it was still the previous millennium. This is suboptimal
from the perspective of ease of expressing ideas, performance,
reliability, and maintainability.
My immediate first question was: "Am I one of them?" I don't know, probably.
But beyond the personal self-check, it is worth noting the perspectives
Stroustrup refers to: ease of expressing ideas, performance,
reliability and maintainability.
We write software to solve a problem.
Solving a problem involves being able to express our ideas in a suitable
language.
The language could be a natural language like English, a formal language
like mathematics or a programming language.
The "simple" act of expressing our own ideas is a difficult
problem in itself, let alone expressing them in a formal language in such a
way that other human beings are able to decode what is written and
reach the conclusion that we are solving the problem we claim to be
solving.
Understanding what a piece of code does is a prerequisite for maintaining
said software.
You can not maintain something you don't understand.
And the easier it is to express complex ideas, the more probable it is
that the resulting code is maintainable.
Hence, the first perspective (ease of expressing ideas) implies the last
perspective (maintainability).
Software is not like a mechanical motor: it does not tend to wear with time.
If you have a motor running 24/7 then some of its parts will wear and will
end up failing.
That does not happen with software.
Reliability comes in the sense of the absence of functional bugs:
a piece of software does what it is meant to do, nothing more and nothing less.
It is clear that the easier ideas can be expressed, the easier they can be
understood, and so the easier it is to spot errors.
However, ease is not enough; the language must also be precise.
If two (quite) different ideas are barely distinguishable in the language,
then the chance of bugs introduced by a misconception increases.
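A classic C++ illustration of this point (my example, not the paper's): parentheses and braces in a vector initializer are nearly indistinguishable to the eye, yet they mean quite different things.

```cpp
#include <vector>
using std::vector;

// vector<int>(10): the size constructor -> ten value-initialized elements
vector<int> ten_zeroes() { return vector<int>(10); }

// vector<int>{10}: an initializer list -> a single element with value 10
vector<int> one_ten() { return vector<int>{10}; }
```

A single character changes the meaning of the program entirely, which is exactly the kind of near-indistinguishable pair that invites misconception bugs.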
Performance is sometimes at odds with ease, and in this paper that dichotomy
is clear.
In Section 3.2, after one of the running examples has been revised to
increase its performance, a final performance optimization is not described
because it involves only conventional low-level
techniques and,
well, low-level is by definition not simple and not what this paper is
all about.
Now, let's dive into the examples!
Natural language is often used as a first specification of a code's purpose.
This paper is no different, and the first
example is explained as follows:
Consider a simple program that writes every unique line
from input to output
The program is the following:
import std; // make all of the standard library available
using namespace std;
int main() // print unique lines from input
{
    unordered_map<string,int> m; // hash table
    for (string line; getline(cin, line); )
        if (m[line]++ == 0) cout << line << '\n';
}
I typed this and compiled it with Clang 19 on FreeBSD. It failed to compile
the module for std.
So I decided to try on Fedora 41 and got similar results.
The C++23 standard is quite new, so it stands to reason that a core
addition to the language like the std module should take time.
I'm not looking to find out how to build the std
module so the example can be executed; I was hoping it would be a kind of
"straight-forward" example.
For more information on modules from Clang and GCC's perspectives you can
consult [2] and [3].
The next step is to change the import line to the
corresponding #include directives.
And sure enough the program compiled.
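For reference, here is the shape of the version that compiled for me. Take it as a sketch: the set of headers is my guess at the minimum needed, and I refactored the loop into a function (print_unique is my name, not the paper's) so the logic can also be exercised on a stringstream.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
using namespace std;

// print every line from `in` the first time it is seen
void print_unique(istream& in, ostream& out)
{
    unordered_map<string,int> m; // hash table: line -> times seen so far
    for (string line; getline(in, line); )
        if (m[line]++ == 0)      // 0 means this is the first occurrence
            out << line << '\n';
}
```

A main that simply calls print_unique(cin, cout) reproduces the original program.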
Stroustrup mentions some advantages of this code compared to older C++
styles, like the absence of pointers and casts while maintaining efficiency:
-it is- more efficient than most programmers could write
given a reasonable amount of time.
However, since Stroustrup does not provide an example using older C++,
I think it is difficult to appreciate these advances if you are not
savvy in older C++.
Most of the time, software development begins with a natural language
description or specification.
In this case this description is: ... a ... program that
writes every unique line from input to output.
A software developer takes this description and crafts a program that fulfills
the description.
Once we have said program, the next natural question is:
can the program be simpler?
There are many definitions of simplicity in software; at this moment I'm
interested in: is there an equivalent program that uses a simpler
(hopefully faster) main data structure?
Using a hash table surprised me a bit, because it is a map from a set of
keys to a set of values.
That is significant machinery just to implement a check-list.
The main question we must answer is: have I seen this
string before?
I think a natural choice would be to use a set or an
unordered_set (in C++ lingo).
using namespace std;
int main()
{
    unordered_set<string> s; // a set to hold the lines
    for (string line; getline(cin, line); )
        if (s.find(line) == s.end()) {
            cout << line << '\n';
            s.insert(line);
        }
}
How does this version compare to the original?
Well, it is slower. I tested both 20 times each with the same 10M random strings
of length at most 20 characters.
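For transparency, the kind of measurement I mean can be sketched with std::chrono; this is an illustrative harness (time_once is my name), not the exact script I used.

```cpp
#include <chrono>

// minimal wall-clock timing sketch: run a callable once and
// return the elapsed time in seconds as a double
template <typename F>
double time_once(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In practice each program was run end-to-end and the wall-clock times averaged over the 20 runs.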
The average (wall-clock) time of Stroustrup's solution is 17.136296 seconds.
On the other hand the average time of the version using sets is:
18.090042 seconds.
This is because the second program queries the data structure at least
two times: once to see if line is in the set, and
again when we insert the string; the second
query is inherent to the insert
operation.
In the first example, the table is only queried once.
So, we need to remove one query.
using namespace std;
int main()
{
    unordered_set<string> s; // a set to hold the lines
    auto cs = s.size();      // the previous size of the set
    for (string line; getline(cin, line); ) {
        s.insert(line);
        if (s.size() != cs) {
            cout << line << '\n';
            cs++;
        }
    }
}
In this version each line is inserted into the set;
if the set did not already contain line, then the set's
size increases by one, and that is how we detect unseen lines.
We just need to keep track of the previous size (the cs
variable).
This version is faster than the first example with an average of
16.866531 seconds.
But we are still doing extra work, because insert returns a pair
whose second field is true if and only if an
insertion was made, so we can trim the code to something similar to
Stroustrup's example:
using namespace std;
int main()
{
    unordered_set<string> s; // a set to hold the lines
    for (string line; getline(cin, line); )
        if (s.insert(line).second)
            cout << line << '\n';
}
This new version has an average of 16.659581 seconds.
Solution   | Time (seconds)
-----------|---------------
Stroustrup | 17.136296
Set_V1     | 18.090042
Set_V2     | 16.866531
Set_V3     | 16.659581
Overall, a fraction of a second of improvement for 10M entries is well within
measurement error, or acceptable, in my opinion.
However, I feel this last version is conceptually simpler and closer to
the natural language specification, and hence more resilient to client-coder
misunderstandings.
The second example is similar to the Set_V3 solution in the previous section.
It shows a function that keeps unique strings obtained from the standard
input and returns a vector containing the strings.
import std;
using namespace std;

vector<string> collect_lines(istream& is)
// collect unique lines from input
{
    unordered_set s;
    for (string line; getline(is, line); )
        s.insert(line);
    return vector{from_range, s};
}

auto lines = collect_lines(cin);
Unfortunately this example fails to compile with Clang 19 on FreeBSD because
the compiler cannot deduce the type of s.
That is on top of the import std; failure.
If s is "fully" typed as
unordered_set<string>, then the program compiles just fine.
Later this example is rewritten to a simpler form:
import std;
using namespace std;

vector<string> collect_lines(istream& is)
// collect unique lines from input
{
    unordered_set s{from_range, istream_iterator<string>{is}};
    return vector{from_range, s};
}

auto lines = collect_lines(cin);
Once again we need to "fully" type s, but even if
we do that, this example also fails to compile: to initialize an
unordered_set with a range, the argument must satisfy the
"container-compatible-range" concept, and when Clang makes this check for
istream_iterator, it fails.
Perhaps for istream_iterator to pass this test
it must be coaxed (cast) into a suitable type, but I do not know exactly
how, as the paper clearly claims that
istream_iterator<string>{is} allows us to treat the input from
is as a range of elements ....
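One classic fallback that does compile (a sketch of mine, not necessarily the paper's intent) is the pre-C++23 iterator-pair constructor, which accepts the same istream_iterators without any range-concept check. Note that istream_iterator<string> reads whitespace-separated tokens rather than whole lines, and collect_tokens is a name I made up.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// collect unique whitespace-separated tokens from `is`
vector<string> collect_tokens(istream& is)
{
    // iterator-pair constructor: available since C++11;
    // a default-constructed istream_iterator acts as the end sentinel
    unordered_set<string> s(istream_iterator<string>{is},
                            istream_iterator<string>{});
    return vector<string>(s.begin(), s.end());
}
```

The returned order is unspecified, since an unordered_set does not preserve insertion order.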
Then, for the sake of experimentation I will stick to the first version.
The return statement (return vector{from_range, s})
makes a copy of the elements of s and places that
copy in the vector that is returned.
To avoid this copy, the paper suggests changing the return statement to:
return vector{from_range,std::move(s)};
This tells the compiler to move the elements from s
to the vector instead of copying them.
The reason given for this change is (I)n principle, the
compiler could deduce that we don't use s again after making the vector
and just move the string elements, but today's compilers are not that
smart...
The explanation sounds strikingly similar to a live-variables analysis [4];
on the other hand, if this is not possible, it might indicate that there are
complex operations under the hood preventing the compiler from deducing
that a move constructor can be used instead of a copy constructor.
We can quickly verify this and other behaviors with the following code:
#include <utility>
#include <iostream>
#include <unordered_set>

enum class Created {copy, moved, movedAssig, defc, construct};

int globalTag;

class TestClass {
public:
    int ident;
    Created creation;
    int tag;
    TestClass(): ident(0), creation(Created::defc), tag(globalTag++) {}
    TestClass(int i): ident(i), creation(Created::construct), tag(globalTag++) {}
    TestClass(const TestClass& a): ident(a.ident), creation(Created::copy),
        tag(globalTag++) {}
    TestClass& operator=(const TestClass& a){
        auto tmp = TestClass(a);
        std::swap(tmp, *this);
        return *this;
    }
    TestClass(TestClass&& a): ident(std::move(a.ident)), creation(Created::moved),
        tag(globalTag++) {}
    TestClass& operator=(TestClass&& a){
        ident = std::move(a.ident);
        creation = Created::movedAssig;
        tag = std::move(a.tag);
        return *this;
    }
    void inc(){ ident++; }
    inline bool operator==(const TestClass& a) const {
        return a.ident == ident;
    }
    ~TestClass(){
        std::cout << "Tag " << tag << " was released\n";
    }
};

struct Hash {
    std::size_t operator()(const TestClass& tc) const {
        return tc.ident;
    }
};
In particular we can test std::move behavior with unordered_set:
// create_with_id builds a fresh TestClass with the given identifier
TestClass create_with_id(int id){ return TestClass(id); }

int main(){
    std::unordered_set<TestClass, Hash> s;
    for(int idx = 1; idx != 11; idx++){
        auto test = create_with_id(idx);
        s.insert(test);
    }
    return 0;
}
If we run this example, we get the output:
Tag 0 was released
Tag 2 was released
Tag 4 was released
Tag 6 was released
Tag 8 was released
Tag 10 was released
Tag 12 was released
Tag 14 was released
Tag 16 was released
Tag 18 was released
Tag 19 was released
Tag 17 was released
Tag 15 was released
Tag 13 was released
Tag 11 was released
Tag 9 was released
Tag 7 was released
Tag 5 was released
Tag 3 was released
Tag 1 was released
It shows 20 objects were created. This is not surprising, as
create_with_id creates a new object, which is moved
to the test variable in the for
loop and then copied again into the set s.
The even tags belong to the local test variable
while the odd tags to the objects inside the set.
We can even see that the set seems to release its elements in backwards
order.
The tag member keeps track of how many times the
constructor was called.
The number does not imply that 20 different memory "chunks" coexisted
at the same time.
The likely scenario is that the (compiled) program reserves memory for
the new object at create_with_id, when the
function finishes, that memory is referenced by the test
variable.
Then another object is created by the call to s.insert
and that object
is stored in the set and only released when the set is destroyed.
After the set is updated, the test variable is
destroyed which makes its memory available to be used at the next iteration.
Hence, in total only 11 memory "chunks" are used.
If we change s.insert(test) to:
s.insert(std::move(test));
The output changes as follows:
Tag 0 was released
Tag 1 was released
Tag 2 was released
Tag 3 was released
Tag 4 was released
Tag 5 was released
Tag 6 was released
Tag 7 was released
Tag 8 was released
Tag 9 was released
Tag 9 was released
Tag 8 was released
Tag 7 was released
Tag 6 was released
Tag 5 was released
Tag 4 was released
Tag 3 was released
Tag 2 was released
Tag 1 was released
Tag 0 was released
It shows that only 10 objects were created, but it also gives a glimpse into
what happens under the hood.
The destructor runs twice per tag: once for the moved-from local at the end
of each loop iteration, and again when the set releases the object that was
moved into it.
Despite this, data consistency is preserved; that is, it is not a double-free
bug, a common bug when using "raw" pointers to keep track of resources.
That is cool, right? The beauty of RAII, the same principle behind smart
pointers.
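As a minimal illustration of why ownership-aware types avoid double-frees (my sketch, unrelated to the paper's code): moving a unique_ptr empties the source, so even though two pointer objects exist, only one destructor ever frees the resource.

```cpp
#include <memory>
#include <string>

// transferring ownership: after the move, the source pointer is empty,
// so destroying both pointers frees the string exactly once
std::unique_ptr<std::string> take(std::unique_ptr<std::string> p)
{
    return p;
}
```

Contrast this with two raw pointers to the same allocation, where nothing stops both owners from calling delete.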
In this particular example memory is not "saved" because the variable
test is local to the for
loop.
In general, if this pattern is followed to populate containers or
distribute data, memory is not likely to be "saved"; however, it saves
time if the construction of the objects is expensive.
This is an example in which a move operation does not save memory, due to
how the program is compiled, but does save time.
I modified the second example to use std::move
when inserting the string into the set and tested both versions with 50M
random strings.
The version using std::move took on average
66.29019458293915 seconds while the version without took on average
67.51203953027725 seconds.
Again, this is such a close difference that it is well within measurement error.
Exceptions are a control-flow mechanism that allows us to perform a "jump with
data" [5].
The name might be a bit unfortunate because it suggests they are used only
under "special circumstances"; however, that is a convention of the
programmer: there is no language construct to indicate that a control-path
is "typical".
This is a good place for a note.
In some systems, like the Linux kernel [6] or the Boost library [7],
constructs have been created to hint to the compiler that some test is
likely or unlikely to hold.
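Since C++20 this kind of hint is even part of the language itself, as the [[likely]]/[[unlikely]] attributes; a minimal sketch (parse_digit is my own toy example):

```cpp
// C++20 [[likely]]/[[unlikely]] attributes: standard, portable branch hints
int parse_digit(char c)
{
    if (c >= '0' && c <= '9') [[likely]]   // hint: digits are the common case
        return c - '0';
    else [[unlikely]]                      // hint: the rare failure path
        return -1;
}
```

The attributes only influence code layout and branch prediction; they do not change the program's meaning.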
Historically, exceptions have been used to perform error-handling. But if we
do not pay attention to what the name implies in natural language, they can
also be used to increase the performance of a program.
Consider the following function that looks for a specific element in a
matrix and replaces it with another value.
Then, if the replacement took place, the elements in the column of the
replaced element are updated.
template <typename T>
bool update_data(vector<data_row<T>>& matrix, T query, T update)
{
    int col = -1;
    for(auto& row : matrix) {
        int c = -1;
        for(auto d : row) {
            c++;
            if (d == query) {
                row[c] = update;
                col = c;
                break;
            }
        }
        if(col != -1) break;
    }
    if (col == -1) return false;
    //Update the column
    for(auto& row: matrix) {
        if (col < row.size()){
            row[col] += update;
            if (row[col] == 0)
                row.erase(row.begin()+col);
        }
    }
    return true;
}
The variable col is used to store the column to
update, if any, and as a form of "status" flag to control the iteration over
the rows of the matrix. At the exit of the outermost for,
the value of col must be checked to see whether
a replacement took place, and then act accordingly.
We can write a functionally equivalent function using Exceptions:
class ElemIdx {
public:
    int idx;
    ElemIdx(int i): idx(i) {}
};

template <typename T>
bool update_data(vector<data_row<T>>& matrix, T query, T update)
{
    try {
        for(auto& row : matrix) {
            int c = -1;
            for(auto d : row) {
                c++;
                if (d == query) {
                    row[c] = update;
                    throw ElemIdx{c};
                }
            }
        }
    } catch(ElemIdx col) {
        //Update the column
        for(auto& row: matrix) {
            if (col.idx < row.size()){
                row[col.idx] += update;
                if (row[col.idx] == 0)
                    row.erase(row.begin()+col.idx);
            }
        }
        return true;
    }
    return false;
}
In this version, once we find the element, we throw an exception. Since
the code handling the exception finishes the function, there is no need
for an extra check: if no exception was thrown, we know no element
was found.
The language can throw primitive types as exceptions; however, wrapping
them in a type makes the program a bit safer.
How do these two versions compare?
I filled a 10000 x 10000 matrix of ints in the
following way:
for (size_t i = 0; i != dim; i++){
    vector<int> row;
    for(size_t j = 0; j != dim; j++)
        row.push_back(j%(i+1)+1);
    matrix.push_back(std::move(row));
}
And then queried and updated for the number 50000 (s = 50000) with:
for (size_t i = 0; i != rounds; i++){
    if (!update_data(matrix, s, s/2))
        s /= 2;
}
I timed this last loop across 10 runs, each with 50
rounds (rounds = 50).
The version without exceptions took on average 20647.3 milliseconds, while
the version with exceptions took on average 18288.3 milliseconds.
This (artificial) example shows that exceptions are flexible enough to be
used as more than error-handling mechanisms.
Software development is a fascinating discipline. There is always a tension
between adopting new features and staying with the "known".
The introduction of Stroustrup's paper illustrates this.
On the one hand Stroustrup states the language's long history:
... leads many developers, teachers, and academics to
overlook decades of progress and describe C++ as if today was still the
second millennium ...
On the other hand, Stroustrup also states that:
Stability - compatibility with earlier versions of C++ -
is immensely important, especially for organizations that maintain software
systems for decades.
Software is written by humans and we have preferences, beliefs, routines,
and so on.
It is also written in a language that, like any other language invented
by humans, changes over time.
I think it is pretty amazing that my FreeBSD box can compile C++ code written
in 1985, but I also think that such a feature is, to use Stroustrup's terms,
suboptimal.
As I said before, software is not like a mechanical motor: it does not wear
with time.
However, the moment we write a piece of software we assume the responsibility
of maintaining it: not only to ensure its functional correctness and its
clarity, but to keep it compliant with the latest version of our language
of choice.
Failing to do the latter is like asking a High School Calculus student to
be able to read Latin and understand Newton's original works.
If we change/upgrade our computers every five years or so, why don't we also
upgrade/change our code-bases?
Until we construct the ultimate language coupled with the ultimate compiler,
we are stuck with "re-code or perish"; and if natural languages are any
indication, then after thousands of years our languages will keep
evolving.
[1] Stroustrup, Bjarne. "21st Century C++", February 4, 2025.
https://cacm.acm.org/blogcacm/21st-century-c/
(last accessed: February 16, 2025)
[2] Standard C++ Modules https://clang.llvm.org/docs/StandardCPlusPlusModules.html
(last accessed: February 16, 2025)
[3] C++ Modules https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Modules.html
(last accessed: February 16, 2025)
[4] Nielson, F., et al., "Principles of Program Analysis", Springer,
2nd printing, 2005.
[5] Mitchell, J., "Concepts in Programming Languages", Cambridge University
Press, 2003.
[6] likely() and unlikely()
https://kernelnewbies.org/FAQ/LikelyUnlikely
(last accessed: February 23, 2025)
[7] Boost Macro Reference
https://www.boost.org/doc/libs/1_87_0/libs/config/doc/html/boost_config/boost_macro_reference.html
(last accessed: February 23, 2025)
Copyright Alfredo Cruz-Carlon, 2025