I often read papers, books, posts, etc. related to Computer Science.
Sometimes I like to try (some of) the things described and keep a
personal record of what I have found and done.
Now, with this kind of personal on-line record, I think it might be a good
idea to make these notes digital (I mostly take my notes on paper).
So, this could become a series, I do not know.
Anyway, this would be the first entry.
Disclaimer: I tend to write the obvious a lot, be warned.
I saw a paper written by Bjarne Stroustrup at BLOG@CACM [1],
which presents recent advances in the design of C++ that should allow
developers to write simpler software while maintaining backwards compatibility.
Stroustrup's paper is interesting because it gives us a glimpse into the
current state of mind of the creator of C++: a language used in a plethora
of domains and often synonymous with speed, efficiency and, of course,
difficulty.
Right from the start, one of the first statements caught my eye:
(C++) evolved to meet challenges, but many developers
use C++ as if it was still the previous millennium. This is suboptimal
from the perspective of ease of expressing ideas, performance,
reliability, and maintainability.
My immediate first question was: "Am I one of them?" I don't know, probably.
But beyond the personal self-check, it is worth noting the perspectives
Stroustrup refers to: ease of expressing ideas, performance,
reliability and maintainability.
We write software to solve a problem.
Solving a problem involves being able to express our ideas in a suitable
language.
The language could be a natural language like English, a formal language
like mathematics or a programming language.
The "simple" act of expressing our own ideas is a difficult
problem in itself, let alone expressing them in a formal language in such a
way that other human beings are able to decode what is written and
reach the conclusion that we are solving the problem we claim to be
solving.
Understanding what a piece of code does is a prerequisite for maintaining
said software.
You can not maintain something you don't understand.
And the easier it is to express complex ideas, the more probable it is
that the resulting code is maintainable.
Hence, the first perspective (ease of expressing ideas) implies the last
perspective (maintainability).
Software is not like a mechanical motor: it does not tend to wear with time.
If you have a motor running 24/7 then some of its parts will wear and will
end up failing.
That does not happen with software.
Reliability comes in the sense of the absence of functional bugs:
a piece of software does what it is meant to do, nothing more and nothing less.
It is clear that the easier ideas can be expressed, the easier they can be
understood, and so the easier it is to spot errors.
However, ease is not enough; the language must also be precise.
If two (quite) different ideas are barely distinguishable in the language,
then the chance of bugs introduced by a misconception increases.
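A classic C++ illustration of this point (my example, not the paper's): parentheses and braces in a vector initializer are nearly indistinguishable to the eye, yet they mean quite different things.

```cpp
#include <vector>
using std::vector;

// vector<int>(10): the size constructor -> ten value-initialized elements
vector<int> ten_zeroes() { return vector<int>(10); }

// vector<int>{10}: an initializer list -> a single element with value 10
vector<int> one_ten() { return vector<int>{10}; }
```

A single character changes the meaning of the program entirely, which is exactly the kind of near-indistinguishable pair that invites misconception bugs.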
Performance is sometimes at odds with ease, and in this paper that dichotomy
is clear.
In Section 3.2, after one of the running examples has been revised to
increase its performance, a final performance optimization is not described
because it involves only conventional low-level
techniques and,
well, low-level is by definition not simple and not what this paper is
all about.
Now, let's dive into the examples!
Natural language is often used as a first specification of a code's purpose.
This paper is no different, and the first
example is explained as follows:
Consider a simple program that writes every unique line
from input to output
The program is the following:
import std; // make all of the standard library available
using namespace std;
int main() // print unique lines from input
{
    unordered_map<string,int> m; // hash table
    for (string line; getline(cin, line); )
        if (m[line]++ == 0) cout << line << '\n';
}
I typed this and compiled it with Clang 19 on FreeBSD. It failed to compile
the module for std.
So I decided to try on Fedora 41 and got similar results.
The C++23 standard is quite new, so it stands to reason that a core
addition to the language like the std module should take time.
I'm not looking to find out how to build the std
module so the example can be executed; I was hoping it would be a kind of
"straight-forward" example.
For more information on modules from Clang and GCC's perspectives you can
consult [2] and [3].
The next step is to change the import line to the
corresponding #include directives.
And sure enough the program compiled.
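For reference, here is the shape of the version that compiled for me. Take it as a sketch: the set of headers is my guess at the minimum needed, and I refactored the loop into a function (print_unique is my name, not the paper's) so the logic can also be exercised on a stringstream.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
using namespace std;

// print every line from `in` the first time it is seen
void print_unique(istream& in, ostream& out)
{
    unordered_map<string,int> m; // hash table: line -> times seen so far
    for (string line; getline(in, line); )
        if (m[line]++ == 0)      // 0 means this is the first occurrence
            out << line << '\n';
}
```

A main that simply calls print_unique(cin, cout) reproduces the original program.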
Stroustrup mentions some advantages of this code compared to older C++
styles, like the absence of pointers and casts while maintaining efficiency:
-it is- more efficient than most programmers could write
given a reasonable amount of time.
However, since Stroustrup does not provide an example using older C++,
I think it is difficult to appreciate these advances if you are not
savvy in older C++.
Most of the time, software development begins with a natural language
description or specification.
In this case this description is: ... a ... program that
writes every unique line from input to output.
A software developer takes this description and crafts a program that fulfills
the description.
Once we have said program, the next natural question is:
can the program be simpler?
There are many definitions of simplicity in software; at this moment I'm
interested in: is there an equivalent program that uses a simpler
(hopefully faster) main data structure?
Using a hash table surprised me a bit, because it is a map from a set of
keys to a set of values.
That is significant machinery just to implement a check-list.
The main question we must answer is: have I seen this
string before?
I think a natural choice would be to use a set or an
unordered_set (in C++ lingo).
using namespace std;
int main()
{
    unordered_set<string> s; // a set to hold the lines
    for (string line; getline(cin, line); )
        if (s.find(line) == s.end()) {
            cout << line << '\n';
            s.insert(line);
        }
}
How does this version compare to the original?
Well, it is slower. I tested both 20 times each with the same 10M random strings
of length at most 20 characters.
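For transparency, the kind of measurement I mean can be sketched with std::chrono; this is an illustrative harness (time_once is my name), not the exact script I used.

```cpp
#include <chrono>

// minimal wall-clock timing sketch: run a callable once and
// return the elapsed time in seconds as a double
template <typename F>
double time_once(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In practice each program was run end-to-end and the wall-clock times averaged over the 20 runs.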
The average (wall-clock) time of Stroustrup's solution is 17.136296 seconds.
On the other hand the average time of the version using sets is:
18.090042 seconds.
This is because the second program queries the data structure at least
two times: once to see if line is in the set, and
again when we insert the string; the second
query is inherent to the insert
operation.
In the first example, the table is only queried once.
So, we need to remove one query.
using namespace std;
int main()
{
    unordered_set<string> s; // a set to hold the lines
    auto cs = s.size();      // the previous size of the set
    for (string line; getline(cin, line); ) {
        s.insert(line);
        if (s.size() != cs) {
            cout << line << '\n';
            cs++;
        }
    }
}
In this version each line is inserted into the set;
if the set did not already contain line, then the set's
size increases by one, and that is how we detect unseen lines.
We just need to keep track of the previous size (the cs
variable).
This version is faster than the first example with an average of
16.866531 seconds.
But we are still doing extra work, because insert returns a pair
whose second field is true if and only if an
insertion was made, so we can trim the code to something similar to
Stroustrup's example:
using namespace std;
int main()
{
    unordered_set<string> s; // a set to hold the lines
    for (string line; getline(cin, line); )
        if (s.insert(line).second)
            cout << line << '\n';
}
This new version has an average of 16.659581 seconds.
Solution   | Time (seconds)
-----------|---------------
Stroustrup | 17.136296
Set_V1     | 18.090042
Set_V2     | 16.866531
Set_V3     | 16.659581
Overall, a fraction of a second of improvement for 10M entries is well within
measurement error, or acceptable, in my opinion.
However, I feel this last version is conceptually simpler and closer to
the natural language specification, and hence more resilient to client-coder
misunderstandings.
The second example is similar to the Set_V3 solution in the previous section.
It shows a function that keeps unique strings obtained from the standard
input and returns a vector containing the strings.
import std;
using namespace std;

vector<string> collect_lines(istream& is)
// collect unique lines from input
{
    unordered_set s;
    for (string line; getline(is, line); )
        s.insert(line);
    return vector{from_range, s};
}

auto lines = collect_lines(cin);
Unfortunately this example fails to compile with Clang 19 on FreeBSD because
the compiler cannot deduce the type of s.
That is on top of the import std; failure.
If s is "fully" typed as
unordered_set<string>, then the program compiles just fine.
Later this example is rewritten to a simpler form:
import std;
using namespace std;

vector<string> collect_lines(istream& is)
// collect unique lines from input
{
    unordered_set s{from_range, istream_iterator<string>{is}};
    return vector{from_range, s};
}

auto lines = collect_lines(cin);
Once again we need to "fully" type s, but even if
we do that, this example also fails to compile: to initialize an
unordered_set with a range, the argument must satisfy the
"container-compatible-range" concept, and when Clang makes this check for
istream_iterator, it fails.
Perhaps for istream_iterator to pass this test
it must be coaxed (cast) into a suitable type, but I do not know exactly
how, as the paper clearly claims that
istream_iterator<string>{is} allows us to treat the input from
is as a range of elements ....
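One classic fallback that does compile (a sketch of mine, not necessarily the paper's intent) is the pre-C++23 iterator-pair constructor, which accepts the same istream_iterators without any range-concept check. Note that istream_iterator<string> reads whitespace-separated tokens rather than whole lines, and collect_tokens is a name I made up.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

// collect unique whitespace-separated tokens from `is`
vector<string> collect_tokens(istream& is)
{
    // iterator-pair constructor: available since C++11;
    // a default-constructed istream_iterator acts as the end sentinel
    unordered_set<string> s(istream_iterator<string>{is},
                            istream_iterator<string>{});
    return vector<string>(s.begin(), s.end());
}
```

The returned order is unspecified, since an unordered_set does not preserve insertion order.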
Then, for the sake of experimentation I will stick to the first version.
The return statement (return vector{from_range, s})
makes a copy of the elements of s and places that
copy in the vector that is returned.
To avoid this copy, the paper suggests changing the return statement to:
return vector{from_range,std::move(s)};
This tells the compiler to move the elements from s
to the vector instead of copying them.
The reason given for this change is (I)n principle, the
compiler could deduce that we don't use s again after making the vector
and just move the string elements, but today's compilers are not that
smart...
The explanation sounds strikingly similar to a live-variables analysis [4];
on the other hand, if this is not possible, it might indicate that there are
complex operations under the hood preventing the compiler from deducing
that a move constructor can be used instead of a copy constructor.
We can quickly verify this and other behaviors with the following code:
#include <utility>
#include <iostream>
#include <unordered_set>

enum class Created {copy, moved, movedAssig, defc, construct};

int globalTag;

class TestClass {
public:
    int ident;
    Created creation;
    int tag;
    TestClass(): ident(0), creation(Created::defc), tag(globalTag++) {}
    TestClass(int i): ident(i), creation(Created::construct), tag(globalTag++) {}
    TestClass(const TestClass& a): ident(a.ident), creation(Created::copy),
        tag(globalTag++) {}
    TestClass& operator=(const TestClass& a){
        auto tmp = TestClass(a);
        std::swap(tmp, *this);
        return *this;
    }
    TestClass(TestClass&& a): ident(std::move(a.ident)), creation(Created::moved),
        tag(globalTag++) {}
    TestClass& operator=(TestClass&& a){
        ident = std::move(a.ident);
        creation = Created::movedAssig;
        tag = std::move(a.tag);
        return *this;
    }
    void inc(){ ident++; }
    inline bool operator==(const TestClass& a) const {
        return a.ident == ident;
    }
    ~TestClass(){
        std::cout << "Tag " << tag << " was released\n";
    }
};

struct Hash {
    std::size_t operator()(const TestClass& tc) const {
        return tc.ident;
    }
};
In particular we can test std::move behavior with unordered_set:
// create_with_id builds a fresh TestClass with the given identifier
TestClass create_with_id(int id){ return TestClass(id); }

int main(){
    std::unordered_set<TestClass, Hash> s;
    for(int idx = 1; idx != 11; idx++){
        auto test = create_with_id(idx);
        s.insert(test);
    }
    return 0;
}
If we run this example, we get the output:
Tag 0 was released
Tag 2 was released
Tag 4 was released
Tag 6 was released
Tag 8 was released
Tag 10 was released
Tag 12 was released
Tag 14 was released
Tag 16 was released
Tag 18 was released
Tag 19 was released
Tag 17 was released
Tag 15 was released
Tag 13 was released
Tag 11 was released
Tag 9 was released
Tag 7 was released
Tag 5 was released
Tag 3 was released
Tag 1 was released
It shows 20 objects were created. This is not surprising, as
create_with_id creates a new object, which is moved
to the test variable in the for
loop and then copied again into the set s.
The even tags belong to the local test variable
while the odd tags to the objects inside the set.
We can even see that the set seems to release its elements in backwards
order.
The tag member keeps track of how many times the
constructor was called.
The number does not imply that 20 different memory "chunks" coexisted
at the same time.
The likely scenario is that the (compiled) program reserves memory for
the new object at create_with_id, when the
function finishes, that memory is referenced by the test
variable.
Then another object is created by the call to s.insert
and that object
is stored in the set and only released when the set is destroyed.
After the set is updated, the test variable is
destroyed which makes its memory available to be used at the next iteration.
Hence, in total only 11 memory "chunks" are used.
If we change s.insert(test) to:
s.insert(std::move(test));
The output changes as follows:
Tag 0 was released
Tag 1 was released
Tag 2 was released
Tag 3 was released
Tag 4 was released
Tag 5 was released
Tag 6 was released
Tag 7 was released
Tag 8 was released
Tag 9 was released
Tag 9 was released
Tag 8 was released
Tag 7 was released
Tag 6 was released
Tag 5 was released
Tag 4 was released
Tag 3 was released
Tag 2 was released
Tag 1 was released
Tag 0 was released
It shows that only 10 objects were created, but it also gives a glimpse into
what happens under the hood.
The destructor runs twice per tag: once for the moved-from local at the end
of each loop iteration, and again when the set releases the object that was
moved into it.
Despite this, data consistency is preserved; that is, it is not a double-free
bug, a common bug when using "raw" pointers to keep track of resources.
That is cool, right? The beauty of RAII, the same principle behind smart
pointers.
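As a minimal illustration of why ownership-aware types avoid double-frees (my sketch, unrelated to the paper's code): moving a unique_ptr empties the source, so even though two pointer objects exist, only one destructor ever frees the resource.

```cpp
#include <memory>
#include <string>

// transferring ownership: after the move, the source pointer is empty,
// so destroying both pointers frees the string exactly once
std::unique_ptr<std::string> take(std::unique_ptr<std::string> p)
{
    return p;
}
```

Contrast this with two raw pointers to the same allocation, where nothing stops both owners from calling delete.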
In this particular example memory is not "saved" because the variable
test is local to the for
loop.
In general, if this pattern is followed to populate containers or
distribute data, memory is not likely to be "saved"; however, it saves
time if the construction of the objects is expensive.
This is an example in which a move operation does not save memory, due to
how the program is compiled, but does save time.
I modified the second example to use std::move
when inserting the string into the set and tested both versions with 50M
random strings.
The version using std::move took on average
66.29019458293915 seconds while the version without took on average
67.51203953027725 seconds.
Again, this is such a close difference that it is well within measurement error.
Exceptions are a control-flow mechanism that allows us to perform a "jump with
data" [5].
The name might be a bit unfortunate because it suggests they are used only
under "special circumstances"; however, that is a convention of the
programmer: there is no language construct to indicate that a control-path
is "typical".
This is a good place for a note.
In some systems, like the Linux kernel [6] or the Boost library [7],
constructs have been created to hint to the compiler that some test is
likely or unlikely to hold.
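Since C++20 this kind of hint is even part of the language itself, as the [[likely]]/[[unlikely]] attributes; a minimal sketch (parse_digit is my own toy example):

```cpp
// C++20 [[likely]]/[[unlikely]] attributes: standard, portable branch hints
int parse_digit(char c)
{
    if (c >= '0' && c <= '9') [[likely]]   // hint: digits are the common case
        return c - '0';
    else [[unlikely]]                      // hint: the rare failure path
        return -1;
}
```

The attributes only influence code layout and branch prediction; they do not change the program's meaning.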
Historically, exceptions have been used to perform error-handling. But if we
do not pay attention to what the name implies in natural language, they can
also be used to increase the performance of a program.
Consider the following function that looks for a specific element in a
matrix and replaces it with another value.
Then, if the replacement took place, the elements in the column of the
replaced element are updated.
template <typename T>
bool update_data(vector<data_row<T>>& matrix, T query, T update)
{
    int col = -1;
    for(auto& row : matrix) {
        int c = -1;
        for(auto d : row) {
            c++;
            if (d == query) {
                row[c] = update;
                col = c;
                break;
            }
        }
        if(col != -1) break;
    }
    if (col == -1) return false;
    //Update the column
    for(auto& row: matrix) {
        if (col < row.size()){
            row[col] += update;
            if (row[col] == 0)
                row.erase(row.begin()+col);
        }
    }
    return true;
}
The variable col is used to store the column to
update, if any, and as a form of "status" flag to control the iteration over
the rows of the matrix. At the exit of the outermost for,
the value of col must be checked to see whether
a replacement took place, and then act accordingly.
We can write a functionally equivalent function using Exceptions:
class ElemIdx {
public:
    int idx;
    ElemIdx(int i): idx(i) {}
};

template <typename T>
bool update_data(vector<data_row<T>>& matrix, T query, T update)
{
    try {
        for(auto& row : matrix) {
            int c = -1;
            for(auto d : row) {
                c++;
                if (d == query) {
                    row[c] = update;
                    throw ElemIdx{c};
                }
            }
        }
    } catch(ElemIdx col) {
        //Update the column
        for(auto& row: matrix) {
            if (col.idx < row.size()){
                row[col.idx] += update;
                if (row[col.idx] == 0)
                    row.erase(row.begin()+col.idx);
            }
        }
        return true;
    }
    return false;
}
In this version, once we find the element, we throw an exception. Since
the code handling the exception finishes the function, there is no need
for an extra check: if no exception was thrown, we know no element
was found.
The language can throw primitive types as exceptions; however, wrapping
them in a type makes the program a bit safer.
How do these two versions compare?
I filled a 10000 x 10000 matrix of ints in the
following way:
for (size_t i = 0; i != dim; i++){
    vector<int> row;
    for(size_t j = 0; j != dim; j++)
        row.push_back(j%(i+1)+1);
    matrix.push_back(std::move(row));
}
And then queried and updated for the number 50000 (s = 50000) with:
for (size_t i = 0; i != rounds; i++){
    if (!update_data(matrix, s, s/2))
        s /= 2;
}
I timed this last loop across 10 runs, each with 50
rounds (rounds = 50).
The version without exceptions took on average 20647.3 milliseconds, while
the version with exceptions took on average 18288.3 milliseconds.
This (artificial) example shows that exceptions are flexible enough to be
used as more than error-handling mechanisms.
Software development is a fascinating discipline. There is always a tension
between adopting new features and staying with the "known".
The introduction of Stroustrup's paper illustrates this.
On the one hand Stroustrup states the language's long history:
... leads many developers, teachers, and academics to
overlook decades of progress and describe C++ as if today was still the
second millennium ...
On the other hand, Stroustrup also states that:
Stability - compatibility with earlier versions of C++ -
is immensely important, especially for organizations that maintain software
systems for decades.
Software is written by humans and we have preferences, beliefs, routines,
and so on.
It is also written in a language that, like any other language invented
by humans, changes over time.
I think it is pretty amazing that my FreeBSD box can compile C++ code written
in 1985, but I also think that such a feature is, to use Stroustrup's terms,
suboptimal.
As I said before, software is not like a mechanical motor: it does not wear
with time.
However, the moment we write a piece of software we assume the responsibility
of maintaining it: not only to ensure its functional correctness and its
clarity, but to keep it compliant with the latest version of our language
of choice.
Failing to do the latter is like asking a High School Calculus student to
be able to read Latin and understand Newton's original works.
If we change/upgrade our computers every five years or so, why don't we also
upgrade/change our code-bases?
Until we construct the ultimate language coupled with the ultimate compiler,
we are stuck with "re-code or perish"; and if natural languages are any
indication, then after thousands of years our languages will keep
evolving.
[1] Stroustrup, Bjarne. "21st Century C++", February 4, 2025.
https://cacm.acm.org/blogcacm/21st-century-c/
(last accessed: February 16, 2025)
[2] Standard C++ Modules https://clang.llvm.org/docs/StandardCPlusPlusModules.html
(last accessed: February 16, 2025)
[3] C++ Modules https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Modules.html
(last accessed: February 16, 2025)
[4] Nielson, F., et al., "Principles of Program Analysis", Springer,
2nd printing, 2005.
[5] Mitchell, J., "Concepts in Programming Languages", Cambridge University
Press, 2003.
[6] likely() and unlikely()
https://kernelnewbies.org/FAQ/LikelyUnlikely
(last accessed: February 23, 2025)
[7] Boost Macro Reference
https://www.boost.org/doc/libs/1_87_0/libs/config/doc/html/boost_config/boost_macro_reference.html
(last accessed: February 23, 2025)
Copyright Alfredo Cruz-Carlon, 2025