Trip report: Summer ISO C++ standards meeting in Varna, Bulgaria

muxator · on June 19, 2023

> Some compiler needs to implement -Wunderbar

Hidden gem!

On a more serious note, the author of this little proposal (making "_" an official nameless placeholder) had to do thorough research to show that this change would not break existing code.
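
A minimal sketch of the kind of existing usage such research has to account for, assuming nothing beyond standard C++: `_` is already a legal identifier, commonly given to variables nobody reads. The helper names `count_elements` and `locked_increment` are hypothetical, not taken from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical helper: `_` names a loop variable that is never read.
// In C++ standards up to C++23 this is an ordinary identifier.
inline std::size_t count_elements(const std::vector<int>& values) {
    std::size_t n = 0;
    for (auto _ : values) {
        (void)_;  // silence unused-variable warnings
        ++n;
    }
    return n;
}

// Hypothetical helper: `_` names a lock guard that exists only for its lifetime.
inline int locked_increment(std::mutex& m, int& counter) {
    std::lock_guard<std::mutex> _(m);
    // Before the proposal, declaring a second variable named `_` in this
    // same scope would be an ordinary redefinition error.
    return ++counter;
}
```

Both functions declare `_` only once per scope, which is valid before and after the change; the proposal is about what a second `_` in the same scope should mean.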

This kind of attention is necessary for a language with broad scope and a long successful history such as C++, and would be for every other language with such characteristics.

While more modern languages benefit from the errors of the older ones, I do not think they will be exempt from this kind of responsible growth process when they will become decades old.

lifthrasiir · on June 19, 2023

> While more modern languages benefit from the errors of the older ones, I do not think they will be exempt from this kind of responsible growth process when they will become decades old.

This is true, but C++ hasn't learned enough from other languages' experiences either. Let's assume that `_` could not be introduced without breaking changes. Even in that case it would be reasonable to opt in that syntax for some blocks of code, without breaking any existing code which hasn't opted in, and this "edition" mechanism was very successful to evolve syntaxes and often semantics in recent languages (JS, Rust, ...). C++ is still struggling with a false premise of stable ABI (see [1] for a good argument against it).

[1] https://thephd.dev/binary-banshees-digital-demons-abi-c-c++-...

jmgao · on June 19, 2023

> Even in that case it would be reasonable to opt in that syntax for some blocks of code, without breaking any existing code which hasn't opted in, and this "edition" mechanism was very successful to evolve syntaxes and often semantics in recent languages (JS, Rust, ...).

You can't really do this in a language where the way you use libraries is by textual inclusion of headers which can include arbitrary code (no one uses modules).

tialaramex · on June 19, 2023

The proposal (Epochs) to do this in C++ even says it needs modules. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p18...

It would have been extremely difficult to do this for say C++ 20 even if the other work was abandoned (which IMO would have been sensible) and I feel confident it is now impossible.

lifthrasiir · on June 19, 2023

That's a bad excuse. In fact I think editions are possible without modules at all, if you are really willing to fix this issue. Tying epochs to modules suggests that they don't think this is a huge issue to start with.

One possible design is to assign editions to all preprocessed tokens. This implies the edition definition should be some sort of pragma (say, `#pragma STDCXX EDITION 2023`) and preprocessors should add pragmas themselves if tokens with a different edition are transcluded into other tokens. Implementations with integrated preprocessor and parser may also use a simpler internal representation to convey editions as well. Now `_` would be a normal identifier in past editions and a placeholder in a new edition, however they are passed through macros. For example:

    #pragma STDCXX EDITION 2023
    // Hereafter `_` is parsed as `_2023` internally
    #define GUARD() auto _ = make_guard()

    #pragma STDCXX EDITION 2020
    // Hereafter `_` is parsed as `_2020` internally
    {
        GUARD();
        GUARD();             // Okay, equivalent to `auto _2023 = make_guard();`
        foo(&_);             // Up to the proposal author, but P2169 suggests that
                             // this `_` should refer to the second `_2023`
    }
    auto _ = make_guard();   // Okay, equivalent to `auto _2020 = make_guard();`
    auto _ = make_guard();   // Nope, `_2020` can't be defined more than once in the same scope

Of course there are more questions to answer. For example what would `#` and `##` do to editions? Should the current edition reset after `#include` finishes? (I think so.) How would we compute the edition for specific syntax from constituent tokens? (A systematic approach would be to collect a set of all editions used in syntactic tokens, and if not all editions in that set result in the same result, to issue an error.)
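
The parenthetical rule above can be sketched as a small model. This is an illustration only: `Token`, `underscore_is_placeholder`, and `resolve_placeholder_rule` are hypothetical names, and treating edition 2023 as the one that turns `_` into a placeholder is an assumption borrowed from the example earlier in this comment, not from any real compiler.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each preprocessed token remembers the edition that was
// active where it was written, even after macro expansion.
struct Token {
    std::string text;
    int edition;  // e.g. 2020 or 2023
};

// Hypothetical language rule: does `_` act as a placeholder in this edition?
inline bool underscore_is_placeholder(int edition) {
    return edition >= 2023;
}

// Collect the editions of the constituent tokens, evaluate the rule under each
// one, and report a result only if they all agree. std::nullopt stands for
// "issue an error: ambiguous editions".
inline std::optional<bool> resolve_placeholder_rule(
        const std::vector<Token>& constituents) {
    std::set<int> editions;
    for (const Token& t : constituents) editions.insert(t.edition);
    if (editions.empty()) return std::nullopt;

    std::optional<bool> agreed;
    for (int e : editions) {
        const bool result = underscore_is_placeholder(e);
        if (agreed && *agreed != result) return std::nullopt;
        agreed = result;
    }
    return agreed;
}
```

Note that mixed editions are not an error by themselves in this sketch; only editions that disagree about the outcome are.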

But I still argue that this design is simple, checks all checkboxes as long as open questions are decided, and can be independently deployed. And this token-level tracking is not something alien, it's required to give more accurate diagnostics anyway---and other languages like Rust have even used it to implement things like macro hygiene.

jmgao · on June 19, 2023

> But I still argue that this design is simple, checks all checkboxes as long as open questions are decided, and can be independently deployed.

You're handwaving a ton of complexity. You don't even need token pasting for this to become a confusing mess, just consider what happens if GUARD takes the name as an argument, or if an expression is constructed by putting two macros with different editions next to each other. You're also missing the biggest issue, which is 'what happens with headers that don't care about editions, and only begrudgingly even include an extern "C" for C++ users?'.

> (A systematic approach would be to collect a set of all editions used in syntactic tokens, and if not all editions in that set result in the same result, to issue an error.)

This does not work in general. Suppose you wanted to remove most vexing parse in a new edition: changing the result of `Foo bar(std::array<int, PAGE_SIZE>())` is the goal!
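
For readers who have not met the most vexing parse, a minimal sketch of what that line means today. `Foo` and the value of `PAGE_SIZE` are assumptions made up for the example; the `static_assert`s check the claims in the comments.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

constexpr std::size_t PAGE_SIZE = 4096;  // assumed value, for illustration only

// Hypothetical type standing in for `Foo` from the line above.
struct Foo {
    explicit Foo(std::array<int, PAGE_SIZE>) {}
};

// Most vexing parse: this declares a FUNCTION named `bar` that returns Foo and
// takes a pointer to a function returning std::array<int, PAGE_SIZE>.
Foo bar(std::array<int, PAGE_SIZE>());

// Brace initialization cannot be read as a function declarator, so this
// defines an OBJECT.
Foo baz{std::array<int, PAGE_SIZE>()};

static_assert(std::is_function<decltype(bar)>::value,
              "bar is a function declaration");
static_assert(!std::is_function<decltype(baz)>::value,
              "baz is an object");
```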

lifthrasiir · on June 19, 2023

> just consider what happens if GUARD takes the name as an argument,

The name token will be assigned the edition of the originating position.

    #pragma STDCXX EDITION 2023
    #define GUARD(var) auto var = make_guard()
    #pragma STDCXX EDITION 2020
    GUARD(_);
    GUARD(_); // `_` is a normal identifier, so this will probably error.

> or if an expression is constructed by putting two macros with different editions next to each other.

Let's replicate the removal of implicit conversions from the Epochs proposal then.

    #pragma STDCXX EDITION 2020
    #define MOVE move
    #define LPAREN (
    #define RPAREN )
    #define COMMA ,
    #define P p
    #define LIT_3_42 3.42
    #define LIT_2_49 2.49

    #pragma STDCXX EDITION 2023
    void move(Particle&, float x, float y);

    void foo() {
        Particle p{};

        // error, all tokens are in the 2023 edition
        move(p, 3.42, 2.49);

        // depending on the proposal, at least one of them
        // should work because all constituent tokens for
        // the call expression are in the 2020 edition.
        // proposal authors would then be responsible for
        // which tokens are considered "constituent" here.
        move LPAREN p,      3.42,          2.49     RPAREN;
        move LPAREN p COMMA 3.42     COMMA 2.49     RPAREN;
        MOVE LPAREN p COMMA 3.42     COMMA 2.49     RPAREN;
        MOVE LPAREN P COMMA LIT_3_42 COMMA LIT_2_49 RPAREN;

        // assuming the call only regards `(` and `)` as
        // constituent tokens, the following call has
        // ambiguous editions and will fail to compile.
        move(p, 3.42, 2.49 RPAREN;
    }

So my chain of logic is that we ideally want to attach editions to the AST, but the existence of textual inclusion means that editions should also exist on tokens, and the AST node's edition will have to be computed from them. Not a big deal for existing compilers (they already track spans).

> what happens with headers that don't care about editions, and only begrudgingly even include an extern "C" for C++ users?

For this reason I believe the edition should reset across files, reverting to the default edition.

And in fact this feature will benefit both C and C++, and thus ideally be harmonized. In this way we will no longer have to special-case some files to be implicitly in the `extern "C"` block! (Yes, this exists, see for example -internal-externc-isystem in clang cc1.)

> Suppose you wanted to remove most vexing parse in a new edition: changing the result of `Foo bar(std::array<int, PAGE_SIZE>())` is the goal!

Now this is probably a better question, because we have tons of options for how existing parsers should behave. Note that I've intentionally omitted this part because those options are only useful when the ambiguity arises, and I think they can be determined on a case-by-case basis.

But if you indeed want to remove the most vexing parse, it's mostly up to the proposal author's decision. I should note the following quote from ISO C++:

> If the statement cannot syntactically be a declaration, there is no ambiguity, so this rule does not apply. The whole statement might need to be examined to determine whether this is the case. [1]

So the parser should have already produced two possible ASTs for the same token sequence in the first place and picked one of them, in principle. The proposal therefore will determine which token affects this decision---for example that may be `(` and `)` as in the call, or the whole `std::array<int, PAGE_SIZE>()` sequence that can be parsed as a type-id. In any case the proposal will give a concrete and compatible algorithm to remove the most vexing parse in a new edition. You can't remove it in existing editions, and this is by design.

(Alternatively and probably more commonly, the parser may use a prioritized choice, i.e. trying to parse type-id first then initializer-clause if it fails. In this case parsing type-id may have to signal that some constituent tokens are of a new edition and backtracking might be needed. The proposal author would want to experiment with different choices of constituent tokens to ease the implementation. But again, the parser should already implement backtracking, so this should not be a huge cost.)

[1] https://timsong-cpp.github.io/cppwp/n4868/stmt.ambig (this is not exactly about the most vexing parse but both cases are using the same wording)

JohnFen · on June 19, 2023

The main problem with C++ right now, in my opinion, is that it has turned into a hodgepodge of syntactic idiosyncrasies. I'm not sure adding more would be a benefit.

Asooka · on June 19, 2023

You can just say that this feature is only local to the current file and does not affect included files, nor files that this one is included in. There are already some preprocessor features that act like that, e.g. __FILE__, which contains the name of the current file being parsed and obviously is local to it; or "#pragma once", which indicates the current file should not be parsed if it has already been included once. The information for which file is being parsed currently is not lost after the preprocessor stage, so you can have file-local directives.

TeMPOraL · on June 19, 2023

> While more modern languages benefit from the errors of the older ones, I do not think they will be exempt from this kind of responsible growth process when they will become decades old.

(content warning: below are just some thoughts without any kind of final point or argument; skip if you're not interested in random musings)

Makes me think of evolution, biological death, and the limits of lessons learned. C++[0] embeds decades of practical experience; many mistakes and pivots add up to how it looks today - but every change made meant complexity was growing, and today, it's hard to incorporate new experience into the language, because of all that accumulated complexity. New languages get to distill what works, and incorporate the lessons into their design directly - starting fresh, without the complexity baggage. However, some things are lost in translation - all that complexity thrown away wasn't just random noise, it was a record of mistakes and wrong turns that the new languages are now liable to repeat.

What I'm thinking is: evolving a language for decades keeps all the experience, but becomes superlinearly[1] expensive and slow. Starting from scratch resets the costs, but also loses some experience in the process. Where continuous evolution of a language, in the limit, will be approaching the point where incremental upgrade takes infinite effort to pay the "complexity debt", the "one funeral at a time" evolution seems like it could reach a different limit: the point at which each generational transition loses as much experience as the subsequent generation will add.

Not sure what to make of it yet.

--

[0] - I'm including here both the language proper, standard library and compilers, which all co-evolved, and all form the larger body of C++ as a real tool, and not just paper abstraction.

[1] - Would say exponentially, but I'm not really sure how fast it grows - only that it grows faster than linear.

einpoklum · on June 19, 2023

> New languages get to distill what works, and incorporate the lessons into their design directly

In principle. In practice, most new languages distill _some_ of what works and incorporate _some_ lessons. Which is ok, but the point is that when they hit those situations where things _don't_ work, they either ignore it or start becoming more complex themselves.

> but becomes superlinearly[1] expensive and slow

Are you talking about the language standard? The standard library API? Idiomatic use "cognitive load"? Advanced use brain load?

For some of these, your statement is not true. Especially for the idiomatic use of the language: Quite a few things have gotten simpler, rather than more complex, for the lay user, with the addition of more functionality into the language.

TeMPOraL · on June 19, 2023

>> but becomes superlinearly[1] expensive and slow

> Are you talking about the language standard? (...)

I'm talking about the effort to incorporate new lessons into the language standard, and/or standard library, and/or compilers, while trying to maintain some backwards compatibility with at least a few earlier "versions", and generally not reinvent half the language. This means you can't just add or modify a thing - you'll need to do some serious research, discuss things with compiler vendors and interest groups, etc. And you usually can't implement the new idea directly in the most clean/obvious way - you need to find a way that doesn't interfere with any of the existing rules of the language and surrounding customs.

criddell · on June 19, 2023

> New languages get to distill what works, and incorporate the lessons into their design directly - starting fresh, without the complexity baggage.

I think that may be the motivation for Sutter's work on cpp2.

https://github.com/hsutter/cppfront

dthul · on June 19, 2023

There is an interesting approach to this in Rust: if a potentially breaking change (e.g. a soundness fix) is being proposed, they usually test it against all publicly available Rust code.

eesmith · on June 19, 2023

I'm not sure that's the flex you think it might be.

At least, I interpret it as saying there isn't much publicly available Rust code, and only a few places to find Rust code.

I have a hard time even estimating how long it would take to test a change against all publicly available C++ code.

FWIW, the C++ standards developers do use code search tools to help identify possible breakage.

aldanor · on June 19, 2023

It's exactly the flex you might think it is. The point is not in the number, but rather in the fact that you can build and test almost all crates available on almost all supported platforms, regardless of how many there are, and with no human intervention.

For C++, there's no registry to start with, but even if there was: there's no standard way of building and testing projects. So, a crater build like this is simply impossible at scale.

lionkor · on June 19, 2023

> For C++, there's no registry to start with

That's not right, there are multiple. There's no single registry. You can use the vcpkg registry, for example - that holds all small and large libraries I've ever needed (even one of my own). They also come with a standard way to build them, of course (CMake targets).

Have you done C++ development recently? I fear a large part of the C++ crowd may not be aware that package managers and build systems are available, they're just not preinstalled with C++.

aldanor · on June 19, 2023

I know about vcpkg but there's only 2k packages there. The point of the grandparent comment was that there's SO much C++ code / so many packages you can build in a crater run that it would be physically unfeasible.

The problem with vcpkg, just like with any other "ports" package manager, is that it's not maintained by the original authors of the code but rather by a separate community. It's a bunch of "ports", trying to standardise the builds and installs to a common format. Out of curiosity, I looked at a few recipes; they seem to just install things for you, but you still have no automated way to do a crater run - since most of those libraries are header-only you will need to write library-specific code in each case to actually use each package; at least a single include. The tests are seemingly also not being run.

lionkor · on June 19, 2023

That's fair - but since this is about core language changes, compiling the non-header-only libraries already covers most of the commonly used libraries. Libraries like boost will also use most existing C++ features. I can't think of a single feature boost doesn't use, actually.

tialaramex · on June 19, 2023

Even in the best case scenario, where this partial coverage touched everything that mattered, you are still screwed in C++ because of IFNDR ("Ill-formed, no diagnostic required" a recurring phrase in the ISO document).

If what I wrote isn't a Rust program, it doesn't compile. But if what I wrote isn't a C++ program, because of IFNDR it might compile anyway, and in a whole bunch of cases it must compile anyway because the alternative would be that our fundamental understanding of mathematics is wrong (or the compiler is broken).

This makes Crater runs fundamentally more powerful, even ignoring the practical problems C++ hasn't solved such as a lack of tooling.

lifthrasiir · on June 19, 2023

Note that this tool (crater) is mainly used to catch regressions, not to judge whether it's worth intentionally breaking things (which is what editions are for). If many crates depend on the bug which devs want to fix, crater will probably detect that and devs will consider other alternatives.

Entalpi · on June 19, 2023

Pretty sure from a build system perspective it's quite a flex .. to be fair.

eesmith · on June 19, 2023

I understand the intent of the flex, but if true, it suggests there's very little public Rust outside of packages that can be downloaded from crates.io and a smallish list of alternatives.

By comparison, there's so much publicly available Python code, from so many sources, that no one can honestly say they can even find it all. The same for C++.

I've seen papers where the source code was included in the paper itself (eg, the FORTRAN code in Sibson's 1973 "SLINK" paper), or only distributed as a zip file from the author's web site, or in the supplementary data (eg, https://scholar.google.com/scholar?q=%22source+code+in+the+s... ) .

Personally, I don't think it's true. I suspect Rust changes - just like new proposed C++ changes - are checked only against easily accessible, "well-known" packages.

dralley · on June 19, 2023

>if true, it suggests there's very little public Rust outside of packages that can be downloaded from crates.io and a smallish list of alternatives.

You seem to be suggesting that it's a good thing that the public code is spread across so many different places that it cannot all be found. I don't see how that's an inherently good thing. It says less about the total amount of code than it does about the lack of any central resource that can be consulted.

eesmith · on June 19, 2023

> that it cannot all be found.

Do you think you can find all public Rust code?

Like, if I'm teaching a Rust course, and put a hello-world.rs program on my department's public GitLab instance, under an MIT license, do you think I should also put that on GitHub? And register it as a crate?

> the lack of any central resource that can be consulted.

And you say that like it's a good thing.

You want everything to be centralized on GitHub? If so, you want to force all research software developers to agree to GitHub's terms, including those who are ardent free software advocates.

You also prevent 12-year-olds from publishing their Rust source code. (GitHub's terms of service don't allow that.)

Or, do you also allow BitBucket [1], and GitLab [2]?

[1] https://bitbucket.org/project_samar/samar_lite/src/master/ contains two Rust programs, neither on crates.io

[2] https://gitlab.com/rouault-team-public/analysis/umaprs

What about department instances of GitLab? [3]

[3] https://gitlab.anu.edu.au/mu/mu-impl-fast/-/tree/rtmu-dev

It really doesn't seem like it's all that easy to find all publicly available Rust code.

dralley · on June 19, 2023

What bearing does any of this have on the previous thread of discussion?

Why do you think a 12 year old needs to publish their "hello world" programs because of Crater? The purpose of Crater is uncovering subtle compiler regressions. If "hello world" is ever broken then it would likely be discovered by the standard test suite or generally long before the Crater run.

This isn't a matter of "allowing" anything. It's just a statement that yes a Crater run does test all meaningful publicly available code, where "meaningful" at the very least means code which is consumed via crates.io. Sure, there is very likely public code that exists elsewhere which Crater cannot find, and that's OK. The point is that a Crater run coming back clean means something, because a very very wide swath of code was tested.

eesmith · on June 19, 2023

What is "the previous thread of discussion?"

My response was all of 5 lines, saying that if dthul's comment were true, then it implies that Rust has a rather small code base.

And indeed, Crater does not test all publicly available Rust code. ("Not all code is on crates.io! There is a lot of code in repos on GitHub and elsewhere", and only for "Linux builds on x86_64", not Windows, says https://rustc-dev-guide.rust-lang.org/tests/crater.html).

Rust is much bigger than dthul's comment implies.

You may well be correct when adding the qualifier "meaningful", but that's a different thread of discussion.

> Why do you think a 12 year old needs to publish their "hello world" programs because of Crater?

I mentioned that because you changed the thread of discussion to discuss centralized vs. decentralized code distribution.

> because a very very wide swath of code was tested.

And C++ language developers also analyze a 'wide swath of code' - millions of lines or more - for changes.

59nadir · on June 19, 2023

To be fair to the original argument, I think it's important to understand that there is next to no Rust code in comparison to the amount of C++ code out there. It has almost no projects in comparison, and those projects are much, much smaller. I don't think that's a very controversial statement, because it's very obviously true.

Now, it's also important to keep in mind that C++ has a terrible story when it comes to centralized (or otherwise, really?) repositories for packages, so the corresponding system for C++ is at the moment completely infeasible and not at all useful. That doesn't really make the Rust code that's tested against any more meaningful in comparison to the vast amounts of C++ code out there, though.

Edit:

At the kind of pointless and debilitating scale at which C++ exists, and with the relationship C++ has with packages and dependency management, this entire idea is basically impossible.

tialaramex · on June 19, 2023

Rather than hypothesising about an imagined tool you could look at the actual tool which of course is in Rust's source code repo: https://github.com/rust-lang/crater

> new proposed C++ changes - are checked only against easily accessible, "well-known" packages.

Now that I have, so to say, shown you mine, let's see yours. Where is the tool to perform these checks in C++?

eesmith · on June 19, 2023

Thank you for showing that I was right in my belief: 'I suspect Rust changes - just like new proposed C++ changes - are checked only against easily accessible, "well-known" packages.'

My point is that dthul's comment "they usually test it against all publicly available Rust code" implies Rust has a very small user base. Since crater runs only against "parts of the Rust" - those available on GitHub and crates - it implies a rather larger ecosystem.

As for "mine" - what I know about C++ development comes from reading links posted to HN; hardly "mine" in any meaningful sense. I also don't accept your wording "these checks", because my point is that similarly useful checks are done, not exactly identical tests. I wrote 'FWIW, the C++ standards developers do use code search tools to help identify possible breakage.'

From previous readings, I know they do code surveys, and experiments using existing code bases and compilers.

For example, there's https://codesearch.isocpp.org/ ("developed for ISO Standard C++ proposal authors in order to explore existing C++ practice and to provide empirical evidence to support claims about existing practice made in proposals."), which is used in surveys to understand how code is used. See, for instance, https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p14... .

At https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p11... they used a custom tool to analyze Boost, Chromium, Firefox, the Linux Kernel, Libreoffice, LLVM, and Qt: "Estimated 30 to 80 millions LOC compiled".

tialaramex · on June 20, 2023

I don't see "We sometimes do some ad hoc checks including looking for stuff with code search" as "similarly useful" to using proper test automation at all.

And I think the results continue to speak for themselves.

eesmith · on June 20, 2023

"Estimated 30 to 80 millions LOC compiled" sounds more than code search, yes?

Don't confuse my ignorance of the process for lack of process.

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p27... describes a proposal to "zero-initialize all objects of automatic storage duration", with a test-implementation as an "opt-in compiler flag", and tested on "The OS of every desktop, laptop, and smartphone that you own; The web browser you’re using to read this paper; Many kernel extensions and userspace program in your laptop and smartphone; and Likely to your favorite videogame console."

Or from https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n43... "To assess how common these cases are likely to be in practice, we conducted a ClangMR analysis of a codebase of over 100 million lines of C++ code, identifying every location where a std::function is given a new target".

"Proper" and "ad hoc" have very strong personal components. Is it proper or ad hoc that Crater only tests public code, while C++ developers have access to large private code bases ("the OS of every desktop") for carrying out their tests?

Is it proper or ad hoc that Crater only checks crates.io and some GitHub repos?

Is it proper or ad hoc that Crater doesn't test under Microsoft Windows?

As for the results, what will Rust language development look like when there's 10 billion lines of Rust code, and only a tiny fraction of it is visible?

tialaramex · on June 20, 2023

> "Estimated 30 to 80 millions LOC compiled" sounds more than code search, yes?

Does it? Your belief is that the authors wrote two compilers (C and C++ because these codebases are in two different languages) with these features they're not proposing and don't think should be used, in order to actually compile this code and check it works - but alas although they had to do all this complex compiler internal work they didn't find time to have the frontend parser count the lines of input?

"They just used code search and estimated" doesn't sound infinitely more likely to you?

> Don't confuse my ignorance of the process for lack of process.

Your ignorance certainly plays a role, but I don't see process.

P2723 is talking about widespread experience in real systems, but it's not a "test" implementation, it's just widespread real world tooling because this is a real world safety hazard regardless of whether C++ ever fixes it. -ftrivial-auto-var-init is the name of the Clang and GCC flag for example. That's how they can be confident it's used by "The OS of every desktop, laptop and smartphone you own" - it's one of the early checklist items that OS vendors have to slightly improve their C and sometimes C++ programs at very low cost.

Microsoft's team actually gave a talk about landing their equivalent; they had to fight harder because, inside a proprietary codebase, it turns out even more C++ programmers mistake their ignorance for competence, and thus are convinced the C++ standard is correct here and such mitigations are at best a waste of time and at worst actively destructive. Also their optimiser is apparently terrible, which if you've used MSVC checks out.

Thus this C++ proposal is, like in "days of yore" just citing existing real world use.

The C++ developers don't actually have direct access to other people's code. JF Bastien (the paper's author) used to work for Apple, so it's possible he's actually seen Apple's teams using this flag, but either way Apple have announced that they do so. Microsoft publicly talked about using their equivalent for Windows, and the Linux vendors advertise that they have such mitigations. Anecdotes. To insulate this proposal (not very effectively it turned out) against people who insist the price of this change is too high to be feasible.

It turns out that in C++ land "We actually did this and it works" does not trump "I don't think it would work"

N4348 is talking about, and indeed cites, Google's experience with its own code using a smarter "refactoring" tool that Chandler and Hyrum have talked about publicly on several occasions. This is slightly fancier than code search, but it's still very much ad hoc which is why this gets mentioned once in that paper but isn't in the others you looked at.

When a tool systematically does the same thing, over, and over, that's anything but ad hoc.

In some ways you should expect Rust code to grow more slowly. If you ask that Code search guy from your previous comment, he'll tell you that a lot of C and C++ software has big machine generated data files as "source code". Until C23 there is no #embed whereas Rust has from the outset offered std::include_bytes! which is what you'd want instead of #embed if you weren't fighting neanderthals (JeanHeyd sounds exhausted by the experience)
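
To make the "machine generated data files" point concrete, a sketch of what such a file tends to look like, assuming a generator in the style of `xxd -i`. The array name is hypothetical and the eight bytes are just the PNG file signature used as stand-in data.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical generator output: a binary file rendered as an array literal.
// Real generated files of this kind can run to a very large number of lines,
// all of which a naive line count treats as source code.
static const unsigned char kEmbeddedData[] = {
    0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
};
static const std::size_t kEmbeddedDataSize = sizeof(kEmbeddedData);

// With C23's #embed the same data could be pulled in by one directive,
// for example (not compiled here, since compiler support varies):
//     static const unsigned char kEmbeddedData[] = {
//     #embed "image.png"
//     };
```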

However over time of course software grows, and the more powerful, safer abstractions in Rust are expected to encourage that, so sure, 10 billion lines of Rust, I'm not sure why that's such a milestone. No I don't expect big changes as a result.

Did the documents you reviewed make you think the hidden C++ is so much different than the piles of it that are available in a public code search? Was that the message you received?

eesmith · on June 20, 2023

> 30 to 80 millions LOC compiled

I figured it was because "line of code" is not all that meaningful, and not worth specifying more precisely than that.

Does it include comments? Is it after macro expansion? What about \ continuations? Does a bare "}" on its own count as a line of code?
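
A rough sketch of why the number depends on the definition: the naive counter below applies three different rules to the same text. `LineCounts`, `trim`, and `count_lines` are hypothetical helpers, and the rules are assumptions chosen for illustration, not how any of the cited papers counted.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Three hypothetical definitions of "lines of code" for the same input.
struct LineCounts {
    std::size_t physical = 0;     // every line in the text
    std::size_t non_blank = 0;    // excluding whitespace-only lines
    std::size_t substantive = 0;  // also excluding `//` comment lines and lone braces
};

// Strip leading and trailing spaces, tabs, and carriage returns.
inline std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Deliberately naive: no /* */ comments, no macro expansion, and no handling
// of `\` line continuations.
inline LineCounts count_lines(const std::string& source) {
    LineCounts counts;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        ++counts.physical;
        const std::string t = trim(line);
        if (t.empty()) continue;
        ++counts.non_blank;
        const bool comment_only = t.compare(0, 2, "//") == 0;
        const bool brace_only = (t == "{" || t == "}" || t == "};");
        if (!comment_only && !brace_only) ++counts.substantive;
    }
    return counts;
}
```

Fed the text `"// demo\nint main()\n{\n\n    return 0;\n}\n"`, the three rules yield 6, 5, and 2 lines respectively.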

BTW, how many LOC does Crater run in a full test, and how long does it take/how expensive is a run? I failed to find that information.

> The C++ developers don't actually have direct access to other people's code

I don't know what you mean by that. They certainly have access to public source code, just like Rust developers do. (Chromium, LLVM, Boost are mentioned in https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p11... ).

It would seem very odd if Microsoft's representatives had no idea how changes to C++ would affect internal Microsoft code. I strongly suspect VC++ changes/extensions are tested against in-house Microsoft code bases before making their way to the standard, because it makes no sense to undermine your own systems. For the same reason, I suspect proposed changes are tested internally at Microsoft.

And from papers like https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p21... I know there is in-house experience of proprietary code bases guiding how the C++ standard changes.

> Did the documents you reviewed make you think the hidden C++ is so much different than the piles of it that are available in a public code search? Was that the message you received?

That's not really my point. (Indeed, as that last paper link from Bloomberg points out, "It is our understanding that Bloomberg’s experience is not dissimilar to most Free/Libre Open Source Software communities".) Instead:

1) How much of the "very very wide swath of code" is meaningful, in terms of language feedback? That is, how much of the automation is being employed because it's there, rather than because it's useful?

If an automated method checks 500M LOC but the interesting cases only ever come from the same set of 1M LOC, wouldn't reducing the working set help with turnaround?

(Indeed, https://ethz.ch/content/dam/ethz/special-interest/infk/chair... uses Crater to look at only the 500 most used crates, implying they think using an ad hoc subset is sufficient for their purposes.)

(Incidentally, it's hard to find any published scholarly papers on Crater. There's a lot of rust, both the iron and plant kind, in terrestrial craters!)

2) Would a C++ equivalent for Chromium, Qt, LibreOffice, KDE, Firefox, and a few dozen well-known large packages give the same feedback for C++? Why or why not?

If not, would ~100 packages be enough? What about ~1,000?

3) How do you know that Rust compilation of the packages on crates.io, only for x86-64 Linux, gives better feedback for the types of issues that C++ faces than the "ad hoc" methods they use for C++?

That is, just because a tool fits Rust's needs and goals doesn't mean it fits the C++ spec developers' needs and goals.

4) How would a tool like Crater help in a possible future where there are a dozen different and competing Rust implementations?

That is, https://blog.m-ou.se/rust-standard/ argues there doesn't need to be a standards committee for Rust because there is only one Rust implementation, with tools like Crater to help maintain compatibility. I'm familiar with this viewpoint as I come from the Python world; while there are alternative Python implementations, they all look to CPython as the reference language.

But in C++ there are many C++ vendors, some with an economic incentive to have new features which might break old code, but which their customers will pay for. On the other hand, their customers have an economic incentive to prevent vendor lock-in. Hence, a standard.

If a hypothetical EESmith Rust drops a few rarely used features to give a 2x run-time performance gain and 5x compilation performance gain, then you can bet that people will switch to it. But is that Rust? And will mainline Rust still preserve backwards compatibility even in the face of competition?

> I'm not sure why that's such a milestone

Do you expect Crater to scale to compile 10 billion lines of Rust in a reasonable time and cost? Or will Crater drop testing most packages by then?

> Jean-Hyde sounds exhausted by the experience

Developing a C++ standard with multiple entrenched and sometimes competing vendors is no easy task. Rust doesn't have to deal with it ... yet.

tialaramex · on June 20, 2023

> BTW, how many LOC does Crater run in a full test, and how long does it take/how expensive is a run? I failed to find that information.

I have nothing more than a finger in the air estimate for LOC, maybe hundreds of millions?

I have never watched a "full test" like for a release build, I believe those take several days - but when Crater is asked just to build everything that takes a little under 24 hours with its current footprint.

> I strongly suspect VC++ changes/extensions are tested against in-house Microsoft code bases before making their way to the standard, because it makes no sense to undermine your own systems.

Surely it stands to reason that if Microsoft are proposing standardisation of a feature they've shipped in MSVC, that's also a feature they've tried using? This model of ISO C++ features (which the developer of Circle also prefers) maps much better to what was initially envisioned than today's reality however. Most C++ proposals today are not submissions of existing compiler features from the big three compilers (MSVC, GCC and Clang) but instead fresh before the committee, often with no implementation experience at all.

That's certainly one way to do it, after all Rust contributors don't have their own Rust compiler either, but it means you need very different tooling.

1) Breadth matters much more than depth for finding surprises which is the thing you won't get with an ad hoc approach. Going from 10% of some big corporate code base to 20% won't make anywhere near the difference you get from adding a hundred one-man-band projects that are smaller even in total, because different stylistic and idiomatic choices make so much more practical difference for this work.

2) As a result "a few dozen" won't cut it. Try all the C++ on github, that seems like a much better place to start.

3) Sure, the primary goal of WG21 proposers is to get into the IS - it would be nice if what they've proposed actually works, but ultimately if it doesn't work that can be fixed later, whereas if it's not adopted then it doesn't matter whether it would work.

Arguably there have never been any versions of the C++ IS which actually describe a complete working programming language, so it's not terribly important that if it were such a system it would be correct, still there's a preference for fewer rather than more horrible gotchas.

I mentioned #embed, so that's a useful example here: C++23 doesn't standardize #embed. So in theory C++ code can't use #embed; that's not C++. But of course in reality the vendors are going to ship a preprocessor which handles #embed, they don't care, so it'll work, and it's widely expected you will be able to use it even in older C++ versions.

4) If there was a specification then a tool like Crater might be somewhat helpful for that, but I expect that most effort would remain focused on a single implementation, today that is of course the Rustc compiler with its LLVM backend.

The hypothetical EESmith Rust sounds spurious to me, how could it deliver 2x run-time performance by removing "rarely used features" ? I don't think spurious hypotheticals are a good use of anybody's time.

eesmith · on June 21, 2023

> I have nothing more than a finger in the air estimate for LOC, maybe hundreds of millions?

And if there were a number, it would reflect only one of several ways to quantify "LOC", right? Resulting in a spread of numbers that could meaningfully be described as "LOC"?

> Breadth matters much more than depth for finding surprises which is the thing you won't get with an ad hoc approach

I would be quite interested in someone doing a research publication on this topic!

Given the history of using Crater, which packages have proved most useful? Do the same core packages prove useful over time, or does the most significant subset change wildly? What does the cumulative distribution plot (#packages until time of appropriate feedback) look like?

How worthwhile is the additional breadth from crates.io + GitHub vs. just crates.io? Is it worthwhile to also include GitLab, and what are the tradeoffs (eg, additional compute costs, additional false positives).

For that matter, how useful would it be to add Linux+ARM to the current Crater tests? Or Microsoft Windows? If breadth is that important, then why skip out on the full set of Rust code you have available?

> As a result "a few dozen" won't cut it.

I did follow up with "If not, would ~100 packages be enough? What about ~1,000?" :)

If there's no equivalent of a dose-response curve / ROC curve / price-performance curve, and the answer is "must try everything" then how do I know the extra effort is useful, rather than FOMO-driven anxiety?

> Try all the C++ on github

Assuming there was a single way to build all C++ code - how much do you think it would cost to compile all the C++ code on GitHub? And why do you think the additional cost would be worthwhile to C++ standards development?

> but I expect that most effort would remain focused on a single implementation,

Oh, given my experience with Python implementations, I agree!

But my point is processes change when you have multiple competing commercial vendors, which C++ has. So looking at how Rust does things doesn't mean it's also appropriate for C++.

> I don't think spurious hypotheticals are a good use of anybody's time.

Okay, something more practical. C++11 broke backwards compatibility by changing how 'auto' works. "auto int i;" used to be valid, now it's an error. This is a huge boon for usability. It's a trivial syntactic change to fix old code, and long experience shows the old "auto" storage class was rarely used.
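As a concrete sketch of that break (the function names are hypothetical, and the C++98 form appears only in a comment because a modern compiler rejects it):

```cpp
#include <cassert>
#include <type_traits>

// C++98/03: `auto` was a storage-class specifier, so this was valid:
//     auto int i = 42;
// C++11 and later reject that line. The mechanical fix is to drop `auto`:
inline int old_code_after_fix() {
    int i = 42;  // was: auto int i = 42;
    return i;
}

// C++11 meaning: `auto` asks the compiler to deduce the type from the initializer.
inline long new_meaning() {
    auto i = 42L;  // deduced as long
    static_assert(std::is_same<decltype(i), long>::value, "auto deduces long here");
    return i;
}
```

The old spelling was redundant (block-scope variables already had automatic storage), which is part of why removing it cost so little.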

How would the systematic compilation of all C++ code on GitHub (assuming that were possible) affect that decision more than the ad hoc methods they did use to make that decision?

Will there really never be something in Rust where a simple breaking change of a rarely used feature can result in an easier-to-use language?

If there can, then you may have a schism, either temporary (gcc vs egcs fork) or more permanent (Perl5/Perl6/Raku). Which will be "Rust"?

The answer is legally quite clear. The Rust Foundation has the trademark to "Rust" (serial number 87796977). My version can't break backwards compatibility, even as a fork, so would have to call it, perhaps, "Verdigris". (As I recall, someone started to develop a "Python 2.8" with more backports from Python 3; the PSF got after them for using the Python trademark that way.)

C++ doesn't have trademark protection, so the legal concept of what is/is not C++ is also different than for Rust.

tialaramex · on June 21, 2023

> How would the systematic compilation of all C++ code on GitHub (assuming that were possible) affect that decision more than the ad hoc methods they did use to make that decision?

I doubt it would affect the actual decision at all, WG21 has been very comfortable relying on gut instinct, even in the face of reality, so there's no reason they'd be affected by the results of more systematic testing.

> Will there really never be something in Rust were a simple breaking change of a rarely used feature can result in an easier-to-use language?

Now we're talking about something woollier than your performance hypothetical. Surely almost any change can be sold as "easier-to-use" if you're motivated. Herb Sutter seems motivated for example, every CppCon he has a proposal for how to make C++ "easier to use" by further complicating it. An immediate caution though: in what way is half of a fractured ecosystem "easier-to-use"? The other half is no longer available to you, that's certainly not easier to use than before.

Rust programmers aren't used to taking such deals because Editions have been leveraged to give them better alternatives without the compromise.

This promise got stronger over time, rather than weaker as you seem to expect. There's complicated Rust 1.0 era code (e.g. early ripgrep) which doesn't even build today on a current compiler, because something it did is wrong and the Rust 1.0 compiler didn't spot that but modern ones do - back then it was less likely they'd see the compatibility break as a big deal, it was "just" a bug fix.

C++ compilers fix those sorts of bugs all the time even today. Rust wouldn't take those fixes so easily, modulo Crater measurements, but as you've shown C++ doesn't have that.

eesmith · on June 21, 2023

> so there's no reason they'd be affected by the results of more systematic testing.

Let's go back to the g'parent comment that started this branch, at https://news.ycombinator.com/item?id=36387994 .

muxator wrote "the author of this little proposal (officializing "_" as a no name placeholder) had to perform a thorough research to show that this change would not break existing code".

What was that "thorough research"? The paper doesn't mention it, but does imply there was a code search to find the examples it listed.
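For context, this is the kind of existing code such a search has to account for. It is a sketch, not taken from the paper: the function names are hypothetical, and it only shows `_` being used as an ordinary identifier in pre-C++26 code. As far as I understand the proposal, code like this, with a single `_` declared per scope, keeps its meaning.

```cpp
#include <cassert>
#include <utility>

// `_` as an ordinary local variable that is read after being declared.
inline int underscore_as_ordinary_name() {
    int _ = 41;
    return _ + 1;
}

// A common idiom: `_` as a "don't care" name in a structured binding (C++17).
inline int second_of_pair() {
    auto [_, value] = std::make_pair(1, 2);
    (void)_;  // silence unused-variable warnings
    return value;
}
```

The cases a survey needs to hunt for are the rarer ones, such as code that declares `_` and later depends on its value in ways the new rules would change.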

I assume you think that research was also "gut instinct", rather than "thorough research". Is that only because it did not do full compilation of all C++ code on GitHub, or is there something more seriously wrong with that research?

Further, while you wrote "Most C++ proposals today are not submissions of existing compiler features from the big three compilers (MSVC, GCC and Clang) but instead fresh before the committee, often with no implementation experience at all", that specific spec says it was implemented in Clang.

It therefore seems like the proposal which kicked off this long thread is a counter-example to your characterization of C++ language development.

You have not addressed my question - how do you know the extra effort in a full Crater run of all crates in crates.io + GitHub is useful, rather than primarily FOMO-driven anxiety?

tialaramex · on June 22, 2023

> I assume you think that research was also "gut instinct", rather than "thorough research".

My goodness no. "Gut instinct" is how the decisions are made, but the research you're talking about was made for a proposal paper. There are different incentives in play.

For the proposer the incentive is to get something to show for the enormous effort expended in making a proposal - usually months, sometimes years, across dozens of meetings and discussions and presentations. It's soul-destroying stuff. Ideally the sub-committees you're seeing would approve your work and it can go to another committee. More likely they will have suggestions for how it could be altered so as to satisfy them, and after a few iterations that can result in approval of a subsequent revised document. Often they just have open-ended questions for you, which perhaps might be satisfied in some future proposal document by answering the questions somehow. Or they just aren't interested and you're told to go away.

A show of your extensive research might make it easier to achieve your goal. You have an incentive to make this research seem as comprehensive as possible for that purpose in support of your goal.

But the people making the decision don't have that incentive. They could - in principle - spend hours on reading all the work you did, they could - in principle - replicate that work or even do their own research. In reality they are probably thinking about whether they can break early or move on to something they care about more. I would summarise their reasoning as gut instinct. Does this sound like something we should do? Maybe not. Straw poll question: Do we want this? Vote Against, nothing personal.

I mentioned JeanHyde before. JeanHyde has seen how this sausage is made, be sure to read his experience and think about it carefully before believing any fairy tales you've heard or any imagined process. Remember, the essence of JeanHyde's proposal was just this: 1) It would sure be nice to use blobs of binary data in my programs. 2) The existing ways to achieve this are garbage - so we need a new one.

JeanHyde spent years defending basic obvious stuff in front of people strongly motivated to believe he's wrong since that's just easier than doing any work. At its most basic the question is: given a lot of bytes of data in a file, or a lot of ASCII hexadecimal values written as C literals, which can be processed more quickly? The committee was strongly motivated to insist the answer was the ASCII hex, even though JeanHyde had tables showing the raw data is much faster.
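Roughly, the two options being compared look like this. It's a sketch only: the four bytes and the file name `blob.bin` are made up, and the `#embed` form is left in a comment since C++23 doesn't standardize it and compiler support varies.

```cpp
#include <cassert>
#include <cstddef>

// Option 1: what generators in the style of `xxd -i` emit. Every byte of the
// file becomes an ASCII token the compiler has to lex and parse.
// (Four made-up bytes here; a real asset can run to many thousands of lines.)
static const unsigned char blob_as_literals[] = {0xde, 0xad, 0xbe, 0xef};

// Option 2: the C23 directive, which lets the compiler read the bytes directly.
//
//     static const unsigned char blob_embedded[] = {
//     #embed "blob.bin"
//     };

inline std::size_t blob_size() { return sizeof(blob_as_literals); }
```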

The committee hallucinated into existence rules like JeanHyde's proposal can't be in the standard unless there are working implementations. If you're wondering why your C++ compiler didn't have a complete C++ 20 implementation in 2021 you might be surprised to hear that there is such a rule -- that's because there is no such rule, it's an excuse.

Another hallucinated rule is very amusing to Rust programmers. WG21 would like to believe that C++ compilation doesn't result in executing code. So, if Bob makes a malicious C++ program, sure, running the program might be bad, but certainly compiling it is fine. This belief is laughable, but laughing at them won't get your proposal accepted, so you must try to navigate the fantasy world they live in, where their C++, which doesn't have this capability, can accept your proposal, without introducing the capability C++ already has. It's like you're playing Mornington Crescent with opponents who believe there are rules and they know what they are. Terrifying.

And so it isn't in C++ 23. The C++ 23 standard doesn't have JeanHyde's proposal. WG14 took #embed for C23, so C23 does have it, and of course in reality C++ programmers can expect to benefit from that, and that's the awful, miserable reality you're defending.

> how do you know the extra effort in a full Crater run of all crates in crates.io + GitHub is useful, rather than primarily FOMO-driven anxiety?

It periodically finds problems. And Rust is equipped to deal with those problems so the forewarning is practically useful.

In C++ if a syntax change breaks some fraction of programs well, too bad. I guess it would be nice to know, but as you saw the committee might (or might not) do it anyway. In Rust, that can be handled via the Editions mechanism. But to do that you need to know about it before you ship the compiler with the syntax change, so as to mark it as applying only to the future edition you're adding it to.

3836293648 · on June 19, 2023

But soundness fixes are typically considered more important than compat and go ahead anyway

ReleaseCandidat · on June 19, 2023

No. This is actually the biggest error C++ made. C++ should have learned from Fortran 77 -> 90 and made C++11 a new language, and either let the compilers handle compatibility between C++98 and C++11 or introduced some compatibility mechanism like `extern "C"`.

gpderetta · on June 19, 2023

No. 99% backward source compatibility was a primary requirement for C++11. It was bad enough that the ABI was broken.

ReleaseCandidat · on June 19, 2023

That's what I wanted to express: there would be no need to be able to use the old source code together with new code in the same file if they had gotten rid of include files (and used modules instead). It would have been enough (see Fortran for example) to be able to use C++98 libraries and compile them with the same compiler (every C++ compiler I know of is a C compiler too).

Asooka · on June 19, 2023

The biggest lesson for me is that you need to provide an incremental path to using the new features. The reason C++ succeeded as much as it did is because you can incrementally C++-ify your existing C code. You can provide C interfaces to your new C++ stuff and you can use C interfaces from C++. Any new feature that you add to C++ has to offer such a path. That is why move semantics were very successful - you could add them incrementally and you could put a preprocessor guard around their definition so your library could be used by code written for both older and newer C++. In contrast, it is hard to incrementally module-ify. You can't transform half of your library to a module and I haven't seen a good story on how to offer a library as both a module and as traditional headers. So it's hard to adopt modules in an existing codebase, thus it's hard to get good feedback on them, thus modules took a long time to really iron out.
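A minimal sketch of that kind of guard, assuming a hypothetical `Buffer` class (not from any real library): the copy operations are always present, and the move operations only appear when the translation unit is compiled as C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical library type that compiles as both C++98 and C++11.
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new char[n]()) {}

    // Copy operations: available to every caller, old or new.
    Buffer(const Buffer& other) : size_(other.size_), data_(new char[other.size_]) {
        std::memcpy(data_, other.data_, size_);
    }
    Buffer& operator=(const Buffer& other) {
        if (this != &other) {
            char* fresh = new char[other.size_];
            std::memcpy(fresh, other.data_, other.size_);
            delete[] data_;
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }

#if __cplusplus >= 201103L
    // Move operations: only visible to C++11 (or newer) translation units.
    // Note: MSVC needs /Zc:__cplusplus for this macro to report the real value.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }
    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            delete[] data_;
            size_ = other.size_;
            data_ = other.data_;
            other.size_ = 0;
            other.data_ = nullptr;
        }
        return *this;
    }
#endif

    ~Buffer() { delete[] data_; }

    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    char* data_;
};
```

Old callers keep copying as before, and new callers get cheap moves from the same header. The comment's point is that there is no similarly small step for modules.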

Offering a path to incremental improvement is very important for large projects where features and bugfixes happen constantly. We can't stop our regular work to untangle the whole carefully built system of dependencies. The new features have to be offered in a form where we can use them in new code, while remaining compatible with old code and gradually add them to it as time permits.

gpderetta · on June 19, 2023

People want to use new features in existing files without having to translate the whole file in one go. And they expect to link seamlessly the existing files against new files.

If you throw enough tooling at the problem, it could probably be done. But it took 10 years to ship C++11 as is...

ReleaseCandidat · on June 19, 2023

> People want to use new features in existing files

Yes, and it would have been The Right Thing(TM) to ignore these people.

> And they expect to link seamlessly the existing files against new files.

As I said, that shouldn't have been a problem (and it would have been a good time to talk about C++'s ABI). It's possible with C from C++ and there are actually other languages using even other compilers or interpreters or VMs that can use C++. And that's ignoring the problems when calling a C++ library compiled by another C++ compiler on Windows.

spacechild1 · on June 19, 2023

> Yes, and it would have been The Right Thing(TM) to ignore these people.

No, it wouldn't because it would have seriously hindered C++11 adoption.

I have personally worked on an open source C++ project that had been originally written in C++98, but switched to "modern C++" for certain features (most notably, <thread> and <atomic>). It would have been a major pain if we had to rewrite all of our source code to remove incompatibilities, just so that we could use these new features in a few places. I cannot even imagine what it would be like in a large commercial code base.

> As I said, that shouldn't have been a problem (and it would have been a good time to talk about C++'s ABI). It's possible with C from C++

Well, you could link, but you wouldn't be able to (safely) share any non-POD objects between compilation units because of possible ABI mismatches. That's not particularly useful.
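The usual workaround, sketched below with hypothetical names (`NameList`, `namelist_*`), is to let only C-compatible types cross the boundary and hide the C++ object behind an opaque pointer. It links safely, but callers lose the class interface, which is roughly why it isn't particularly useful for sharing C++ objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical C++ class that must not cross the library boundary directly:
// its layout depends on the std::string/std::vector ABI of whoever compiled it.
class NameList {
public:
    void add(const std::string& name) { names_.push_back(name); }
    std::size_t count() const { return names_.size(); }
private:
    std::vector<std::string> names_;
};

// The boundary: only an opaque pointer, C strings, and integers cross it.
extern "C" {
    struct namelist_handle;  // never defined; callers only ever hold a pointer

    namelist_handle* namelist_create() {
        return reinterpret_cast<namelist_handle*>(new NameList());
    }
    void namelist_add(namelist_handle* h, const char* name) {
        reinterpret_cast<NameList*>(h)->add(name);
    }
    unsigned long namelist_count(const namelist_handle* h) {
        return static_cast<unsigned long>(
            reinterpret_cast<const NameList*>(h)->count());
    }
    void namelist_destroy(namelist_handle* h) {
        delete reinterpret_cast<NameList*>(h);
    }
}
```

A production version would also have to keep exceptions from escaping through the `extern "C"` functions; that is left out here for brevity.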

The problem is much more complex than you apparently think.

saalweachter · on June 19, 2023

They did do that.

It is called D.

logicchains · on June 19, 2023

It's nice that whatever may be going on in life, at least we've always got an exciting new C++ release to look forward to every 3 years. I'm particularly excited that C++23 is finally getting std::flat_map and flat_set, more memory-friendly data structures with much nicer latency characteristics than the old non-flat variants.

jb1991 · on June 19, 2023

flat maps are not quite what many assume they are, some panacea for performance. In fact for most cases, you will still benefit from using the ordinary maps already provided in C++. In fact if you merely swap out std::map with std::flat_map, most code will start running much more slowly. They have a very specific, limited use case and for everything else, you should not make any changes.

lionkor · on June 19, 2023

Your comment has a gaping lack of real world profiling and "it depends", so that's likely why it's getting downvoted.

> most code will start running much more slowly

Most code, as in, most code that uses it? Are there any stats on how exactly std::map (and std::unordered_map, the more useful general purpose map) are used?

tialaramex · on June 19, 2023

> and std::unordered_map, the more useful general purpose map

std::unordered_map is a hash table. It's not a very good hash table but that's what it is, and so whilst that is much more generally useful, it's also irrelevant to the purpose of map (and thus of flat_map).

It seems to me that if "most code" would be harmed by using flat_map instead of map that's actually good news, since map isn't going anywhere. What I anticipate in reality is that a considerable amount of code would benefit from using flat_map instead of map, because of the improved performance from linear access.

logicchains · on June 19, 2023

>some panacea for performance

I said for latency, not performance in general. If you're generally working with maps with only 10s to 100s of small entries, it's easily possible to see a 10x reduction in worst-case latency by switching from std::[unordered_]map to flat_map, due to a massive reduction in allocation and pointer chasing (e.g. std::unordered_map allocates for every single insertion, and std::map lookup/pointer chasing often leads to log(n) cache misses).
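To show where the difference comes from, here is a deliberately small sorted-vector map. It's a sketch of the idea, not the interface or layout of std::flat_map (which, for one, keeps keys and values in two separate containers), and `FlatMapSketch` is a made-up name: lookups binary-search one contiguous allocation, while insertions may have to shift every later element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename Key, typename Value>
class FlatMapSketch {
public:
    // Insert or overwrite. O(n) in the worst case because later elements shift,
    // but there is no per-node heap allocation.
    void insert_or_assign(const Key& key, const Value& value) {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), key, before);
        if (it != entries_.end() && !(key < it->first)) {
            it->second = value;  // key already present
        } else {
            entries_.insert(it, Entry(key, value));
        }
    }

    // Binary search over contiguous memory; returns nullptr if the key is absent.
    const Value* find(const Key& key) const {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), key, before);
        if (it != entries_.end() && !(key < it->first)) return &it->second;
        return nullptr;
    }

    std::size_t size() const { return entries_.size(); }

private:
    using Entry = std::pair<Key, Value>;

    // Ordering used by lower_bound: does this entry sort before the key?
    static bool before(const Entry& entry, const Key& key) { return entry.first < key; }

    std::vector<Entry> entries_;  // always kept sorted by key
};
```

With tens to hundreds of small entries the whole vector tends to fit in a few cache lines, which is the case described above. With large maps and frequent insertions, the O(n) shifting is where a plain swap from std::map can lose.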

saboot · on June 19, 2023

Ctrl+f "Reflection" -> 0 results

Maybe next year ..

einpoklum · on June 19, 2023

Full compile-time reflection perhaps not, but we are getting "deducing this" in C++23, which is a useful reflection feature:

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p08...

blt · on June 19, 2023

pybind29 has a nice ring to it.

2h · on June 19, 2023

Same for "safety"

jupp0r · on June 19, 2023

If you are waiting for a Rust-like feature that guarantees memory safety - this won't happen. Changes to the language required for this would break too much. Also, existing code wouldn't compile anymore because it contains a ton of memory bugs.

nick__m · on June 19, 2023

reflection seems almost dead but there are 2 papers listed on Herb's summary that directly relate to safety:

P2530 hazard pointers https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p25...

P2757 type checking for std::format https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p27...

tialaramex · on June 19, 2023

P2530 isn't really about safety, it's an example of how inadequate package management tooling results in the C++ standard library having to (very poorly) attempt the role of curator instead. C++ needs reclamation mechanisms and astoundingly even in 2023 that still means they need to live in the standard library and be designed by WG21. It's a story about institutional failure.

This is one of a pair of proposals (the other is for RCU) - both of which will likely get accepted - with useful but simplified reclamation schemes. In Rust their equivalents live on crates.io, e.g. https://crates.io/crates/haphazard is more or less the same Hazard Pointers, even based on the same work at Folly - and so they can be updated, or obsoleted, or alternatives may become prominent, it all Just Works™.

Closer to safety work is the proposal to have C++ admit that it's possible for a program to be wrong and yet not throw its hands in the air and refuse to explain what it does, as "Erroneous Behaviour". The idea is that e.g. suppose I declare there's going to be a variable named foo, which is an integer, but then I never actually initialize it. Well that's not OK, that program is nonsense, but C++ has long allowed it, even though it's nonsense, so just refusing to compile would annoy programmers. On the other hand, if foo is now silently zero, we actually introduced new semantics to the language, which isn't OK. With "Erroneous Behaviour" C++ could emit a warning ("Don't do that, uninitialized variable foo") but also zero it anyway. A new syntax would be added to mean "I genuinely don't want this initialized, for some good reason" because it's still C++ so being needlessly dangerous is part of their whole ethos, but at least you did it on purpose.

I tried to find this proposal but the links don't work, once upon a time it was P2795 "Erroneous Behaviour". Maybe the author asked that it be withdrawn for some reason?

gpderetta · on June 19, 2023

I fail to see the difference. You can find hazard pointers and RCU as libraries in C++. The only reason these are proposed for standardization as opposed to other features is that someone is willing to go through the pains for their pet feature.

tialaramex · on June 19, 2023

Actually the rationale document says that "The plan was always to include a basic interface into C++ IS". I'm not sure that "This was always the plan" constitutes a rationale, because it begs the question, but not my circus, not my monkeys.

I'm arguing that C++ has to pay this cost because of inadequate tooling. There is, as you correctly observe, no functional advantage. Unlike std::string_view this is not a situation where shared vocabulary is valuable. I probably do not want to expose my use of Hazard Pointers in an API for third parties, for example.

Look at the story of Hive. It's a niche data container type, it's not necessary vocabulary, it's not a huge performance win for most people, it's not a fundamentally novel core type, in Rust the RFC would have been closed years ago. Aria even tried to banish LinkedList -- she failed, but that's how high the bar was for weird container types. WG21 are still spending committee time on Hive.

gpderetta · on June 20, 2023

If you read the list of authors of the paper it is the who's-who of lock-free programming. Many of them designed the C++ concurrent memory model, std::thread, std::atomic. Of course Paul "RCU" McKenney sees adding RCU to the standard as concluding his work.

plorkyeran · on June 19, 2023

Yeah, hazard pointers address one of the very common sources of memory safety problems (iterator invalidation). It's definitely not a magic solution that solves all problems, especially given that none of the standard library containers will use them, but they're a very useful tool.

gpderetta · on June 19, 2023

Hazard pointers matter if you are writing lock-free concurrent data structures and you need a way to handle deferred reclamation after a node delete. It is a very exotic problem to have and it is hardly a 'very common source of memory safety issues'. I'm not sure what they have to do with the general iterator invalidation problem.

plorkyeran · on June 19, 2023

Well, you could read the proposal? Using them in iterators is one of the use cases it specifically calls out. Hazard pointers are a key part of how Folly's ConcurrentHashMap lets you mutate the hash map without invalidating existing iterators into it, and while that's intended for multi-threaded cases it also works if the writes happen to be on the same thread as the read.

alphanullmeric · on June 19, 2023

Turns out no language actually used in production wants to copy the mistakes of rust.

pjmlp · on June 19, 2023

Reducing UB use cases is a safety-related improvement.

staunton · on June 19, 2023

I still think they should stop adding new features and do what the C committee does in keeping the language stable over the decades which it will take to slowly die, as it should.

Can't the new-feature enthusiasts just make a new language, or join existing efforts at building new ones? C++ is already a mess of around 3.5 languages jumbled together. When will it stop?

rewmie · on June 19, 2023

> I still think they should stop adding new features and do what the C committee does in keeping the language stable over the decades which it will take to slowly die, as it should.

Other than catering to your unexplainable desire to see the death of one of the most popular programming languages ever designed, what would be the point of that?

More importantly, what leads you to believe that there is value in stopping others from improving upon a language they use?

> Can't the new-feature enthusiasts just make a new language, or join existing efforts at building new ones?

What's the point of that? You have a perfectly good and working programming language that is used to build software everyone in the world uses every single day. Why would anyone think it's a good idea to just throw that out and waste their time reinventing the wheel? Are you the one who is going to rewrite every single C++ software project into your flavor of the month?

staunton · on June 19, 2023

> what leads you to believe that there is value in stopping others from improving upon a language they use?

Continuously "improving" a language has upsides and downsides. The upside is, of course, people get to use the new features, which is good if the features are useful. A significant downside is the added complexity once there are very many features. A lot of the features of C++ don't interact in good ways and generally make it easy to shoot oneself in the foot. C++ has bad defaults (they were mostly reasonable at the time, of course) and it's not possible to change them due to backwards compatibility. C++ is also the most complex (widely used) language, to the point where it's quite easy to produce a 20-line program for which half the ISO committee couldn't tell you whether it should compile (they apparently do this for fun sometimes in CppCon talks).

To summarize, the point of no longer adding features is to have a stable language that is manageable. It will be around for many decades and not be thrown out. However, if it keeps evolving at the current pace, any new developer inheriting a large codebase will have to know 10 programming languages crammed into one and interacting in weird ways. If your answer is "what's wrong with that?", I guess we won't agree.

Question for you: Should they start adding variadic templates and mutable lambdas to C? Would you like that?

> Are you the one who is going to rewrite every single C++ software project into your flavor of the month?

Not wanting to rewrite code is the very reason they should stop adding features. Already, the standard suggestion in C++ circles is that you should "modernize" your codebase and use "best practices" that change every decade. That's bordering on rewriting it in a different language. I am against rewriting stuff that works in a new language. The sad thing is that I agree with pretty much all these "modernization" suggestions individually. The bigger picture of it seems crazy though.

New projects can be started in new languages that don't need to be backwards compatible to stuff from the 80s and have sane defaults with the benefit of hindsight. That's how languages evolve in my view, sometimes starting fresh is the way to go. The old languages stay around for a long time, which is fine. Incessantly messing with them and making them more complicated is the issue.

rewmie · on June 19, 2023

> Continuously "improving" a language has upsides and downsides.

It has zero downsides. If you are interested in new features you're free to adopt them. If you are not interested in them then you can simply not use them, or even sit out a specific version. There are zero drawbacks.

> A significant downside is the added complexity once there are very many features.

There is zero complexity. No one forces you to play standard library bingo. No one forces you to use each and every single feature ever devised. No one forces you to adopt a new standard.

That's the best feature of C++'s standardization process: it specifies existing features, and provides you with a convenient way to onboard or not onto them.

I worked on C++14 projects until a couple of years ago. My team inherited a C++11 project and we decided to migrate to C++14 mainly because of std::make_unique. We considered C++17 but we didn't bother with it. C++20 was already out. No one held a gun to our heads. It's ok if you don't jump onto the latest and greatest.
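For readers who have not hit this particular motivation, here is a minimal sketch of what `std::make_unique` (added in C++14) changes compared with the C++11 spelling. The `Widget` type and both factory functions are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type, used only for illustration.
struct Widget {
    std::string name;
    int size;
    Widget(std::string n, int s) : name(std::move(n)), size(s) {}
};

// C++11 spelling: the type is named twice and a raw `new` appears in user code.
std::unique_ptr<Widget> make_widget_cpp11(std::string name, int size) {
    return std::unique_ptr<Widget>(new Widget(std::move(name), size));
}

// C++14 spelling: no raw `new`, and the type is named once.
std::unique_ptr<Widget> make_widget_cpp14(std::string name, int size) {
    return std::make_unique<Widget>(std::move(name), size);
}

// Both spellings yield an owning pointer to an equivalent object.
void make_unique_example() {
    auto a = make_widget_cpp11("gear", 3);
    auto b = make_widget_cpp14("gear", 3);
    assert(a->name == b->name && a->size == b->size);
}
```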

> Question for you: Should they start adding variadic templates and mutable lambdas to C? Would you like that?

C would greatly benefit from supporting templates. I don't understand what point you tried to make. The question sounds too luddite.

> Not wanting to rewrite code is the very reason they should stop adding features.

This is a pretty lame strawman. No one is forced to rewrite stuff when bumping a C++ version. I was involved in a couple of projects that underwent those upgrades and basically the migration consisted of flipping a switch in the build system and addressing a couple of compiler errors caused by the compiler flagging a couple of warnings as errors. That was it.

Things are simpler if we don't invent problems.

pjmlp · on June 19, 2023

> Question for you: Should they start adding variadic templates and mutable lambdas to C? Would you like that?

Who knows what might still come into C26.

staunton · on June 19, 2023

Sure, who knows. But would you like new features in C?

pjmlp · on June 19, 2023

Yes, especially if they finally come around to implement fat pointers as proposed by Dennis Ritchie in 1990.

pavlov · on June 19, 2023

Many of these "new-feature enthusiasts" work at companies like Google and Meta where they have both enormous C++ codebases and dedicated teams that constantly work on internal libraries and tooling to make those codebases better. They will be using these features because they can mandate it.

Some big new C++ features reflect this reality, for example coroutines: it's not a feature you would use directly as an application developer, but it provides sufficiently flexible underpinnings that allow these FAANG teams (and other library developers of course) to integrate coroutines into their existing async libraries and start converting code over.

paulmooreparks · on June 19, 2023

Herb is working on something like that already. See his work on CppFront as an alternative syntax for C++: https://github.com/hsutter/cppfront

pjmlp · on June 19, 2023

C23 just got released with new features...

fractallyte · on June 19, 2023

No pictures of Varna?

That's a bit sad. It's a beautiful city, in places - the older buildings (neglected), the waterfront park that stretches KILOMETERS, one of the best puppet museums in Europe, and an incredible nature reserve parallel to the beach, out of town to the north...

svilen_dobrev · on June 19, 2023

come over, make your own :)

btw there's much-much more (unattended) forest+small-beaches to the south than to north.. and i have a few spare tents :)

https://svilendobrev.com/snimki/more/

That said.. very little software/tech stuff happens here, all goes to Sofia :( and people follow..

foonathan · on June 19, 2023

There are two reasons for that: Herb didn't attend in person and the meeting was in Golden Sands, not Varna...

einpoklum · on June 19, 2023

I am disappointed there is no progress on:

* Standardizing `__restrict__` (which C has, and C++ doesn't). Everybody relies on the compilers offering it as an extension, as without it - performance often suffers very badly.

* Universal call syntax (i.e. equivalence of `foo(myobj, myparam)` and `myobj.foo(myparam)` ). There were what I considered to be rather trivial objections, last decade, then it somehow went away.
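To make the second point concrete, here is a minimal sketch of the two call forms as they behave today. `Buffer` and `append` are hypothetical names; the exact lookup rules differed between the proposals, so this only shows the hand-written forwarder that such a feature would make unnecessary.

```cpp
#include <cassert>
#include <string>

// Hypothetical type with a member function.
struct Buffer {
    std::string data;
    void append(const std::string& s) { data += s; }
};

// Today, `append(buf, s)` only compiles because this free function was
// written by hand. A universal call syntax would let one spelling be
// found from the other without this forwarder.
inline void append(Buffer& buf, const std::string& s) { buf.append(s); }

void call_both_ways() {
    Buffer buf;
    buf.append("foo");   // member call syntax
    append(buf, "bar");  // free-function call syntax, via the forwarder
    assert(buf.data == "foobar");
}
```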

kolbe · on June 19, 2023

I've found the lack of alias guarantees to be far more bark than bite. What's the primary use case? For automatic vectorization? Ultimately the only cost to check is an integer add and an integer comparison. Yeah, aliasing guarantees would eliminate those two ops (and eliminate generating the aliased version's code path). But in general, it isn't some 8x monumental cost that many people think it is.
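A rough sketch of the kind of runtime check being described, written by hand rather than emitted by a compiler. The function names are hypothetical, and the check assumes a flat address space in which distinct addresses mean distinct memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when [a, a+n) and [b, b+n) share no address. Comparing pointer
// values as integers is implementation-defined, but it is the usual
// approach on flat-address-space targets.
inline bool ranges_disjoint(const int* a, const int* b, std::size_t n) {
    const auto pa = reinterpret_cast<std::uintptr_t>(a);
    const auto pb = reinterpret_cast<std::uintptr_t>(b);
    const std::uintptr_t bytes = n * sizeof(int);
    return pa + bytes <= pb || pb + bytes <= pa;
}

// Hand-written analogue of a "versioned" loop. The return value reports
// which path ran and exists only so the example can be checked.
inline bool add_arrays(int* dst, const int* src, std::size_t n) {
    const bool disjoint = ranges_disjoint(dst, src, n);
    if (disjoint) {
        // No overlap: iterations are independent, so this copy of the
        // loop is the one a compiler could vectorize.
        for (std::size_t i = 0; i < n; ++i) dst[i] += src[i];
    } else {
        // Possible overlap: keep strict element-by-element order.
        for (std::size_t i = 0; i < n; ++i) dst[i] += src[i];
    }
    return disjoint;
}
```

Both branches are identical in source; the difference is only in what the optimizer is allowed to do with each one.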

einpoklum · on June 20, 2023

> What's the primary use case? For automatic vectorization?

No. Without aliasing guarantees, every load via a by-ref/by-addr input param after a store via another such param must actually be executed. So either you manually cache your accesses, or you can't do loop unrolling, strength reductions, and possibly other optimization work.
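A small sketch of the reload problem, assuming GCC or Clang: `__restrict__` is a compiler extension (MSVC spells it `__restrict`), not standard C++. The function names are hypothetical.

```cpp
#include <cassert>

// Without an aliasing guarantee the compiler must assume `a` and `b`
// may point to the same int, so `*b` is reloaded after the first store.
inline void add_twice(int* a, const int* b) {
    *a += *b;
    *a += *b;  // cannot become `*a += 2 * *b`: the store above may have changed *b
}

// With the extension, the caller promises that `a` and `b` do not alias,
// so the compiler is free to load `*b` once.
inline void add_twice_noalias(int* __restrict__ a, const int* __restrict__ b) {
    *a += *b;
    *a += *b;
}

void aliasing_example() {
    int x = 1, y = 1;
    add_twice(&x, &y);  // distinct objects: 1 + 1 + 1
    assert(x == 3);

    int z = 1;
    add_twice(&z, &z);  // aliased: 1 + 1 = 2, then 2 + 2 = 4
    assert(z == 4);

    int p = 1, q = 1;
    add_twice_noalias(&p, &q);  // must never be called with aliased pointers
    assert(p == 3);
}
```

The aliased call is why the first version cannot be simplified: its result differs from "add `2 * *b`", so the second load has to happen.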

> Ultimately the only cost to check is an integer add and an integer comparison

1. Manually checking that will likely not affect the compiler. Again, you can implement your own optimizations, but that's not what you want to do.

2. The interest is also for the _definition_ side of things, not just for the implementation. I want to tell the user of my function "pass unaliased arguments". This is important, because otherwise you have to provide a bunch of convoluted implementations for the case of there _being_ aliasing - which often changes the semantics of the function to something you had no intention of writing.

3. In real life, there is no guarantee that memory ranges with disjoint addresses are not aliases of the same region. There's probably no such guarantee that the standard makes, but IANALL.

kolbe · on June 25, 2023

Sorry I didn't see your response.

> 1. Manually checking that will likely not affect the compiler. Again, you can implement your own optimizations, but that's not what you want to do.

I wasn't implying to do it manually. Compilers do it all the time.

2, I understand and would use. I also want to restrict input alignment.

3, I never knew, but compilers I use for x86 on linux seem to think it's fine to just check memory address distances.

soulbadguy · on June 19, 2023

I am very happy to see that the work on fibers is continuing. I was under the impression that with the addition of coroutines, the focus would have shifted. Glad I was wrong.

What's the best way for someone to keep track of the progress and the conversation around a specific proposal?

jupp0r · on June 19, 2023

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/ has all the papers. To follow discussions, it's probably best to attend the standard meeting online or in person.

foonathan · on June 19, 2023

You subscribe to the Github issue of the proposal: https://github.com/cplusplus/papers/issues

blux · on June 19, 2023

Does anyone know what happened with Herb's idea for zero-overhead exceptions?

I remember liking the idea when he presented it at CppCon (https://www.youtube.com/watch?v=ARYP83yNAWk).

sigsev_251 · on June 19, 2023

Both the C++ paper and the variants of it for C are blocked for lack of implementation experience. The C papers had other flaws too and the initial paper (written by the lcc-win32 maintainer) was outright rejected. To be fair, it's not a trivial thing to implement or to specify.

omoikane · on June 19, 2023

P2558 should be "Add @, $, and ` to the basic character set", but somehow the article had % instead of $.

rightbyte · on June 19, 2023

I feel things with C++ are not as bad as with Python, but I would really like a "core language freeze" for 10 years now or something. I don't want to keep up with all these changes. Too many enthusiasts on the committee.

ot · on June 19, 2023

It is a very fortunate coincidence that C++ reached the ideal feature set right around the time I learned it.

gkedzierski · on June 19, 2023

100% agree, C++98 was amazing :)

gpderetta · on June 19, 2023

I'm 99% sure that ot was being sarcastic.

gkedzierski · on June 19, 2023

I know, I wasn't though (but I know I'm in a minority)

lionkor · on June 19, 2023

No ownership semantics is a good thing?

maratc · on June 19, 2023

I'm in an even smaller minority and I think C++ was a monstrosity already in 99.

To use an analogy, it looked like a cat with 11 legs, even back then. Of course they added more legs since, and they keep on adding.

gumby · on June 19, 2023

You don’t have to, though over time some of them will start to appear in other peoples’ code.

A lot of features aren’t fully fleshed out when they enter the standard. For example the building blocks of coroutines appeared in C++20 but the user friendly affordances aren’t there yet — but with the building blocks in place people can work on that.

Likewise a lot of features are there for library developers and aren’t needed by regular code.

So your position is sorta consistent with what’s mostly approved by the committee. Even ranges, which are exciting, I’m still waiting on to see if some things get shaken out.

meribold · on June 19, 2023

Any problem with C++ can be solved by adding more features, except of course for the problem of too many features.

foota · on June 19, 2023

I for one welcome the new features! I don't think it's too hard to learn a few new things every 3 years.

rightbyte · on June 19, 2023

You have to think about newbies too. Python has made the same mistake as C++, but worse. At least C++ is not pretending to be user friendly.

"Modern" C++ was a mountain to climb for me. I think I will not even try to catch up on modules, contracts and what not. I'd rather go back to C for personal projects. I just don't have the stamina for another big C++ language change. And they add up.

MrJohz · on June 19, 2023

But newbies typically don't need to learn the whole mountain.

* Some new features are typically added to handle very specific cases that will rarely be encountered but can't be easily handled by existing mechanisms. These are typically used by library authors to make an API clearer or easier to use, and new developers won't need to interact with them at all.

* Some new features are added to replace existing features either partially or entirely (because the old feature had issues, or a better way of handling it was found). In that case, new developers just don't need to learn the old feature at all, or if they do, they can learn it later as a more advanced technique.

I don't know a huge amount of C++, and I haven't been following Python much recently, but to use some examples from Rust:

* GATs and a lot of the newer trait features are very much in the former category. Very few Rust developers will be directly using the new tools, but library authors can use them to create easier-to-use APIs which benefit everyone.

* The `try` mechanism is often simpler to learn than all the Result/Option combinators, which drastically reduces the amount a new developer needs to know about to get started. You'll still probably need those combinators later, but you can now get started without them.

There are definitely some features that don't fit clearly into these two categories, and add a kind of overall complexity to the language. But I think a lot of people look at a big list of features and think "but that will be so much to learn", forgetting that (a) you probably don't need to learn a lot of it until you're already very advanced, and (b) a lot of it will be making other parts of the language obsolete and overall simpler.

strus · on June 19, 2023

> In that case, new developers just don't need to learn the old feature at all

That's not true if you want to be a professional C++ dev - you will encounter projects stuck on older standards, or legacy code written in the era of an older standard. In practice you need to know everything from C++98 to the newest.

59nadir · on June 19, 2023

In a non-trivial amount of cases I think this incessant adding of features to paper over old mistakes or fill gaps is actually counter-productive and turns into users having to learn several things instead of the intended situation where you end up just learning the new thing. I've done 7-8 years of C++ and there was definitely a moment where I could work on pretty much any C++ code base in the world at least somewhat comfortably, but I wouldn't take a C++ contract now and then ramp up to it because there's absolutely no telling which C++ this contract would entail and which landmines it has as a result.

Languages that are much more clear about what they support and what they don't support don't have this ambiguous language profile that constantly-appended-to languages have, because they've ended up taking a stance on what they actually do and as a result it's much easier to simply work with what's there. Rust is doomed to be in the former camp but luckily we have much more sensible alternatives to both C++ and Rust nowadays that look to be much more of a known quantity at some point.

Edit:

I also think it gives a language a special kind of funk where it's fairly obvious that certain things were just not all in all a great choice; you end up needing extra features that really only exist because you have other features and so on. `Pin`[0] in Rust comes to mind; it's something that has no intrinsic value and only exists to paper over the mistake of moving around memory magically.

0 - https://doc.rust-lang.org/std/pin/index.html

59nadir · on June 19, 2023

It's not only about newbies in the end. When I started learning C++ back in ~2000 the semi-joke was that "C++ takes about 10 years to learn". The language is far bigger now, with more things you either need to know to cut out or somehow just not learn, but then you need to learn it if you end up working with code that uses it, etc.. This stuff, to use properly, usually comes with a list of "dos and don'ts" in C++ so it's not feasible to just on-demand learn the additions to the language and call it a day.

NooneAtAll3 · on June 19, 2023

it feels like the problem isn't number of features, but quality of learning material...

what ways are there to learn/teach latest c++? I can only come up with conferences' recordings

nazgulsenpai · on June 19, 2023

While I don't follow Python development super closely, as a scripting language it is still pretty simple and user friendly. I'm sure once the complexity rises... well, the complexity rises.

rightbyte · on June 19, 2023

Ye sure. My point is that Python is harder than it could be. Not that it is harder to use or learn than C++. It is like the language design now is centered around those with 10 year plus experience in the language.

I firmly remember how hard it was to understand each concept as a newbie. And all the additions since I started out must be insanely confusing now.

nickelpro · on June 19, 2023

I don't understand this complaint even a little bit. Don't use features you don't want. If you want anything from C++ use just that.

No one is going to put a gun to your head and make you #include <concepts> when all you wanted was C with Classes.

meribold · on June 19, 2023

That only works as long as you're the sole developer working on a project. When enough programmers work on a project, each using their preferred subset of C++, the whole language ends up being used (including C). Refusing to learn features beyond one's preferred subset usually isn't a tenable position.

nickelpro · on June 20, 2023

More complaints I don't understand.

Of course you have to learn whatever language/tools/features your company uses. That's got nothing to do with C++. If the company doesn't want certain features to be used, it will ban them. Plenty of places have blanket bans on the STL.

maratc · on June 19, 2023

The main issue with C++ cannot be resolved by an addition of any new feature.

danq_ · on June 23, 2023

Arguably the features in themselves are the main issue with C++.

rewmie · on June 19, 2023

> I don't think it's too hard to learn a few new things every 3 years.

Do you think that in software development circles backwards incompatible changes only happen each 3 years?

riffraff · on June 19, 2023

What new language features did python get recently that make it harder for newbies?

rightbyte · on June 19, 2023

I can't think of any specifics that won't sound silly in isolation. It is just the sum.

I mean, it felt like I woke up one day and couldn't read Python anymore. I believe there is a great lag between introduction of a concept/syntax/whatever and its actual use, so I can't even pinpoint a version where Python became "too much".

riffraff · on June 19, 2023

thanks, I believe I get what you're saying, as a python outsider I just could not think of the changes :)

Typing syntax and pattern matching look like pretty significant changes, but nothing else big comes to mind. I guess ternary expressions and perhaps f-strings are new too but not very big.

stathibus · on June 19, 2023

Lucky for us regular people, almost all of the features added after C++11 are completely ignorable, including everything in this trip report.

galkk · on June 19, 2023

"type checking format args" [1] is rather sad reading. I think that it's time to admit that C++ will never have things like string interpolation, which is the norm in other modern languages.

[1]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p27...

vinkelhake · on June 19, 2023

What a weird comment. What would be the point of "admitting" something that is an obvious fact? std::format, being a function for string formatting, is already part of the C++ standard library and can do some of its error checking at compile time. The paper you link is about making that error checking even better.

galkk · on June 19, 2023

And that is the problem, from my pov. Instead of accepting reality that this part of c++ plain sucks and needs significant improvements, they try to do tactical updates.

There are 2 names of such activities, select one which you prefer: gold plating/turd polishing...

logicchains · on June 19, 2023

>c++ plain sucks and needs significant improvements

What a silly thing to say. How many other mainstream languages support writing a library for compile-time checking of format strings? Having the power to do this as a library means that you can also write similar stuff customised to your own needs when you need to, as it doesn't rely on anything hard-coded in the compiler.
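As a rough illustration of the "as a library" point, here is a minimal sketch of compile-time format-string checking using only `constexpr` functions and `static_assert`. This is not how `std::format` is implemented (C++20 uses a different mechanism for its checks); `count_placeholders` and `format_matches` are hypothetical helpers, and the sketch only understands plain `{}` placeholders.

```cpp
#include <cstddef>

// Counts "{}" placeholders in a string; usable in constant expressions.
constexpr std::size_t count_placeholders(const char* s) {
    std::size_t n = 0;
    for (std::size_t i = 0; s[i] != '\0'; ++i) {
        if (s[i] == '{' && s[i + 1] == '}') {
            ++n;
            ++i;  // skip the closing brace
        }
    }
    return n;
}

// True when the format string has exactly one placeholder per argument.
template <typename... Args>
constexpr bool format_matches(const char* fmt, const Args&...) {
    return count_placeholders(fmt) == sizeof...(Args);
}

// All three checks run in the compiler; a failing one stops the build.
static_assert(count_placeholders("{} + {} = {}") == 3, "three placeholders");
static_assert(format_matches("{} items", 42), "one placeholder, one argument");
static_assert(!format_matches("{} of {}", 42), "mismatch is detected at compile time");
```

Because the check is ordinary user code, the same approach can be adapted to a project's own formatting rules without compiler support.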

tw1984 · on June 19, 2023

I really like the fact that it takes "decades" for any reasonable change (e.g. networking, fiber, reflection etc) in C++ to be approved, implemented and promoted.

There has never been a better time to deprecate such a dinosaur language controlled by a small closed group of people with a combined age of thousands. It is a language for those with several million $ worth of RSU waiting to be vested.

rewmie · on June 19, 2023

> (...) any reasonable change (e.g. networking, fiber, reflection etc) (...)

Except those aren't small or simple or trivial or consequence-free.

Also, we're talking about standardizing current practices. You can do networking in C++ in a myriad of ways already, and none of which is standard.

tw1984 · on June 20, 2023

> and none of which is standard.

you don't see the problem here?

golang/rust doesn't spend 30 years to standardize networking/fs.

rewmie · on June 20, 2023

> golang/rust doesn't spend 30 years to standardize networking/fs.

Golang and rust standardized nothing.

Golang and rust just provided implementations for features. Likewise, there's already a myriad of networking/file system libraries for C++. This is not a problem. The problem is to get to an abstraction that can work across all conceivable platforms. You can hack together something, but throwaway code should not feature in an international standard. See for example how Python's fs support is riddled with gotchas and outright bugs on platforms such as Windows.

In the meantime C++ already has things like Poco, boost, Qt, etc, not to mention platform-specific SDKs. Not a problem. Never was.

tialaramex · on June 19, 2023

For vocabulary reasons it makes sense for the standard library to express basic types. That's why Rust has core::net::Ipv6Addr. Sure, a $5 WiFi-enabled thermal probe and a $5000 100Gb/s fibre switch probably have no common elements when it comes to how they actually do networking, but it's worth acknowledging that 128-bit IPv6 addresses are actually the same thing in both codebases. There should be no need for some stupid adaptor layer, or to translate via strings.

Now, Rust's standard library also provides BSD sockets in std (that cheap WiFi probe presumably lacks an OS and wouldn't have sockets). That's much less obviously necessary, although it's hardly a big problem and I think if C++ had this since 1998 there would be less squabbling. But the fundamental types are vocabulary and it's silly that C++ doesn't have them in its stdlib.

lionkor · on June 19, 2023

Are you not aware of the networking TS and the asio / boost::asio library it's based on? Those have asio::ip::(tcp|udp)::address, both v4 and v6 iirc. Not sure if I'm missing something?

tialaramex · on June 19, 2023

> asio::ip::(tcp|udp)::address

You mean asio::ip::address presumably. That's not shared vocabulary. Suppose I'm writing a small program maybe a couple of dozen lines of code, and I need some IPv4 addresses. Should I bring in all of ASIO to have this type?

It makes plenty of sense for ASIO to live outside of the C++ standard library. But the standard library would benefit from these basic types as vocabulary.

planede · on June 19, 2023

It would be quite weird to have these vocabulary types in the standard without any way to use them for networking within the standard library. Like having std::filesystem::path without having actual filesystem operations working on them.

lionkor · on June 19, 2023

When do you need ip addresses but no sockets?

tialaramex · on June 19, 2023

Sockets are an OS feature (initially on BSD) to make it simple to program with IP on Unix-like systems. They're not crucial to the Network, they're happenstance. Indeed in the early 1990s you'd have found pockets of people who weren't convinced sockets were even the Right Thing™; they wanted some other API to the same network. Same IP addresses, same services, same protocols, different API.

So there's no reason the very cheap device would care about sockets. But it does talk to the network, and for that it needs to know about addresses.

lionkor · on June 19, 2023

a socket is just a name for that abstraction, call it what you will.

tialaramex · on June 19, 2023

It's a very specific API, you don't need to use that API. QUIC implementations don't tend to have an equivalent, for example: there's some sort of Connection data structure, but no need to give out integer handles if the implementation just lives in a library. It can seem weird today when we're used to sockets everywhere, but that's an arbitrary choice and there's no reason a tiny embedded system would mimic it.

planede · on June 19, 2023

AFAIK asio based networking TS is dead.

kps · on June 19, 2023

This one? https://doc.rust-lang.org/std/net/struct.Ipv6Addr.html

That's actually a very good example. Because sooner or later, you'll find yourself ten libraries deep with a FE80::1234 that you can't use because nine of the ten thought Ipv6Addr was enough.

tialaramex · on June 19, 2023

Kinda, yes. std::net::Ipv6Addr re-exports core::net::Ipv6Addr - it lives in core because you don't need an operating system to have network addresses.

If you wanted the [u8; 16], that's octets(); if you wanted [u16; 8], that's segments(); and those are both inline constant functions.
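For comparison, a minimal C++ sketch of what such a vocabulary type could look like. `ipv6_address` is a hypothetical name: nothing like it exists in the C++ standard library today. The accessors only mirror the shape of Rust's `octets()` and `segments()`, and the sketch carries no scope ID.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vocabulary type: 16 address bytes and nothing else.
// It needs no sockets and no operating system.
class ipv6_address {
public:
    explicit ipv6_address(const std::array<std::uint8_t, 16>& bytes)
        : bytes_(bytes) {}

    // The 16 bytes in network order.
    std::array<std::uint8_t, 16> octets() const { return bytes_; }

    // The eight 16-bit groups, each assembled from two network-order bytes.
    std::array<std::uint16_t, 8> segments() const {
        std::array<std::uint16_t, 8> out{};
        for (std::size_t i = 0; i < 8; ++i) {
            out[i] = static_cast<std::uint16_t>(
                (bytes_[2 * i] << 8) | bytes_[2 * i + 1]);
        }
        return out;
    }

private:
    std::array<std::uint8_t, 16> bytes_;
};

// Builds fe80::1234 by hand and reads it back through both accessors.
void link_local_example() {
    std::array<std::uint8_t, 16> raw{};
    raw[0] = 0xfe;
    raw[1] = 0x80;
    raw[14] = 0x12;
    raw[15] = 0x34;
    const ipv6_address addr(raw);
    assert(addr.octets()[1] == 0x80);
    assert(addr.segments()[0] == 0xfe80);
    assert(addr.segments()[7] == 0x1234);
}
```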