I have re-tested on my skylake notebook and updated the blog. It confirms darn old CPU I use as my benchmark machine. Maybe it is bit more sensitive to the difference which is expected for non-server CPU.
GCC does "almost full LTO" with partitioning, while clang does thinLTO that does make most of code size/speed tradeoffs without considering whole program context, so it may be interesting to get both alternatives closer in code size/performance metrics.
I have got Firefox developer account of level1 and I am looking into official benchmarking architecture which I have now updated to GCC 8 with LTO+PGO.
GCC does "almost full LTO" with partitioning, while clang does thinLTO that does make most of code size/speed tradeoffs without considering whole program context, so it may be interesting to get both alternatives closer in code size/performance metrics.
I have got Firefox developer account of level1 and I am looking into official benchmarking architecture which I have now updated to GCC 8 with LTO+PGO.