GCC has had link-time optimization (LTO) for quite a while now. Instead of generating just machine code, it streams its intermediate representation (IR) for the translation unit into the object file. At link time, when you provide all the object files that go into the final ELF binary, the compiler gets to see the IR from all the translation units together, and this lets it optimize across translation unit boundaries. All you have to do is add -flto to both the compiler and linker invocations and you’re done.
If you’re using GCC’s LTO feature, make sure you repeat the compiler flags when linking. For example, if you compile your code with -Os -ffunction-sections, pass those same flags to the gcc driver at link time.
This is needed because LTO effectively recompiles your program at link time: it streams the IR back in from the object files and regenerates code. Only this time it knows about all the code that’s going to be part of the final executable, so it can perform optimizations it previously couldn’t, such as inlining across translation units.