
So recently I did a lot of research into the internals of computers and CPUs, and I'd like to share one of my results.

First of all, take some time to look at the code below. You'll see two assembly listings and two command lines.

The assembly code is designed to test how the instructions "enter" and "leave" compare to doing by hand what they are shorthand for.

enter and leave set up and tear down a stack frame: enter creates a new, temporary region on top of the stack, and leave removes it again. The stack is where the compiler puts local variables. On the right side, you can see how I create my own stack frame by hand using

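push rbp        ; save the caller's frame pointer
mov rbp, rsp    ; the new frame starts at the current top of the stack
sub rsp, 0      ; reserve 0 bytes for locals (the same as "enter 0, 0")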

(I won't go into the details of why that works.)

Okay. Why is this even relevant?
Well, there's a common assumption that enter and leave are very slow. It comes from the raw numbers: in some paper I saw (I couldn't find the link, sorry), enter was said to take 12 CPU cycles, while the manual version would take 3 (push + mov + sub => 1 + 1 + 1).

When I compile an empty function, I get pretty much what you'd expect from the raw cycle counts alone.

HOWEVER, then I add the dummy code in the middle:

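mov eax, 123        ; eax = 123
add eax, 123543     ; eax = 123666
mov ebx, 234        ; ebx = 234
div ebx             ; eax = edx:eax / ebx, edx = remainder
                    ; (edx is never cleared here; if it holds 234 or more, div raises a divide error)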

and magically, both sides have the same result.

Why????

For one thing, there is CPU prefetching: the CPU loads upcoming instructions from RAM before it's done executing the current one (this is how anti-debugger code works, btw. Might make another rant on that). Then there is the fact that the CPU usually starts work on the next instruction while the current one is still executing, but only if the next instruction doesn't use the registers the current one is working on (otherwise there would be a lot of synchronisation problems).

Now notice that the CPU can't do any of that when entering and leaving manually: push rbp, mov rbp, rsp and sub rsp, 0 all touch rsp, so each one has to wait for the one before it. The only overlap it gets is starting the mov eax, 123 while it's performing the sub rsp, 0.

----------------

NOW: notice that the code on the right doesn't take any precautions, like making sure the stack is big enough. If you sub too much from rsp at once, the stack is exhausted; that's what we call a stack overflow. enter implements checks for that and raises an interrupt if there is a stack overflow (take this with a grain of salt, I couldn't find a resource backing this up). There is another type of check I don't fully get (stack level checks), so I'd rather not make a fool of myself by writing about them.

For all these reasons, I think compilers should start using enter and leave again.

========

This post shows very well that bare numbers can be misleading.

Comments
  • 2
    >For one thing, there is CPU prefetching

    the reason the CPU can't prefetch (as much) here is that it's already holding the next instructions of the manual stack setup.
  • 4
    Dear oh dear me, a fellow traveller on the abyss of the processor?

    I need some time to look into this; it looks interesting. Still, compiler design? That is some forbidden black art there. It will take some time, my assembly is a LOT rusty. I had to quit for a while, until the headaches and the rest of the symptoms disappeared.
  • 2
    @bladedemon I'm not designing a new compiler; I'm saying that most (modern) C++ compilers should use this.
  • 4
    @gnulinuxer4fun Maybe you can try it in one of the gazillion flags gcc has, or try to tamper with the code. Still, the compiler tends to optimize a lot, so maybe it can do this if you flip the right switch? I don't know, seems like something I can spend some time on. :)
  • 0
    @bladedemon Maybe gcc has that switch. But I'm a lazy man and I won't look through gcc's switches. To me gcc is dead, since it lost me 4 days of osdev: it didn't compile sane output and optimized away some jumps, so my RIP ended up in the 0xB8000 buffer (which is the VGA buffer).
  • 1
    @dudeking I saw :D

    6502 is pretty old though

    I'd suspect the 6502 doesn't do as much prefetching and also doesn't preprocess this as efficiently.
  • 1
    @dudeking bro, where are them screenshots?
  • 1
    @FrodoSwaggins what?

    I mean yea, but I don't see the relation with my rant
  • 0
    @FrodoSwaggins I'm not talking about speculative prefetches (as in speculative branch prefetching) but about literal prefetching. There's no downside to prefetching.
  • 0
    @FrodoSwaggins thanks *blushes*
  • 1
    >literal pre-fetching
    I think you mean pipelining?
  • 0
    @beegC0de I think so. What is pipelining in your understanding?
  • 1
    @gnulinuxer4fun your description of fetching the next instruction at the same time as the current is executing was taught as "pipelining" to me.

    @FrodoSwaggins I don't doubt it but I don't think op is talking about that kind of prefetching :P
  • 1
    @FrodoSwaggins I'm only going off what I learned in my SPARC class, though I assumed most architectures pipeline their fetch-execute cycles this way.

    Is this crazier prefetching more along the lines of compiler optimizations? I'm not that familiar with these topics but would like to learn something cool today!
  • 1
    Pipelining means dividing instructions into individual steps (decode the instruction, access memory, write the register, etc.). Instead of waiting for the whole instruction to complete, you can decode the next instruction as soon as the decode step of the previous one has finished. In the best case you can compute different steps of sequential instructions in parallel. This increases the utilisation of the processor's units.
  • 0
    @No3x @FrodoSwaggins @beegC0de

    Right, but here we are discussing nomenclature. Not very important, IMHO ;-)

    Anyway, I'd love to talk with you guys in a more conversational setting. Do you have Telegram?

    I'm @BinaryByter on there.
  • 1
    @gnulinuxer4fun haha didn't mean to derail, just thought some might know the concept better as pipelining :P
  • 0
    @beegC0de hehehe don't worry <=(-:
  • 1
    Pretty amazing post