The belief that REPLs are based on interpreters is strange. The Lisp read-eval-print loop has worked with any mix of compiled and interpreted evaluation for several decades.
An implementation like ECL can compile Lisp incrementally to C, call the C compiler, and load the compiled code back into Lisp.
SBCL for a long time had only a compiler. That compiler compiles either to memory or to files, and those files can be loaded into Lisp. These systems maintain much of the state between compiles.
There is zero reason why EVAL can't be implemented by a direct-to-machine-code compiler - as it is in SBCL.
If one wants to execute whole programs, Lisp usually supports that too. One types into an editor, and compile-and-load-buffer is a typical operation: it first compiles the buffer and then loads the resulting code into the current Lisp. Sometimes output goes to temporary windows.
Some Lisp IDEs queue the editor buffer expressions into a REPL for execution and may compile them piece by piece.
There is usually very little use for modifying existing output when things change - though some implementations can do that, like Symbolics Genera, where output is recorded and deltas can be computed to generate minimal updates. This can be used in custom applications that have a command loop and an output-recording window.
Usually this is not done, because both the internal and the external state change so much that updating the old REPL output is generally useless. One might as well clear the existing command-loop/REPL output and generate fresh output.
This is a topic that interests me a lot (I'm building a relational lang in Rust). I haven't found enough info or help (I have asked like a dozen times already!) on how to do it with compiled languages (the answer is always "go read this large codebase that solves it").
If I understand you correctly, does this require the compiler to be a JIT? And to be responsible for emitting and understanding assembly?
Is it possible to make a generalized solution that could, for example, work for a transpiler?
Is it viable for a compiled language on multiple platforms or architectures without a huge codebase?
These are not JIT compilers. They are AOT compilers. Code gets immediately compiled to machine code - either automatically or on direct request by the user.
But then what are the steps to make it realizable? I can see, like this project, the idea of compiling everything and running it, but the incremental aspect of a REPL is what I can't see how to achieve.
It's actually quite simple: all you need is for your compiler to be able to compile things smaller than the whole program. In Lisp, the unit of compilation is a function (I think; lispm will correct me if I'm wrong), in Forth it's a word, in Prolog a rule, and in CoffeeScript an expression.
If you have that, the only thing that's left is being able to inject the compiled code into the running REPL process. This is more complicated nowadays because of the restrictions the OS places on memory, i.e. marking segments of memory as executable or non-executable, but fundamentally compiled code is just a chunk of bytes, so "loading" here means allocating space for it and placing a pointer to its first byte anywhere reachable by the rest of the program, probably in some kind of symbol table. The usual language rules take care of searching for the symbol's definition; the implementation then arranges the stack (or does anything else it wants with regard to arguments) and makes execution jump to the compiled code.
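To make that concrete, here is a minimal C sketch of "loading" a compiled function into a running process, assuming POSIX mmap/mprotect and an x86-64 CPU; the hard-coded machine-code bytes and the toy symbol table are illustrations for this sketch, not how any real Lisp implementation does it:

    /* Minimal sketch: "load" a chunk of compiled code into a running process.
       Assumes POSIX mmap/mprotect and an x86-64 CPU (the bytes below are
       `mov eax, 42; ret`). A real REPL would get these bytes from its
       compiler backend instead of hard-coding them. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    typedef int (*fn_t)(void);

    /* toy symbol table: name -> pointer to executable code */
    static struct { const char *name; fn_t code; } symtab[16];
    static int nsyms = 0;

    static fn_t load_code(const unsigned char *bytes, size_t len) {
        /* allocate a writable page, copy the code in, then flip it to executable */
        void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) return NULL;
        memcpy(page, bytes, len);
        if (mprotect(page, 4096, PROT_READ | PROT_EXEC) != 0) return NULL;
        return (fn_t)page;
    }

    int main(void) {
        const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
        symtab[nsyms].name = "forty-two";
        symtab[nsyms].code = load_code(code, sizeof code);
        nsyms++;

        /* "the usual language rules" boil down to a lookup plus a call */
        for (int i = 0; i < nsyms; i++)
            if (strcmp(symtab[i].name, "forty-two") == 0 && symtab[i].code)
                printf("%s => %d\n", symtab[i].name, symtab[i].code());
        return 0;
    }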
If you've used GDB, you may have noticed that it's possible to call natively compiled functions from its prompt, as well as to examine the symbols at a breakpoint (at least the ones not compiled away). The GDB prompt is a REPL itself, although it only supports simple expressions IIRC, and the compilers of the supported languages are written to work at the file level, not on smaller chunks of code. But if you design a new language and implement a compiler for it, there's nothing stopping you from reusing the compiler internals when writing a REPL.
The difference between JIT and AOT compilation is that a JIT compiles code at run time, typically after the program has already started executing (in an interpreter or with profiling), while Ahead-of-Time compilation (as the name implies) compiles code before it is executed. What Lisp or Forth do is AOT compilation; for example, SBCL (IIRC) is a compiler-only implementation of Lisp, so it couldn't do JIT in that sense even if it wanted to, because there is no way for it to execute code before it gets compiled. As mentioned, it has a proper REPL, and you'd be hard-pressed to find a difference in its look & feel compared to Python's.
BTW, writing an interpreter is very similar to writing a compiler: after lexing and parsing the code into an AST you walk it, and then either perform the required actions directly (that's an interpreter) or emit code which will perform those actions when handed to a lower-level interpreter (like the CPU, or a VM). If you design your language for this from the beginning, you should be able to write both at the same time without much trouble. As an example, my favorite non-Racket Scheme implementation, Chicken Scheme, has both an interpreter and an AOT compiler, with extensive support for dynamic behavior (like eval) in both compiled and interpreted code.
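A toy sketch of that shared structure, with a made-up three-node AST and a made-up PUSH/ADD/MUL stack-machine instruction set (C here, only to keep the example self-contained):

    /* Toy illustration of "interpreter and compiler share the same AST walk".
       The three-node AST and the PUSH/ADD/MUL instruction set are invented
       for this sketch. */
    #include <stdio.h>

    typedef enum { NUM, ADD, MUL } Kind;
    typedef struct Node {
        Kind kind;
        int value;                  /* used when kind == NUM */
        struct Node *lhs, *rhs;     /* used otherwise */
    } Node;

    /* interpreter: perform the required action directly */
    static int eval(const Node *n) {
        switch (n->kind) {
        case NUM: return n->value;
        case ADD: return eval(n->lhs) + eval(n->rhs);
        case MUL: return eval(n->lhs) * eval(n->rhs);
        }
        return 0;
    }

    /* "compiler": emit instructions for a lower-level (stack) machine */
    static void emit(const Node *n) {
        switch (n->kind) {
        case NUM: printf("PUSH %d\n", n->value); return;
        case ADD: emit(n->lhs); emit(n->rhs); printf("ADD\n"); return;
        case MUL: emit(n->lhs); emit(n->rhs); printf("MUL\n"); return;
        }
    }

    int main(void) {
        /* (1 + 2) * 3 */
        Node one = {NUM, 1, NULL, NULL}, two = {NUM, 2, NULL, NULL};
        Node three = {NUM, 3, NULL, NULL};
        Node sum  = {ADD, 0, &one, &two};
        Node prod = {MUL, 0, &sum, &three};
        printf("eval => %d\n", eval(&prod));  /* prints 9 */
        emit(&prod);                          /* prints the stack-machine program */
        return 0;
    }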
Anyway, this is my understanding; I've yet to write a compiler, so I'm not too sure about all the details, but the general outline should be more or less correct. I hope :)
EDIT: there was a book (which was not SICP) which taught writing interpreters and compilers and the relation between them, but I've been trying for half an hour now to recall what it was without success :( I should have the pdf somewhere, will try to dig it up later.
A shell that recompiles and reruns the entire program in a compiled language each time you enter a line.
"This has obvious performance disadvantages if you're actually doing computation inside your REPL instance—for each new line, you'll have to redo all old computation. However, this technique has other distinct advantages that make it very desirable for certain use cases."
Are there any clever incremental compile/run tricks that could speed this up?
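For reference, the baseline being described is roughly the following (a hedged C sketch, not reple's actual code; reple is configured per language and also diffs the output so only new lines are shown, which this sketch skips):

    /* Hedged sketch of the replay idea: keep every line entered so far, wrap
       it all in main(), recompile and rerun after each new line. The file
       names and the plain `cc` invocation are assumptions for the sketch. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        static char history[65536] = "";   /* everything accepted so far */
        char line[1024];

        printf("> ");
        while (fgets(line, sizeof line, stdin)) {
            size_t keep = strlen(history);
            strncat(history, line, sizeof history - keep - 1);

            FILE *src = fopen("repl_tmp.c", "w");
            if (!src) break;
            fprintf(src, "#include <stdio.h>\nint main(void){\n%s\nreturn 0;}\n",
                    history);
            fclose(src);

            /* recompile the whole accumulated program and rerun it */
            if (system("cc repl_tmp.c -o repl_tmp") == 0)
                system("./repl_tmp");
            else
                history[keep] = '\0';      /* drop a line that does not compile */

            printf("> ");
        }
        return 0;
    }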
Instead of putting all lines into one big chunk of code that is recompiled and rerun every time, one could compile each new line into a dynamic library with a single function that gets the current variable assignments passed as arguments and executes just that line of code; the REPL then waits for input, generates the next dynamic library, loads it and calls it with the updated variable assignments.
However, that solution would require the same level of language-specific knowledge as the modification mentioned under "Limitations" in the blog post. That's also why I never implemented the idea, despite having carried it around in my head for a few years now. Hopefully someone else will eventually get around to implementing it.
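A rough sketch of that per-line shared-library idea, with invented names (step, repl_step_N.so) and a deliberately trivial single-int "state", assuming a POSIX toolchain with dlopen/dlsym:

    /* Hedged sketch of the "one shared library per line" idea: each entered
       line is compiled into its own small .so exposing a function that gets
       the current state, runs the line, and returns. The names (step,
       repl_step_N.so) and the single-int "state" are invented here.
       Link with -ldl on older glibc. */
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef void (*step_fn)(int *x);       /* toy state: one int variable */

    int main(void) {
        int x = 0;                         /* the REPL's accumulated state */
        char line[1024], path[64], cmd[256];
        int n = 0;

        printf("> ");
        while (fgets(line, sizeof line, stdin)) {
            /* wrap only the new line in a function taking the state by reference */
            FILE *src = fopen("repl_step.c", "w");
            if (!src) break;
            fprintf(src, "#include <stdio.h>\nvoid step(int *x){ %s }\n", line);
            fclose(src);

            /* compile just this line into a fresh shared library ... */
            snprintf(path, sizeof path, "./repl_step_%d.so", n++);
            snprintf(cmd, sizeof cmd, "cc -shared -fPIC repl_step.c -o %s", path);
            if (system(cmd) == 0) {
                /* ... then load it and call it against the live state */
                void *lib = dlopen(path, RTLD_NOW);
                step_fn step = lib ? (step_fn)dlsym(lib, "step") : NULL;
                if (step) step(&x);
            }
            printf("x = %d\n> ", x);
        }
        return 0;
    }

The language-specific knowledge mentioned above is exactly what this sketch dodges: knowing how to pass the real variable environment into and out of each generated function.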
It's certainly possible, but another thing you can often do is just not depend on slow computations during interactive development. This is how I usually work on code in Python: with https://github.com/darius/halp I have REPL-like examples in my source code, and Emacs reruns them all at a keystroke and tells me if any output changed; mostly it's no trouble to keep helpful examples that can all be rerun in a small fraction of a second.
IPython is fancier and can be used with expensive computations, so this might be a dead end, but I didn't want to have to think about whether the state of the examples was ever out of sync with a fresh reexecution of the code (and also I wrote this a long time ago). I thought I might get to cleverer incremental update methods eventually, but it hasn't been worth the trouble to try when the above workflow has worked well enough.
Yes, there are tools like rcrl (mentioned in the blog post) that dynamically link in new pieces of code [1]. The author of rcrl gave a nice talk at CppCon [2].
The trouble with tools like that is that they're pretty language specific, so it's my impression that it would require a fair bit of work to generalize to languages other than C++.
It's actually pretty easy to detect multi-line inputs as long as they have clear enclosers (like '{}' for a block in C/C++). This is what reple does.
This doesn't require a parser--you just have to know what the enclosers are (one of the things that goes in a reple config file) and then count them to see if they're matched or not.
For languages with Python-like indentation-based syntax, or with enclosers that can be ambiguous (some dialects of Fortran), this would be pretty difficult.
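For languages where it does work, the check is just bookkeeping over the configured encloser pairs; a rough sketch (enclosers hard-coded here, and string literals/comments ignored, which a real implementation would need to handle):

    /* Rough sketch of encloser counting for multi-line input detection: the
       input is considered complete once every opener has a matching closer.
       The '{}' and '()' pairs are hard-coded here, whereas reple reads its
       encloser pairs from a per-language config file. */
    #include <stdio.h>

    static int input_complete(const char *text) {
        int braces = 0, parens = 0;
        for (const char *p = text; *p; p++) {
            if (*p == '{') braces++;
            else if (*p == '}') braces--;
            else if (*p == '(') parens++;
            else if (*p == ')') parens--;
        }
        /* unmatched openers mean we should keep reading more lines */
        return braces <= 0 && parens <= 0;
    }

    int main(void) {
        printf("%d\n", input_complete("for (int i = 0; i < 3; i++) {"));    /* 0 */
        printf("%d\n", input_complete("for (int i = 0; i < 3; i++) { }"));  /* 1 */
        return 0;
    }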
OK for some use cases, but not mine. I often run long-running computations in REPLs, and I like to be able to add/modify code as I go. A replay-based REPL would not work at all for me.