Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Reple: “Replay-Based” REPLs for Compiled Languages (eecs.berkeley.edu)
68 points by xiii1408 on Dec 23, 2018 | hide | past | favorite | 17 comments


The believe that REPLs are based on interpreters is strange. The Lisp read-eval-print-loop has been using any mix of compiled and interpreted versions for several decades.

Something like ECL can compile Lisp incrementally to C, calls the C compiler and loads the compiled code into Lisp.

SBCL for a long time only had a compiler. The compiler then either compiles to memory or to files. These files can be loaded into Lisp. These system maintain much of the state between compiles.

There is zero reason why EVAL can't be implemented by direct-to-machine-code compiler - like in SBCL.

The SBCL Read-Eval-Print-Loop:

    * (flet ((my-function (a)
               (+ a 42)))
        (disassemble (function my-function))
        (my-function 0))

    ; disassembly for (FLET MY-FUNCTION)
    ; Size: 37 bytes. Origin: #x1002FC5E98
    ; 98:       498B4D60         MOV RCX, [R13+96]                ; no-arg-parsing entry point
                                                                  ; thread.binding-stack-pointer
    ; 9C:       48894DF0         MOV [RBP-16], RCX
    ; A0:       48895DE8         MOV [RBP-24], RBX
    ; A4:       BF54000000       MOV EDI, 84
    ; A9:       488BD3           MOV RDX, RBX
    ; AC:       FF1425A000B021   CALL QWORD PTR [#x21B000A0]      ; GENERIC-+
    ; B3:       488B5DE8         MOV RBX, [RBP-24]
    ; B7:       488BE5           MOV RSP, RBP
    ; BA:       F8               CLC
    ; BB:       5D               POP RBP
    ; BC:       C3               RET
    42
If one wants to execute whole programs, this is also what Lisp usually supports. One types to an editor and compile-and-load-buffer is a typical operation. This first compiles the buffer and then loads the code into the current Lisp. Sometimes output goes to temporary windows.

Some Lisp IDEs queue the editor buffer expressions into a REPL for execution and may compile them piece by piece.

There is usually very little use for modifying existing output for changes - though some implementations can do that - like Symbolics Genera, where output is recorded and deltas can computed to generate minimum updates. This can be used in custom applications, which have a command loop and an output recording window.

Usually this is not done, because both the internal and the external state is changing so much that in general updating the REPL output is useless. Might as well clear the existing command loop / repl output and generate fresh output.


This is a thing that interest me much (I'm building a relational lang in rust), I haven't found enough info or help (I have asked like a dozen times already!) in how do it with compiled languages (the answer is always "do read this large codebase that solve it)".

If I understand you well, this demand the compiler to be a JIT? And be responsible to emit and understand assembler?

Is possible to make a generalized solution, that for example, could work for a transpiler?

Is viable for a compiled language in multiple platforms or arches without a huge codebase?

What could be a minimal example that could work?


These are not JIT compilers. They are AOT compilers. Code gets immediately compiled to machine code - either automatically or on direct request by the user.


But then what are the steps for make it real-able? I can see, like this project, the idea to compile everything and run, but the incremental aspect of a real is what I can't see how make it.


It's actually quite simple: all you need to do is to have your compiler be able to compile things smaller than the whole program. In Lisp, the unit of compilation is function (I think, lispm will correct me if I'm wrong), in Forth it's a word, in Prolog a rule and in CoffeeScript expression.

If you have that, the only thing that's left is being able to inject the compiled code into the running REPL process. This is more complicated nowadays because of restrictions placed on memory by the OS, ie. marking some segments of memory as executable and non-executable, but fundamentally compiled code is just a chunk of bytes, so "loading" here means allocating space for it and placing a pointer to the first byte anywhere reachable by the rest of the program, probably in some kind of a symbol table. The usual language rules take care of searching for the symbol's definition, then the implementation arranges the stack (or does anything else it wants with regards to arguments) and makes execution jump to the compiled code.

If you used GDB, you'd notice that it's possible to call natively compiled functions from its prompt, as well as to examine the symbols at the break-point (at least the ones not compiled away). GDB prompt is a REPL itself, although it only supports simple expressions IIRC, and the compilers of supported languages are written to work on the file level, not on the smaller chunks of code. But if you design a new language and implement a compiler for it, there's nothing stopping you from re-using compiler internals when writing a REPL.

The difference between JIT and AOT compilation is that with JIT you first execute the code, then compile it, while Ahead-of-Time (as the name implies) compiles code first and executes it later. What Lisp or Forth do is AOT compilation; for example, SBCL (IIRC) is a compiler-only implementation of Lisp, so it couldn't do JIT even if it wanted to, because there's no way for it to execute the code before it gets compiled. It - as mentioned - has a proper REPL and you'd be hard-pressed to find a difference in its look&feel compared to Python's.

BTW, writing an interpreter is very similar to writing a compiler: after lexing and parsing code into an AST you walk it, and then either perform required actions directly (that's an interpreter) or emit the code which will perform those actions when handed to a lower-level interpreter (like the CPU, or a VM). If you design your language from the beginning for it, you should be able to write both at the same time without much trouble. As an example, my favorite non-Racket Scheme implementation, Chicken Scheme, has both an interpreter and AOT compiler, with extensive support for dynamic behavior (like eval) in both compiled and interpreted code.

Anyway, this is my understanding, although I'm yet to write a compiler, so I'm not too sure about all the details, but the general outline should be more or less correct. I hope :)

EDIT: there was a book (which was not SICP) which taught writing interpreters and compilers and the relation between them, but I've been trying for half an hour now to recall what it was without success :( I should have the pdf somewhere, will try to dig it up later.


> If you have that, the only thing that's left is being able to inject the compiled code into the running REPL process.

This is alike JIT, where you MMAP and the mark that as executable?

Now thinking a little, could be easier using DLLs/dynamic linking, but how model the API is something I can't yet figure.

P.D: I think TinyCC is able to compile on memory, but I'm using rust for my project and probably the gymnastics of this are too much for now.


A shell that reruns the entirety of a compiled language each time you enter a line.

"This has obvious performance disadvantages if you're actually doing computation inside your REPL instance—for each new line, you'll have to redo all old computation. However, this technique has other distinct advantages that make it very desirable for certain use cases."

Are there any cleaver increment compile/run tricks that could speed this up?


Instead of putting all lines into one big chunk of code that is recompiled and rerun every time, one could instead compile it into a dynamic library with a single function that gets the current variable assignments passed as arguments, executes the single line of code and then waits for input to generate the next dynamic library, which it then loads and calls with the new variable assignment.

However, that solution would require the same level of language-specific knowledge as the modification mentioned below "Limitations" in the blog post. That's also why I never implemented the idea, despite having carried it around in my head for a few years now. Hopefully someone else will eventually get around to implementing it.


This approach is taken by https://github.com/dlang-community/drepl and https://docs.scala-lang.org/overviews/repl/overview.html. While it works surprisingly well, you need to parse and transform the REPL input to emulate some language constructs in the right scope.


This is how http://neugierig.org/software/c-repl/ worked. (See "How it works" in the README: https://github.com/evmar/c-repl/blob/master/README )


I made use of ANTLR4 for parsing C to write a "worksheet" (like Swift's Playgrounds) tool: https://github.com/rgoulter/c-worksheet-instrumentor

It's an example of what you can do with "specific to a language", although it'd suffer the same limitation of "slower after a big computation".


Could it be done in llvm or another backend so all frontend languages can benefit?


It's certainly possible, but another thing you can often do is just not depend on slow computations during interactive development. This is how I usually work on code in Python: with https://github.com/darius/halp I have repl-like examples in my source code, and Emacs reruns them all at a keystroke and tells you if any output changed; mostly it's no trouble to keep helpful examples that can all be rerun in a small fraction of a second.

IPython is fancier and can be used with expensive computations, so this might be a dead end, but I didn't want to have to think about whether the state of the examples was ever out of sync with a fresh reexecution of the code (and also I wrote this a long time ago). I thought I might get to cleverer incremental update methods eventually, but it hasn't been worth the trouble to try when the above workflow has worked well enough.


Yes, there are tools like rcrl (mentioned in the blog post) that dynamically link in new pieces of code [1]. The author of rcrl gave a nice talk at CppCon [2].

The trouble with tools like that is that they're pretty language specific, so it's my impression that it would require a fair bit of work to generalize to languages other than C++.

[1] https://github.com/onqtam/rcrl

[2] https://www.youtube.com/watch?v=UEuA0yuw_O0


In terms of "value for effort", this sounds like a cool idea.

I'd guess this approach also struggles with multiple-line input like if/else, loops, functions.

I'm reminded of https://github.com/jcartledge/sublime-worksheet which is language-agnostic on the other end of things. The two ideas might work well together in limited cases.


It's actually pretty easy to detect multi-line inputs as long as they have clear enclosers (like '{}' for a block in C/C++). This is what reple does.

This doesn't require a parser--you just have to know what the enclosers are (one of the things that goes in a reple config file) and then count them to see if they're matched or not.

For languages without Python-like syntax or with enclosers that can be ambiguous (some dialects of Fortran), this would be pretty difficult.


OK for some use cases, but not mine. I often run long running computations in repos, and like to be able to add/modify code. A replay based repl would not work at all for me.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: