
What are the main differences between this architecture and Apache Spark? One nice advantage I can see is the Python -> LLVM IR path, but I can't tell what the main advantages over Spark are.


It makes the most sense to compare Impala and Spark architecturally. Ibis will eventually integrate with Spark. We've been focusing on Impala integration for the reasons cited here: http://blog.cloudera.com/blog/2015/07/getting-started-with-i...

In particular, we're working on byte-level shared-memory integration with Impala to run user-defined logic without data serialization or extra memory overhead. Impala is implemented in C++ with LLVM runtime codegen, and the project's tech lead, Marcel Kornacker, was the tech lead for Google F1's query engine. This also opens up Python's HPC / scientific computing stack and existing data libraries to run in a Hadoop setting without Python-JVM interoperability issues.
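
To make the "no serialization" point concrete, here is a rough sketch of the general idea using only Python's standard library. It is not Ibis or Impala code: `publish_column` and `apply_udf` are hypothetical names, the column is assumed to hold signed 64-bit integers, and both sides run in one process here just to keep the example short. The producer writes raw bytes into a named shared-memory block, and the consumer attaches by name and runs a user function over a typed view of those same bytes, with no pickling or copying in between.

```python
from multiprocessing import shared_memory

ITEM_FORMAT = "q"  # struct format code for a signed 64-bit integer (assumed column type)
ITEM_SIZE = 8      # bytes per value


def publish_column(values):
    """Hypothetical producer: write a column of int64 values into shared memory."""
    nbytes = ITEM_SIZE * len(values)
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    # The OS may round the block up to a page size, so slice to the exact length
    # before reinterpreting the raw bytes as 64-bit integers.
    view = shm.buf[:nbytes].cast(ITEM_FORMAT)
    for i, value in enumerate(values):
        view[i] = value
    view.release()  # drop our typed view so the block can be closed later
    return shm


def apply_udf(name, count, func):
    """Hypothetical consumer: attach by name and run func over the shared bytes."""
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf[:ITEM_SIZE * count].cast(ITEM_FORMAT)
    try:
        # func sees the producer's memory directly; nothing is serialized.
        return func(view)
    finally:
        view.release()
        shm.close()


if __name__ == "__main__":
    column = publish_column([1, 2, 3, 4])
    try:
        print(apply_udf(column.name, 4, sum))
    finally:
        column.close()
        column.unlink()  # free the shared block
```

In a real setup the producer would be the query engine and the consumer a separate Python worker; the only thing passed between them would be the block name and the row count.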


Are you planning on leveraging Numba, or will this be a new way to generate LLVM IR from Python?


I was wondering this too.


Now I get it, thanks for the explanation, Wes. It sounds very interesting indeed. Congratulations on the project.



