3
MLPops
2y

Why is it that so much of the Apache software for data is written in... Java?

I'm not a veteran data engineer, but I can't imagine what makes Java better than Rust or Go.

Comments
  • 11
    Java is much older than Rust and Go. Do you want them to rewrite a stable product in unstable languages?
  • 0
    @aviophile Fair point.

    But I see https://pola.rs ingesting millions of rows on a single machine just fine, and it's written in Rust.

    This rant was a hot take on Delta Lake (https://delta.io). Although it isn't written in Java, it is written in a JVM language, Scala.

    NumPy, for example, which is central to _every_ data science lib out there, is written in C.

    What I'm complaining about is the JVM's memory consumption. (I'm not educated about Scala, though.)
  • 2
    @aviophile Rust and Go are very stable… but of course you wouldn't rewrite existing products that function neatly.
  • 3
    Java's advantage over Rust at the time was that it existed. In order to use Rust, they'd've had to invent it.
  • 1
    @zlice Standard C is cross-platform; that hasn't been a feat for 30 years. The advantage of Java is that even the way it responds to developer fuckups is (mostly) standardized. In C that's only the case up to compilation, where the compiler can warn about deviations from the standard; at runtime, mistakes like reading past the end of an array are undefined behavior.
  • 0
    @zlice Well, yeah, you need to use cross-platform libraries, because syscalls inherently depend on the system.
  • 1
    Delta Lake was written in Scala because it's the language Spark is written in. The co-founders of Databricks (creators of Delta Lake) were also the creators of Spark. Scala is a functional JVM language, which makes it easy for people who most likely already have Java installed to run programs without installing much else, and without having to learn an entirely new syntax. Also, most Java libraries can be used in Scala programs, so it had a large ecosystem from the start.
  • 0
    @cmarshall10450 I'll buy the ecosystem argument
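On the JVM memory complaint raised in the thread: a minimal sketch for checking what the heap is actually doing, using only the standard `java.lang.Runtime` API. The class and method names (`HeapReport`, `usedBytes`, `reservedBytes`, `maxBytes`) are hypothetical. These figures cover the Java heap only; metaspace, thread stacks, and the JIT code cache come on top, so the memory the OS reports for the process will be higher.

```java
// Hypothetical helper class: reports Java heap figures via the standard Runtime API.
// Covers the heap only, not metaspace, thread stacks, or the JIT code cache.
public class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Bytes currently occupied on the heap (live objects plus not-yet-collected garbage). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Bytes the JVM has currently reserved for the heap. */
    static long reservedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Upper bound the heap may grow to (e.g. set via -Xmx); Long.MAX_VALUE if unlimited. */
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.printf("used=%d MiB, reserved=%d MiB, max=%d MiB%n",
                usedBytes() / MIB, reservedBytes() / MIB, maxBytes() / MIB);
    }
}
```

Running it as, for example, `java -Xmx512m HeapReport` caps the heap; `-Xmx` is a HotSpot/OpenJDK option, so exact defaults and behavior depend on the JVM and garbage collector in use.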
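To make the point about Java standardizing how it responds to developer mistakes concrete, here is a sketch of two cases that are undefined behavior in C but have a single specified outcome on any conforming JVM: an out-of-range array read always throws `ArrayIndexOutOfBoundsException`, and `int` overflow always wraps around in two's complement. The class and method names (`DefinedMistakes`, `readOrReport`, `add`) are hypothetical, chosen for illustration.

```java
// Hypothetical demo class: two mistakes that are undefined behavior in C
// but have one specified outcome on every conforming JVM.
public class DefinedMistakes {
    /** Reads data[i]; an out-of-range index always throws ArrayIndexOutOfBoundsException. */
    static String readOrReport(int[] data, int i) {
        try {
            return String.valueOf(data[i]);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Same exception type regardless of OS or CPU.
            return "out of bounds: " + i;
        }
    }

    /** int addition wraps around in two's complement; signed overflow in C is undefined. */
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(readOrReport(data, 1));     // prints 20
        System.out.println(readOrReport(data, 3));     // prints out of bounds: 3
        System.out.println(add(Integer.MAX_VALUE, 1)); // prints -2147483648
    }
}
```

The equivalent C code (reading `data[3]` of a three-element array, or computing `INT_MAX + 1`) compiles, but what happens at runtime is not defined by the standard.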