Programming Languages

There are many ways to categorize programming languages. For the purposes of this lesson, the focus is on how a user interacts with the tools for a language from a workflow viewpoint. As such, the programming languages will be split into three categories: interpreted, compiled, and a hybrid of the two.

There are a few steps that code of any language has to go through before being run by a machine:

Lexical analysis: the process of converting a sequence of characters into a sequence of tokens
Syntax analysis: the process of ensuring that the lexed tokens conform to the rules of the language
Semantic analysis: the process of ensuring that the relationships between the lexed tokens are valid (for example, that a variable is declared before it is used)
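To make the three steps above concrete, here is a minimal sketch for a toy expression language such as `n * n`. The names `Token`, `tokenize`, `is_valid_syntax`, and `is_valid_semantics` are hypothetical, and the grammar (operands separated by single-character operators) is an assumption chosen for brevity. Real compilers and interpreters use far more elaborate grammars and symbol tables.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical token categories for a toy expression language.
enum class TokenKind { Identifier, Number, Operator };

struct Token {
    TokenKind kind;
    std::string text;
};

// Lexical analysis: convert a sequence of characters into a sequence of tokens.
// Assumption: any character that is not whitespace, a letter, or a digit
// is treated as a one-character operator.
std::vector<Token> tokenize(const std::string& source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c)) {
            std::size_t start = i;
            while (i < source.size() &&
                   std::isalnum(static_cast<unsigned char>(source[i]))) ++i;
            tokens.push_back({TokenKind::Identifier, source.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            std::size_t start = i;
            while (i < source.size() &&
                   std::isdigit(static_cast<unsigned char>(source[i]))) ++i;
            tokens.push_back({TokenKind::Number, source.substr(start, i - start)});
        } else {
            tokens.push_back({TokenKind::Operator, std::string(1, source[i])});
            ++i;
        }
    }
    return tokens;
}

// Syntax analysis: operands and operators must alternate,
// and the expression must start and end with an operand.
bool is_valid_syntax(const std::vector<Token>& tokens) {
    if (tokens.size() % 2 == 0) return false;  // also rejects empty input
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        bool is_operator = tokens[i].kind == TokenKind::Operator;
        if (is_operator != (i % 2 == 1)) return false;
    }
    return true;
}

// Semantic analysis: every identifier must refer to a declared variable.
bool is_valid_semantics(const std::vector<Token>& tokens,
                        const std::set<std::string>& declared) {
    for (const Token& token : tokens) {
        if (token.kind == TokenKind::Identifier && declared.count(token.text) == 0) {
            return false;
        }
    }
    return true;
}
```

With these definitions, `tokenize("n * n")` produces three tokens, `is_valid_syntax` rejects `n * * n`, and `is_valid_semantics` rejects `n * m` when only `n` has been declared.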

Interpreted Languages

Once semantic analysis is complete, interpreted languages immediately execute your code. This usually allows them to have an interactive interpreter and makes them friendly to development, where repeatedly tweaking then testing code needs to be fast and easy.
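The analyze-then-execute cycle of an interactive interpreter can be sketched as a loop that reads one line, checks it, and runs it right away. The toy language below (numbers combined strictly left to right with `+`, `-`, `*`, and `/`, with no operator precedence) and the names `evaluate_line` and `run_session` are assumptions made for illustration. This is not how any particular interpreter is implemented.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Analyze and execute one line of a toy language: "number (op number)*".
// Assumption: operators are applied strictly left to right, without precedence.
// Returns false if the line is malformed; result is only written on success.
bool evaluate_line(const std::string& line, double& result) {
    std::istringstream in(line);
    double value = 0.0;
    if (!(in >> value)) return false;  // a line must start with a number

    char op = 0;
    double operand = 0.0;
    while (in >> op) {
        if (!(in >> operand)) return false;  // an operator needs a right-hand operand
        switch (op) {
            case '+': value += operand; break;
            case '-': value -= operand; break;
            case '*': value *= operand; break;
            case '/': value /= operand; break;  // floating-point division
            default: return false;              // unknown operator
        }
    }
    result = value;
    return true;
}

// Read lines until the input ends, executing each one as soon as it is read.
void run_session(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        double value = 0.0;
        if (evaluate_line(line, value)) {
            out << value << '\n';
        } else {
            out << "error\n";
        }
    }
}
```

Calling `run_session(std::cin, std::cout)` from `main` gives an interactive session: each line is executed as soon as it is entered, with no separate build step, which is the fast tweak-and-test loop described above.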

Compiled Languages

Compilers take a couple more steps after semantic analysis:

Optimization: the process of minimizing the expected runtime and/or memory footprint of the resulting program
Code generation: converting the intermediate representation resulting from optimization to machine code for a specific platform

As an example, the following C++ code:

int square(int n) {
    return n * n;
}

…might be translated by the compiler, with optimization turned off, into the following x86-64 assembly (a human-readable form of machine code):

square(int):
   pushq      %rbp           ; rbp: frame (base) pointer
   movq       %rsp, %rbp     ; rsp: stack pointer (top of the stack)
   movl       %edi, -4(%rbp) ; edi: holds the first integer argument (n)
   movl       -4(%rbp), %eax ; eax: accumulator; holds the return value
   imull      -4(%rbp), %eax
   popq       %rbp
   ret

If optimization is turned on, the compiler recognizes that the same result can be produced with fewer instructions. Rather than translating your code directly, it emits machine code that produces identical results in as efficient a way as it can find. The following version has fewer than half as many instructions but gives the same answer:

square(int):
   movl       %edi, %eax
   imull      %edi, %eax
   ret

Because compilation only has to happen once, workflows that compile once and execute many times favor compiled languages: running compiled code is usually faster than running the same program through an interpreter.

Which is Better?

Interpreted Language Advantages

Programs can be evaluated immediately; code does not need to be recompiled after each change
Generally more portable between platforms and operating systems
Usually more user-friendly; if developer time is more valuable than compute time, an interpreted language is usually a better choice

Compiled Language Advantages

Many errors that an interpreted language would find while running can be caught at compile time
Low-level programming (e.g. GPU kernels) is possible
Usually faster, often much faster; if compute time is more valuable than development time, a compiled language is usually the way to go
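As a small illustration of catching errors at compile time, the sketch below reuses the `square` function from earlier in the lesson, marked `constexpr` here (an addition for this example) so the compiler can evaluate it during the build. The commented-out line is a hypothetical mistake included only to show what the compiler would reject.

```cpp
#include <string>
#include <type_traits>

// The square function from earlier, marked constexpr (an assumption for this
// sketch) so that it can be evaluated by the compiler itself.
constexpr int square(int n) {
    return n * n;
}

// Checked during compilation: if this condition were false,
// the build would stop before the program ever ran.
static_assert(square(4) == 16, "square(4) should be 16");

// The return type is verified at compile time as well.
static_assert(std::is_same<decltype(square(4)), int>::value,
              "square should return int");

// Uncommenting the next line makes compilation fail, because a std::string
// cannot be converted to int. An interpreted language would typically report
// a comparable mistake only when the offending line actually executes.
// int bad = square(std::string("four"));
```

The trade-off is that every such check costs compile time up front, but a mistake on a rarely executed code path is found before a long-running job starts rather than hours into it.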

HPC, Hybrid Languages, and Language Choice

Since computation time usually dwarfs developer time in the context of high-performance computing, compiled languages are favored for writing programs for supercomputers. C++ and Fortran are very popular, which is partly why much of this course is taught in C++.

Developers with less programming experience will often choose an “easier,” interpreted language for use on the supercomputer. This can be an acceptable choice if the interpreted language is used as glue code that calls optimized compiled code for the heavy computation, but it is generally a bad choice otherwise. For example, a Python program whose main computation is done with vectorized NumPy is acceptable, while an R program without special tuning is not.

Julia is a hybrid language (a language that combines some characteristics of interpreted languages with some of compiled languages) that was designed from the ground up for HPC. It’s easy to use, more expressive than Python (and far more expressive than R), and fast due to its just-in-time compilation, type stability, and multiple dispatch. We thus encourage most users of the supercomputer who aren’t tied to legacy code or a language-specific library to write new code for the supercomputer in Julia. In this class, we’ll use Julia to teach concepts that are hard to learn with C++.