External documentation and training resources

Alice: “Well, in our country, you’d generally get to somewhere else—if you ran very fast for a long time, as we’ve been doing.” The Red Queen: “A slow sort of country! Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!” From Through the Looking-Glass by Lewis Carroll, the sequel to Alice’s Adventures in Wonderland.

In this chapter, we provide some material to help you work efficiently with the hardware offered by Adastra.

Software engineering

While the amount of scientific manpower grows fairly linearly over time, one could say that the complexity of coding the physics scientists want to simulate increases exponentially. Indeed, they mix and match more and more physical phenomena, and it becomes difficult to manage the complexity of the code. This section presents some resources in the hope that research teams maintaining a code base will be able to better allocate their manpower.

One could argue that a beautiful language is one in which the programmer can easily express correct (and efficient) code, but always remember that your programming language will not save you. Indeed, in time, everything changes but the basics.

Some believe that making simple things complicated is a sign of a lack of talent or, at least, of a serious lack of understanding of what you are doing. In fact, one needs a sense of taste (in the aesthetic sense) when debugging, programming and, generally, to be good at anything.

Algorithms and generic programming

Before discussing theory, remember that you run a program on a platform (OS and hardware) and that the goal of your program is simply to transform data. Knowing the platform and the data transformation, you have a (practical) problem. Data is, and should stay, the center of attention, whatever your program does. You must adapt your data (layouts) to match the problem.

Now, start with the basics: use optimal single-threaded algorithms and good data structures. The Art of Computer Programming, Knuth, Vol. 1, ISBN-10: 0201896834, is a thorough start. This book (Vol. 1), or the series of books (Vol. 1-4B), can serve as a reference for many algorithmic tricks.

Elements of Programming, Stepanov; McJones, ISBN-10: 0578222140, available in PDF. This book is more practical and focuses on writing generic code which relies on the properties of types. Types sharing a common subset of properties can be subjected to a similar treatment. The book is a must-have (and is cheap). You may also watch a few parts of the video courses given at Amazon by Alexander Stepanov and available on Youtube (part 1.1, part 1.2, part 1.3, part 2, part 3 and part 4).

A book providing insights similar to Elements of Programming, but in an easier-to-digest package, is From Mathematics to Generic Programming, Stepanov, Alexander; Rose, Daniel, ISBN-10: 0321942043.

The history of generic programming is presented by Sean Parent in his talk, Generic Programming.

As an aside, Dijkstra’s note on ranges points to some significant design flaws in some programming languages.

Code quality

Is complexification the root of the end of civilizations? Keep it simple.

Note

Be leery of following patterns without thought; this applies particularly well to this subsection.

Code quality is often set aside in the HPC world, which is strange: who would trust the results of a messy, poorly understood, rarely thoroughly tested, bug and undefined behavior ridden toolkit? Writing quality lines of code is a job in itself, but it is often disregarded as such, unless you face a maintainability or scalability wall. We have seen an increasing number of teams facing porting issues, or even considering rewriting their tools, due to unforeseen architectural or maintainability issues.

Code quality may seem a bit abstract, and in fact it is very situational, but it is real, to the point where a programmer with a minimum of experience should be able to detect what is called a code smell. As you write the code, you are yourself a client of this code: you read it again, you potentially maintain it, and its quality influences your ability to improve it. The quality of the code is therefore crucial, and some believe (from experience) that 50-60% of the time should be spent proofreading, refactoring, correcting typos, formatting the code (maybe fixing bugs) rather than extending it with new features. This is the least you can do to avoid software rot.

We would like to point out that even if HPC is very much focused on computational performance, we should not forget about optimizing the readability of the software. This may need cultural, financial and/or management changes. There is no such thing as my code and his code; there is the code. If you come across a problem, it is better to correct it directly; do not go straight to writing a bug report, because if you leave the error in now, you will lower the bar for the quality of the code, which, from experience, leads to a negative feedback loop that degrades it further.

You may want to always test using the latest, cutting-edge features, but we recommend that you deploy using only the stable ones. Taking the C and C++ standards as an example: test some of the C++20 or C23/C++23 features but do not rely on them, and deploy using only C11/C++14 or C17/C++17. The rationale is that the latest standards come with too many unknowns, unwritten best practices and poorly understood tricks.

Tools and idioms

Gurus and programming veterans already walked a painful path, and you should follow in their footsteps. Do not reinvent the wheel (unless it serves as a learning experience). Specify a set of rules and stick to them. Respect the common idioms of the language. A programming language is a tool, so spend time learning how to use it properly, as you would for any other tool.

One such set of rules is given here in the case of the C++ language. While not ideal, these rules give structure to the code and facilitate human code parsing (i.e., readability). Regarding the idioms, in the case of C++, two of the most famous front-facing committee members, Bjarne Stroustrup and Herb Sutter, offer the C++ Core Guidelines.

There are similar guidelines for the Fortran language, but they are much scarcer (a cultural issue?).

One should also spend time studying tools like Clang-format, Clang-tidy and EditorConfig files, and seek absolute homogeneity across the whole code base.

Compiling code

Going from the top down, the next step after defining guidelines is choosing an appropriate way to compile and expose your software for reuse. To achieve that, we use build scripts executed by build systems, which are simple programs dictating, amongst other things, how the compiler should be called and how the compilation output (object files) should be linked. Examples of tools providing such capabilities are Make and Ninja. Some build scripts are not intended to be written by humans: while Make scripts can be fairly readable and maintainable, Ninja’s are readable (arguably more so than Make’s) but unmaintainable.

Writing such build scripts may tie you to a build system and potentially to a platform. It also requires careful (explicit?) dependency management, may complicate installation or packaging procedures, and may require a low-level understanding of the supported compilers. One advantage is that you are supposed to know exactly how it works; you coded it, after all. In practice, many low-level build scripts are messy and poorly cared for. Writing these scripts should be given as much, if not more, consideration than the code itself. Treat your build automation scripts as code, with stylistic and semantic rules.

Some tools have been developed to abstract the build step from the system it targets. CMake was conceived to satisfy such cross-platform needs, and nowadays it is widely used by large C++ code bases. It does not target C++ exclusively; C and Fortran support is strong. CMake will generate many kinds of build scripts at your leisure: it can produce scripts for Make, Ninja, Visual Studio and more. CMake can ease dependency management by abstracting the use of the specific compiler flags used to include headers or link against libraries, abstracting dependencies (say, header visibility propagation), testing your code, and more.
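
As a minimal sketch of these abstractions (the project and file names are hypothetical), a CMakeLists.txt for a small library and an executable consuming it could look like this:

    cmake_minimum_required(VERSION 3.18)
    project(my_solver VERSION 1.2.3 LANGUAGES CXX)

    # A library target; the header search path and the language level
    # declared PUBLIC are propagated to the consumers of the library.
    add_library(my_solver src/solver.cpp)
    target_include_directories(my_solver PUBLIC include)
    target_compile_features(my_solver PUBLIC cxx_std_17)

    # An executable reusing the library; linking pulls in the include
    # paths and compile features declared above, whatever the generator
    # or compiler in use.
    add_executable(my_app src/main.cpp)
    target_link_libraries(my_app PRIVATE my_solver)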

Organizing code

Complexity is unavoidable as your code base grows, but you should try to avoid convoluted designs without added value. Try to encapsulate code into namespaces, libraries and, obviously, functions, such that these bundles of functionality interact through well defined interfaces.

Creating an interface is a delicate job that requires artful design. The best method is probably trial and error, with an obviously big advantage if previous knowledge can be factored into the design. Try to keep Hyrum’s law in mind, though its significance depends on how much your users care about you not breaking the contract.

Contract, Interface, API: but what are these? An Application Programming Interface (API) can be seen as the instantiation, in code, of a contract. We talk of a contract because there is indeed an agreement between, in the case of computer programming, a user and a provider. What form can it take? Many, many forms: for instance, what you give to MPI_Send, what MPI_Send returns, how MPI_Send fails, what other effects MPI_Send has on the global state of the program, how you use malloc and free, how you call the grep command, or the protocol through which you exchange with a typical web server. Finally, as Hyrum’s law reminds us, the interface is every observable behavior of, in our case, a piece of code.
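
As a hedged illustration (the function and its documentation are hypothetical), such a contract often materializes as a documented declaration in a header:

    // Contract: 'buffer' must point to at least 'size' readable bytes;
    // returns the number of bytes consumed, or -1 on failure, in which
    // case no global state is modified. Per Hyrum's law, every other
    // observable behavior (timing, exact error paths, etc.) becomes,
    // with enough users, part of the de facto interface too.
    long consume(const char* buffer, long size);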

Who is enforcing the contract? Sometimes the context, the language or the library. For instance, in Java, if you do an out-of-bounds access in an array, you are promptly notified. In some cases, nobody tells you and you expose yourself to liability, say undefined or unspecified (implementation-specific) behavior (see The C and C++ languages).

Generally, Keeping the Interface (It) Simple (and Stupid) (KISS) is good advice, though you should never remove necessary complexity (do not make it simpler or stupider than it needs to be). Another view would be to try to use the weakest (least powerful) tool that satisfies the need, or to eliminate needless diversity wherever possible (this assumes you know what you need, potentially months or years in advance). Furthermore, what kind of software would you like to work on: software so complex that the bugs are not obvious, or software so simple that the bugs are obvious and so, there are obviously no bugs?

C, the language of the birds or the poor man’s software glue

“[…] the language of the birds is postulated as a mystical, perfect divine language, Adamic language, Enochian, angelic language or a mythical or magical language used by birds to communicate with the initiated”, Language of the birds, Wikipedia.

Interfaces take many forms, though in our case we shall focus on the use of C to provide stable interfaces between languages. The C language is the glue of computer science because it’s ubiquitous, low-level, stable (in time) and simple. (That said, maybe the glue is getting old.) By stable, we mean the Application Binary Interface (ABI) is stable. For instance, the way arguments are passed in registers or on the stack before a call, or the way symbols are named (the strings that identify, say, your global variables and your functions in object files), almost never change for a given platform.

As an example, say you recompiled your code using a different Fortran or C++ standard (C++03, C++17, etc.) but kept the same compiler, or that you changed compiler: know that the interface could change (and in certain cases, will change). The ABI changed, to be precise; indeed, the representation of the symbols may have changed, and thus so did the interface of your library, even though the service your library provides probably has not changed. In the C language, symbols are very unlikely to change for a given platform.

What was subject to change between compilers or language versions in the example above is the symbol naming scheme, which is called mangling. Mangling is a workaround for linkers not being aware of language-specific features, say C’s calling convention, C++’s namespaces or overloading, and Fortran’s modules. The language specificities get mangled into a string which becomes the symbol. This used to be one of the reasons why building C++ from source using a single compiler was often a relief: you get strong guarantees that you will not mix multiple mangling schemes.
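
A minimal sketch of mangling in practice (the namespace and function names are hypothetical; the mangled string is typical of the Itanium ABI used by GCC and Clang on Linux, and can be inspected with the nm tool):

    // C++ entity: both the namespace and the argument type get encoded
    // ("mangled") into the symbol, here _ZN6solver4stepEi.
    namespace solver {
        void step(int);
    }

    // C linkage: the symbol is the plain, stable string "step_c".
    extern "C" void step_c(int);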

For obvious reasons, the C mangling is very much tied to the platform, otherwise you could not talk to the OS anymore. So the issue comes from the fact that, on a same platform, compilers can emit different symbols for a same code (though that tends to change, as everything gets LLVM based nowadays, and people want to use shared libraries). As a side note, the C language is popular with most large OSes.

Anyway, because everybody talks C, it is a good candidate for talking with other programming languages. You will find C bindings for Python, Go, Fortran (ISO C, bind(c)), Java, etc. These bindings are just wrappers, in the C language, around code in a language L0, themselves often used from a language L1. If you think about it, it may remind you of a funnel or of an hourglass (see Other resources).

Note

Inside a program using a fixed language (say, only C++), ABI concerns are reduced, as the sources are often compiled using a single compiler and, as such, the code is produced by a compiler that can talk to itself.

Warning

Assume the following C function declarations: void func_a(); and void func_a(int);. Note that both map to the same func_a symbol. The C mangling encodes neither the argument types nor the return type, so such a mismatch will not be caught at link time.

Finally, there are some symbol visibility issues that you may encounter with shared libraries. We do not go down this rabbit hole but provide additional documents here. (TL;DR: use the flags -shared -fPIC -fvisibility=hidden -Bsymbolic when building shared libraries.)

Example

You have a Fortran code named LA using a C++ library LB, itself using another C++ library LC. What can you do? First, between Fortran and C++, you will have to rely on things like Fortran 2003’s ISO C binding constants and a C99 interface. For the C++ code, either you can guarantee that both LB and LC get compiled by the same compiler or, at least, that the different compilers use a similar symbol naming scheme. Or, you wrap the services provided by the library LC in a C99 interface.
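
A minimal sketch of such a wrapper (all names are hypothetical); the C++ service is hidden behind a function with C linkage, which both a Fortran bind(c) interface and a C99 compiler can bind to:

    // lc_wrapper.h: the C99 view of the C++ library LC.
    #ifdef __cplusplus
    extern "C" {
    #endif
    double lc_norm(const double* values, long count);
    #ifdef __cplusplus
    }
    #endif

    // lc_wrapper.cpp: compiled as C++, exposes an unmangled C symbol.
    #include <cmath>
    #include <numeric>

    extern "C" double lc_norm(const double* values, long count) {
        // Stand-in for a call into the C++ library LC.
        return std::sqrt(std::inner_product(values, values + count,
                                            values, 0.0));
    }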

Semantic versioning

When your code reaches a state at which it contains enough changes that it would be worthwhile for the users to upgrade, or at least worthwhile to publish, one can do a version bump. The idea of semantic versioning is to convey, through the version itself, information about how the interface evolves:

  1. MAJOR version when you make incompatible API changes;

  2. MINOR version when you add functionality in a backwards compatible manner;

  3. PATCH version when you make backwards compatible bug fixes.

You then concatenate MAJOR.MINOR.PATCH to form a version tag which you could use to mark your Git commits and release tarballs.
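
As a small worked example (the version numbers are illustrative):

    1.4.2 -> backwards compatible bug fix           -> 1.4.3
    1.4.3 -> new backwards compatible functionality -> 1.5.0
    1.5.0 -> incompatible API change                -> 2.0.0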

Reading the semver.org FAQ should provide you with the appropriate insights regarding software interface management and the complexities that justify a norm for version labeling.

Other resources

Some CMake guidelines are given by Manuel Binna in Effective Modern CMake.

Stefanus DuToit presents some aspects of the Hourglass concept in Hourglass Interfaces for C++ APIs.

Organizing your code into libraries is common, but often wrongly done. Some details are given by Ulrich Drepper in How To Write Shared Libraries.

GCC’s documentation on symbol visibility is an important read when dealing with libraries. Know that CMake provides tools to produce symbol visibility helper macros.

Some of Linux’s dynamic libraries implementation issues are discussed in Sorry state of dynamic libraries on Linux by Thiago Macieira and Everything You Ever Wanted to Know about DLLs by James McNellis. A recent talk given at CppCon by Ofek Shilon gives insights on Linkers, Loaders and Shared Libraries.

Some interesting design ideas to simplify C++ APIs and improve readability are given by Björn Fahller in Typical C++, But Why?

Some notes on undefined behavior are given by Fedor Pikus in this talk.

One should be aware of the out-of-bounds access issues introduced by buggy algorithms. While tools exist to try to circumvent these issues, it is possible to use C++ to, in a lightweight way, provide performance and increased safety in the form of illegal operation = crash, thus limiting silent corruption which, to be fair, may be one of the most horrendous nightmares to debug. Tristan Brindle presents a library (unfortunately limited to C++20 and up) that implements his ideas on using indices instead of iterators, as a way to force dereferences to happen in full context regarding the state and size of a container. This is clearly not something you want in tight kernels, though we can see many use cases in more common code, dealing notably with initialization, communications, restart and general data block management across operators, host and device.

The C and C++ languages

If one seeks insights into the original intended use of the C++ standard library, Bob Steagall presents Back to Basics: Classic STL. Obviously, as Alexander Stepanov heavily contributed to what was called the Software Technology Lab (STL), Standard Template Library (STL) or Stepanov And Lee (STL) library, one should also take a look at the documents he produced, some of which are presented in the Algorithms and generic programming subsection.

Undefined Behavior (UB) is a plague unfortunately found in many if not all codes. It is particularly pervasive in C, C++ and Fortran. Sometimes UB is used to enable an optimization, but it is unfortunately often involuntary. A nice read is given by Chris Lattner on the LLVM blog: What Every C Programmer Should Know About Undefined Behavior. See Debugging tooling for tools protecting against some cases of UB.

Floating point computation

IEEE 754 Floating point

The representation (or the approximation) of the real numbers comes in many flavors. One of them is the IEEE 754 floating point standard. Representing reals in such a way requires obvious concessions and has non-obvious side effects. David Goldberg’s What Every Computer Scientist Should Know About Floating-Point Arithmetic is a recommended introduction.
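
A minimal demonstration of such a non-obvious side effect (a sketch; any IEEE 754 binary64 implementation behaves this way):

    #include <cstdio>

    int main() {
        // Neither 0.1 nor 0.2 has an exact binary representation; the
        // rounded sum differs from the (also rounded) literal 0.3.
        double sum = 0.1 + 0.2;
        std::printf("%.17g\n", sum);      // prints 0.30000000000000004
        std::printf("%d\n", sum == 0.3);  // prints 0
        return 0;
    }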

A note on the concept of Unit in the Last Place (ULP) is given by Jean-Michel Muller in On the definition of ulp(x).

When porting code from CPU to GPU, one should not expect bit perfect results on both architectures. Expecting bit perfect result reproduction is an illusion, as one can not control the parallel reduction order (except at some cost), nor the usage of Fused Multiply-Add (FMA), unrolling, floating point operation reordering, hardware implemented transcendental functions, etc. (except, again, at the cost of performance). Anyway, one should expect discrepancies, some of which are described in Precision & Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs by Nathan Whitehead and Alex Fit-Florea.
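
The reduction order issue boils down to floating point addition not being associative; a minimal sketch:

    #include <cstdio>

    int main() {
        double big = 1e16, small = 1.0;
        // Adding 'small' to 'big' first rounds the contribution away;
        // accumulating the small values together first preserves it.
        double left_to_right = (big + small) + small;  // 1e16
        double small_first   = big + (small + small);  // 1.0000000000000002e16
        std::printf("%d\n", left_to_right == small_first);  // prints 0
        return 0;
    }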

Numerical computation oriented algorithms

While far from a book on coding best practices (too bad they do not lead by example), Numerical Recipes 3rd Edition: The Art of Scientific Computing, ISBN-10: 0521884071, can present the basics of many computation techniques, tricks and concepts.

In addition to what Knuth proposes, you can take a look at the numerical algorithms shown in Introduction to Algorithms, 3rd Edition, ISBN-10: 0262033844.

Numerical analysis

Accuracy and Stability of Numerical Algorithms 2nd Edition, Higham, Society for Industrial and Applied Mathematics, ISBN-10: 0898715210.

Concurrency for parallel software

Concurrency is the ability of an algorithm to produce the expected result when parts of it are executed with relaxed ordering, that is, not necessarily sequentially. As an example, \(a \times b + c \times d\) contains two multiplications which could be executed in any order: left then right, or right then left. The two resulting intermediate values would not change. Without concurrency you will not scale on HPC clusters, as this concurrency is what allows for parallelism, that is, the distribution of the work onto multiple processing units.
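
A minimal sketch of this example in C++ (assuming a C++11 compiler); the two multiplications are expressed as independent tasks, so the runtime is free to execute them in either order, or in parallel:

    #include <cstdio>
    #include <future>

    int main() {
        double a = 1.0, b = 2.0, c = 3.0, d = 4.0;
        // Two independent sub-computations: neither depends on the
        // result of the other, so their relative order is unconstrained.
        auto left  = std::async(std::launch::async, [=] { return a * b; });
        auto right = std::async(std::launch::async, [=] { return c * d; });
        // The addition is the only ordering point: it needs both results.
        std::printf("%f\n", left.get() + right.get());
        return 0;
    }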

Atomic operations

Atomic operations are tricky, to say the least. Herb Sutter’s atomic Weapons part 1, atomic Weapons part 2, Frank Birbacher’s Atomic’s memory orders, what for? and Fedor Pikus’s Concurrency in C++: A Programmer’s Overview part 2 present some of the complexity. Two other recommended documents are the GCC documentation section and the cppreference section on atomic memory orders.

Although Fedor Pikus’s Concurrency in C++: A Programmer’s Overview part 1 and Concurrency in C++: A Programmer’s Overview part 2 are directed toward a C++ audience, the concepts involved are key to good parallel software design. Part 2 notably presents atomic operations and some of their pitfalls.
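
A minimal sketch of the kind of subtlety these talks address (plain C++11, nothing Adastra specific): a flag written with release semantics guarantees that the writes preceding it are visible to a thread reading the flag with acquire semantics.

    #include <atomic>
    #include <cstdio>
    #include <thread>

    int data = 0;
    std::atomic<bool> ready{false};

    int main() {
        std::thread producer([] {
            data = 42;                                     // plain write
            ready.store(true, std::memory_order_release);  // publish
        });
        std::thread consumer([] {
            while (!ready.load(std::memory_order_acquire)) {}  // wait
            // The acquire load synchronizes with the release store:
            // reading 'data' here is well defined and yields 42.
            std::printf("%d\n", data);
        });
        producer.join();
        consumer.join();
        return 0;
    }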

Parallel programming using OpenMP

Official documentation

OpenMP specification v4.5 and OpenMP specification v5.1. Maybe more importantly, the OpenMP specification v5.0 Examples.

LLVM shares a web page tracking the implementation status of standard OpenMP features.
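
For a first taste of what these documents specify, here is a minimal C++ sketch (compile with your compiler’s OpenMP flag, for instance -fopenmp with GCC or Clang) of a parallel loop with a reduction:

    #include <cstdio>

    int main() {
        const int n = 1000000;
        double sum = 0.0;
        // The iterations are distributed over a team of threads; each
        // thread accumulates a private partial sum, and the partial
        // sums are combined when the loop ends.
        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < n; ++i) {
            sum += 1.0 / (1.0 + i);
        }
        std::printf("%f\n", sum);
        return 0;
    }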

Courses

The best publicly available courses are probably the ones given by some of the OpenMP standardization committee members themselves. EUROfusion offered such courses, which are now available on Youtube (part 1: Introduction, part 2: Tasking, part 3: NUMA and SIMD, part 4: Offloading and part 5: Advanced offloading). The courses’ resources are available here.

Compiler infrastructure

Compilers are complex machinery. We can try providing some insights on the services they render us, mostly in the form of optimizations.

OpenMP optimization

LLVM provides a C and C++ compiler called Clang. It ships with an OpenMP implementation, and some work is being put into adding optimization passes that are OpenMP aware. This has gained more traction with the inclusion of accelerators as targets for OpenMP.

Johannes Doerfert provides some (old but valid) insights on the internals of OpenMP in LLVM in Compiler Optimizations for OpenMP Accelerator Offloading. The talk is available in a longer and more thorough version. A more recent talk by Eric Wright informs us of the role omp simd could play in LLVM’s OpenMP target GPU support: GPU Warp-Level Parallelism in LLVM/OpenMP. If you wish to dig deeper into the subject, we recommend the following papers and presentations:

Some of the LLVM OpenMP backend developers present ideas in OpenMP Parallelism-Aware Optimizations.

Also, the GNU OpenMP implementation documentation and the source code of the GNU or LLVM implementations can be of interest.

Generating (pseudo) random numbers

Random number generation is often misunderstood, based on faulty designs and thus often misused; that is, most codes are just wrong. Walter E. Brown gives some insight into the history of Pseudo Random Number Generators (PRNGs) in the C and C++ languages, and then gives some notes on code patterns to avoid.
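
As a hedged illustration of the patterns usually recommended instead of the legacy rand() (a minimal sketch using the standard C++ <random> facilities):

    #include <cstdio>
    #include <random>

    int main() {
        // Seed once, from a non-deterministic source if available; do
        // not re-seed inside loops, a classic faulty pattern.
        std::mt19937_64 engine{std::random_device{}()};
        // Use a distribution object rather than the modulo trick, which
        // biases the results.
        std::uniform_real_distribution<double> uniform(0.0, 1.0);
        for (int i = 0; i < 4; ++i) {
            std::printf("%f\n", uniform(engine));
        }
        return 0;
    }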

CPU programming

A classical introduction read on CPU micro architecture is given by Jason Robert Carey Patterson in Modern Microprocessors A 90-Minute Guide!.

Memory is core to the Von Neumann computer architecture. Most HPC software is memory bound, that is, its performance is primarily dictated by the speed at which the data in main memory (RAM) can be accessed. One should know some basics of how it behaves and what the software pitfalls are. Ulrich Drepper presents RAM and memory controller hardware design and, maybe most importantly to the reader, CPU cache details in What every programmer should know about memory.

Understanding the Zen 4 Genoa CPU

User clamchowder on chipsandcheese.com gives some insights on the inner workings of the Zen 4 architecture. Part 1 covers the predictors, register renaming, out-of-order execution and AVX-512 capabilities of the microarchitecture. Part 2 covers the cache hierarchy, core to core latency, store and load latency, and memory throughput.

GPU programming

Basics

In addition to what we present in the porting guide, we propose a lot of external documents:

CppCon 2016: “Bringing Clang and C++ to GPUs: An Open-Source, CUDA-Compatible GPU C++ Compiler”, PRACE Multi GPU courses, PRACE HIP courses, GPU Hackathon.

Understanding the AMD MI250X GPU

Some notes on AMD’s Instinct MI200 Architecture.

Carl Pearson of Sandia National Laboratories (SNL) exposed some ways to exchange data between the CPU and the MI250X, and presented them in Interconnect Bandwidth Heterogeneity on AMD MI250x and Infinity Fabric.

Folks at the Jülich Supercomputing Centre (JSC) did some benchmarking of the communication speeds between GCDs in First Benchmarks with AMD Instinct MI250 GPUs at JSC.

The AMD GPGPU software stack is called ROCm and relies on the ROCt and ROCk components. ROCt serves as the interface for talking to the kernel’s AMD GPU driver, ROCk. The ROCt interface implements the Heterogeneous System Architecture (HSA) interface. So the AMD HIP runtime talks to the OS kernel through an HSA interface, and the kernel talks to the GPU.

Notions of debugging

“At first, the machines were really simple but not powerful. Then they got really powerful, but really mean.”

How you debug a code is quite situational. Having thorough logs or a useful coredump is rare.

Note

This is an opinionated remark, but GDB is most useful when used on core dumps. It becomes quite tedious (impractical) on large software with many threads or multiple processes.

One should always try to add the Clang/GCC C/C++ flag -g, or its equivalent in other languages and compilers. This flag does not slow down your program and does not impact the generated machine code. It will largely increase your executable’s size, but the debug information does not reside in the program’s memory space.

Some notes on the ELF file format are given by Greg Law in Linux Debuginfo Formats. ELF is the format used to represent your binaries on a machine like Adastra.

Debugging methodology

The over-quoted Kernighan said that debugging is twice as hard as writing a program in the first place. That is somewhat true and makes for the first point: do not try to be too smart when implementing, though no less smart than you need to be. For instance, the Linux kernel is locally very simple, indeed it is simple C code, but as a whole it becomes a very complex machinery. Try to figure out what code complexity you really need to implement, and try to use tools to help you maintain good code quality.

If you have good code, most bugs are trivial. Though not necessarily easy to find, they are easy to fix and can often be found using the second point: the ancestral printf-and-comment technique. It often allows you to do a binary search through your code. As this achieves logarithmic complexity in bug hunting it is quite good, though not as good as knowing your code and developing an intuition. Now, depending on the low-levelness of your work, printf could turn into gdb, strace, etc.

The third point is intuition, which you develop for your code, but also by knowing more about the environment and the context it executes in. Intuition is probably the best method, but it is not systematic, compared to the other two.

Assuming the three points above did not succeed, the bugs left are hard bugs, and hard bugs cannot be solved unless you know more about your platform or program (problem). This means you need either to learn more about the subject (develop intuition) and take time to dissect the bug (good luck with that), or to get help, by using tools or by asking someone more knowledgeable on the subject.

Debugging tooling

On CPU

On CPUs you can use preventive bug checking with tools called sanitizers. Clang offers the -fsanitize= flag, which can be used to enable (some cases of) undefined behavior detection (undefined), memory leak and out-of-bounds access detection (address), thread related race condition detection (thread) and uninitialized memory access detection (memory).

Do not hesitate to read the Clang address and memory sanitizer documentation.
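
A minimal sketch of the workflow, assuming Clang or GCC; the command line in the comment enables AddressSanitizer, which reports the out-of-bounds write at run time:

    // Build and run with, for instance:
    //   clang++ -g -fsanitize=address oob.cpp -o oob && ./oob
    // AddressSanitizer aborts with a heap-buffer-overflow report.
    #include <vector>

    int main() {
        std::vector<int> values(8);
        values[8] = 1;  // one past the end: undefined behavior
        return 0;
    }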

You may also use tools such as Dr. Memory or Valgrind for a more thorough memory leak detection. Note that these tools’ functionalities are not limited to memory leak detection; as an example, Valgrind provides tools to observe cache misses. Cray provides the valgrind4HPC tool and documents its use in this document.

Note

Tools such as Valgrind basically emulate the CPU; the program under test runs in a sandbox. The advantage is that the host (say, Valgrind) is omniscient and can thus catch calls to, say, malloc, and deduce whether the memory was accessed out of bounds or even never released. This emulation method is expensive, and the program may experience slowdowns ranging from x5 to x40. For this reason, we strongly recommend running your everyday tests under the more lightweight sanitizers, such as the ones presented above. Much less often, you can obviously run a Dr. Memory or Valgrind pass.

An introductory guide on using GDB4HPC is given in debugging a hung or crashed application using GDB4HPC. Cray also provides tools such as the Cray Stack Trace Analysis Tool (STAT) and Cray Abnormal Termination Processing (ATP).

On GPU

On GPUs, if you write in HIP, CUDA, OpenMP target, Kokkos, SYCL, etc., you should have access to the printf function inside kernels.

A note on benchmarks

Be careful with benchmarks: they only show how something behaves out of its context (the context encompassing the OS, hardware, network/IO, system calls, etc.).

Now, for any software, if the design says that something is the best solution for what you are trying to do, by all means use it; do not rely on some random guy (or guru) telling you it’s slower and that you should not use it. Use it, profile, and if it’s fast enough, there you go.

By “something is the best solution for what you are trying to do”, understand that you should use the proper tool for the job. Don’t use a sparse solver for a dense system; don’t use inheritance if you don’t need a customization point.

The Bash Unix shell

The basics of Bash shell programming in Linux.

On the issues encountered in software engineering

A celebrated classic is Fred Brooks’ The Mythical Man-Month, ISBN-10: 0201835959. It covers many aspects of complex software development, arranged as multiple real stories to motivate the points. While written some 50 years ago, it will nonetheless consolidate your foundational understanding of software and how it is conceived.

Network programming, distributed and shared memory abstractions

Low level network communication concepts

Details on the challenges and tradeoffs involved in designing a library such as OpenFabrics Interfaces. This is a good all-rounder document for anyone seeking to better understand how the lower level layers of an MPI implementation work.

The MPI standard

One should always try to stick close to the standard and avoid relying on implementation-specific behaviors. You can find the standard document for MPI 4.0 at this URL: https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf (backwards compatible with the previous standards).

A lighter explanation of the API can be found here: https://www.mpich.org/static/docs/v4.0/.

On Adastra you should use Cray MPICH.

Choosing a license

We are aware that researchers may not always be interested in, or feel concerned by, the licensing of their software. While only scratching the surface of the legal aspects of software and its use for computing, we would strongly advise the developers of a code to carefully choose a license for it.

A high level breakdown of the most common licenses is given on choosealicense.com.

In France, there is also the CeCILL license, which originated from the CEA, CNRS and INRIA.

Other

OLCF training archive.

The 2022 quantum roadmap commissioned by the French government.