AI Cloud structure

The AI Cloud is a cluster consisting of a number of nodes (servers).

You are always welcome to contact the CLAAUDIA team for guidance on how to best use the described platforms.

AI Cloud

The AI Cloud is the second generation of CLAAUDIA's AI Cloud service, which has gradually been put into service since 2021.
The AI Cloud consists of a front-end node (ai-fe02.srv.aau.dk) and a number of compute nodes. The AI Cloud is a heterogeneous platform with several different types of hardware available in the compute nodes.

The front-end node is used for logging into the platform, accessing your files, and starting jobs on the compute nodes. The front-end node is a relatively small server which is not meant for performing heavy computations; it is intended only for light-weight operations such as transferring files to and from the AI Cloud and defining and launching job scripts.
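
As an example, files can be transferred with standard tools such as scp or rsync from your own machine; the user name and file names below are placeholders, not values prescribed by the platform:

# Copy a single file to your user directory on the front-end node
scp my_dataset.zip <your-aau-id>@ai-fe02.srv.aau.dk:~/

# Or synchronise an entire directory with rsync
rsync -avz my_project/ <your-aau-id>@ai-fe02.srv.aau.dk:~/my_project/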

The details of defining and running jobs are described in the introduction.

AI Cloud pilot platform

The AI Cloud pilot platform was the first generation of the AI Cloud and was in service 2019-2022. This platform was available through the front-end node ai-pilot.srv.aau.dk (also known as nv-ai-fe01.srv.aau.dk), but no longer exists.
If you had data in the AI Cloud pilot platform, it is still available through the current front-end node.

Operating system, file storage, and application framework

The AI Cloud is based on Ubuntu Linux as its operating system. In practice, working in the AI Cloud primarily takes place via a command-line interface.

Two major building blocks are essential to working with the AI Cloud: a resource management / queuing system called Slurm and a container system called Singularity/Apptainer.

Info

The container system formerly known as Singularity has changed its name to Apptainer. So far, the AI Cloud and AI Cloud pilot platform are still using a version by the name Singularity. It is likely that this will change to Apptainer in the future. For now, we refer to the product as Singularity/Apptainer or simply Singularity in the documentation. If or when we eventually switch to a version under the Apptainer name, the documentation will be updated accordingly.

Slurm

Slurm is a queueing system that manages resource sharing in the AI Cloud. Slurm makes sure that all users get a fair share of the resources and are served in turn. Computational work in the AI Cloud can only be carried out through Slurm: you can only run your jobs on the compute nodes by submitting them to the Slurm queueing system. It is also through Slurm that you request the resources your job requires, such as the amount of RAM, the number of CPUs (logical CPUs with hyperthreading, i.e. 2 × physical CPUs = 2 × cores), the number of GPUs, etc.
See how to get started with Slurm in the introduction.
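
As an illustration, a minimal Slurm batch script could look like the sketch below; the job name, resource amounts, run time, and container/script names are placeholder examples, not recommended values:

#!/usr/bin/env bash
#SBATCH --job-name=example-job     # placeholder job name
#SBATCH --cpus-per-task=4          # number of logical CPUs requested
#SBATCH --mem=16G                  # amount of RAM requested
#SBATCH --gres=gpu:1               # number of GPUs requested
#SBATCH --time=02:00:00            # maximum run time (hh:mm:ss)

# Run the workload inside a Singularity container (file names are placeholders)
singularity exec --nv my_container.sif python3 my_script.py

Such a script is submitted with "sbatch my_job.sh", and "squeue -u $USER" shows its place in the queue.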

Singularity/Apptainer

Singularity is a container framework which provides you with the necessary software environment to run your computational workloads. Different users may need widely different software stacks, or different versions of the same software stack, for their work. In order to provide maximum flexibility to you as a user and to minimise potential compatibility problems between different software installed on the compute nodes, each user's software environment(s) is defined and provisioned as Singularity containers. You can either download pre-defined container images or configure and modify them yourself according to your needs.
See details on container images from NGC in the introduction.
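
As a sketch of the typical workflow, a pre-built image can be pulled from a registry such as NGC and commands can then be run inside it; the image name and tag below are just examples, not a recommendation:

# Pull a PyTorch container image from NGC (the tag is an example; choose one matching your needs)
singularity pull pytorch.sif docker://nvcr.io/nvidia/pytorch:23.12-py3

# Run a command inside the container; --nv exposes the host's NVIDIA GPUs
singularity exec --nv pytorch.sif python3 -c "import torch; print(torch.cuda.is_available())"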

File storage

Files in "user directories" and "project directories" are stored on a central network file system, and accessible to all nodes. When you launch a job, access to the network file system is carried over to the compute node. This means that there is no need to synchronise files between nodes. When you store or edit a file in your user directory on the front-end node, the compute nodes can see the same file and its contents.

Storage quota expansions

When users log in to AI Cloud for the first time, a user directory is created for them. These directories are allocated 1 TB of storage by default. This should be plenty for most users, but should you need additional space, it is possible to apply for storage quota expansions for a limited time using our Storage quota expansions form.

Info

When you log in to the platform, the storage usage of your user directory is shown on the very first line:

Current quota usage: 181GiB / 1.0TiB
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-169-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

  System information as of Fri Mar 15 11:09:21 CET 2024

Group project directories

For projects where users need to collaborate and share files with other users, it is possible to create a group folder inside the directory home/project. Please consult the page Group Project to learn more about how to use this directory.