Using Python at ITA

Python is a widely-used programming language for scientific data analysis, visualisation, and general programming. This article details how to use Python at ITA, how to install additional packages, and where to get more information.

  1. Using Python at ITA
  2. Running Jupyter remotely
  3. Installing Python in your laptop
  4. Learning more Python
  5. Troubleshooting

1. Using Python at ITA

This section has been updated: the module system is no longer recommended for Python. Python is installed on nearly all machines at the Institute, but the default (system) version lacks the most common scientific packages. Instead, it is recommended that you use a conda/mamba environment. On the Institute's Linux machines there is a mamba environment that is recommended for most users. See below for how to use it.

1.1 First time setup of mamba

To start using the mamba environment, you need to configure your shell. Enter the following in a terminal:

/astro/local/mamba/condabin/mamba init

And then close the terminal and reconnect to the machine. This will add a few lines into your shell config file (e.g. .bashrc for bash and .cshrc for tcsh), so it knows where to find python. By default, this will also change your prompt: it will appear prepended with (base). It is recommended that you do NOT use the base environment nor load it by default (it has only basic packages). You can deactivate its default loading with:

conda config --set auto_activate_base false

which will update your ~/.condarc file with the option. You are now ready to use the mamba system!

1.2 Choosing a mamba environment

The mamba system has different environments, which are self-contained Python installations that may contain different sets of packages. At the moment the following environments are available:

  • py312: based on python 3.12, contains the most recent versions of a wide variety of scientific packages. Recommended for most people. Pure python code runs faster than in python 3.10. This version contains packages for GPU computing, and GPU-compatible versions of ML/AI packages such as PyTorch and TensorFlow.
  • py311: based on python 3.11, older version kept for compatibility purposes. No longer updated.
  • py310: based on python 3.10, older version kept for compatibility purposes. No longer updated.

To activate a mamba environment, run:

mamba activate env_name

Where env_name is the name of the environment. Your prompt will be prepended with the environment name, e.g. (py311). To load this by default you can include the mamba activate line in your .bashrc or .cshrc file.
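To confirm from within Python that the intended environment is active, a minimal sketch such as the following may help. It assumes that conda/mamba sets the CONDA_DEFAULT_ENV variable on activation; the function name describe_python is hypothetical and only used for illustration.

```python
import os
import sys


def describe_python():
    """Return the interpreter path, version, and active environment name."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Assumption: conda/mamba sets CONDA_DEFAULT_ENV when an
        # environment is activated; it is None if nothing is active.
        "environment": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

If the executable path points to the system Python rather than to a directory under the mamba installation, the environment was most likely not activated in the current shell.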

If you want to check if a package you want is installed, you can (after activating the environment) use

mamba list mypackage
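A similar check can be made from within Python itself. The sketch below uses only the standard library; the function name package_status is hypothetical, and it assumes that the name used for import matches the name of the installed distribution, which is not always the case (e.g. sklearn is distributed as scikit-learn).

```python
from importlib import metadata, util


def package_status(name):
    """Return the installed version of a top-level package.

    Returns None if the package cannot be imported, and "unknown" if it
    is importable but has no distribution metadata under that name
    (e.g. standard library modules).
    """
    if util.find_spec(name) is None:
        return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "unknown"


if __name__ == "__main__":
    for pkg in ("numpy", "scipy", "astropy"):
        print(pkg, package_status(pkg))
```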

If the package is not installed and you think it is general and of use to several people at the Institute, please send your request for installation to Tiago Pereira. Note that packages not available through conda/mamba or that conflict with other packages may not be installed. Otherwise, your best option is to create your own environment (see below).

1.3 Installing your own mamba environment

For advanced users or those with specific needs, you can install your own environment. The simplest way to do this is to use the mamba system. The environment needs to be installed in a directory where you have write access. By default, new mamba environments are placed in your home directory, under ~/.conda/envs. This is not ideal, and may lead to problems with your quota. It is best to put environments in a data directory that you have access to, outside your home directory. You should configure this before creating a new environment:

conda config --add envs_dirs /my/data/directory/envs

Then you can create a new environment with mamba, specifying any packages you want:

mamba create -n my_env -c conda-forge python=3.9 numpy=1.23

This uses the conda-forge channel, which is recommended, but you can use others too. You can then activate my_env and start using it. NOTE: after the above command you will see a few mamba warnings and errors (mostly about not being able to write urls.txt and cache.lock). These errors are normal and should be ignored.
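Since environments can take up a lot of space, it may be useful to check how much disk they use. The following is a rough sketch, not an official tool: the function name directory_size is hypothetical, and the total may overestimate real usage because conda/mamba often hard-links files from its package cache.

```python
import os


def directory_size(path):
    """Total size in bytes of regular files under path.

    Symbolic links are skipped. A path that does not exist gives 0.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    # Default location of user environments mentioned above
    envs = os.path.expanduser("~/.conda/envs")
    print(f"{envs}: {directory_size(envs) / 1e6:.1f} MB")
```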

If you want to install more packages, just use mamba from your environment:

mamba install my_package

Some packages are not available through mamba, so you may need to install them via pip:

pip install my_package

Both mamba and pip can also update or uninstall packages. As a last resort, you may need to install packages from source. Here you should refer to the installation instructions of the package, but typically the steps are the following:

tar zxvf mypackage.tar.gz  # or similar
cd mypackage
python setup.py install

Note that you cannot install packages with mamba in the global environments (only in your own environments), but you can add packages for use with the global environments by adding the --user flag to pip or to setup.py install.
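To see where packages installed with --user end up, and whether the current interpreter will pick them up, the standard library site module can be queried. This is a small sketch; the function name user_site_info is hypothetical. On Linux the user directory is typically under ~/.local and depends only on the Python version, so it may be shared by all environments with the same Python version.

```python
import site
import sys


def user_site_info():
    """Return the user site-packages directory and whether it is in use."""
    user_dir = site.getusersitepackages()
    return {
        "user_site": user_dir,
        # False or None if user site-packages are disabled
        "enabled": bool(site.ENABLE_USER_SITE),
        # The directory is only added to sys.path if it exists
        "on_sys_path": user_dir in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_info().items():
        print(f"{key}: {value}")
```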

2. Running Jupyter remotely

The Jupyter notebook and Jupyter lab are browser-based environments where you can run Python notebooks. One of their advantages is that you can run Python on a more powerful remote computer, but work responsively from the browser at your workstation or laptop.

Currently, the way to run Jupyter remotely is via an ssh tunnel. Suppose you want to run something on a computer called "beehive". First you ssh into beehive and then start Jupyter with the --no-browser option:

ssh beehive
jupyter lab --no-browser

When you start Jupyter, you'll see in the output something like http://localhost:8888/?token=5089680adec1040. Copy that URL for later pasting. The next step is to open an ssh tunnel to beehive, so that beehive's port 8888 (running Jupyter) maps to a port on your own computer, let's say port 9999:

ssh -L 9999:localhost:8888 beehive

This will open an ssh session that you can close when no longer using Jupyter. Now you are set! Open up your browser and paste the URL you previously copied, changing port 8888 to 9999. You should see the Jupyter session from beehive in your browser.
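The port change in the URL is easy to do by hand, but as an illustration it can be sketched with the standard library as follows. The function name with_local_port is hypothetical, and the URL is the example one from above.

```python
from urllib.parse import urlsplit, urlunsplit


def with_local_port(url, local_port):
    """Return url with its port replaced by local_port."""
    parts = urlsplit(url)
    # Rebuild host:port, keeping scheme, path, and token unchanged
    netloc = f"{parts.hostname}:{local_port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    remote_url = "http://localhost:8888/?token=5089680adec1040"
    print(with_local_port(remote_url, 9999))
```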

Note that for the above to work, you need to be able to ssh into beehive. If you are outside UiO, this will not be possible by default (you need to configure connection hops in your ssh config).

3. Installing Python in your laptop

If you want to install the scientific Python packages on your laptop, the recommended approach is to download and install a miniforge distribution. This is a minimal, miniconda-like installer preconfigured with the conda-forge channel (more up-to-date scientific packages). The best option is to use Mambaforge, which uses mamba as default instead of conda (mamba is much faster).

4. Learning more Python

Want to learn more Python for scientific data analysis? Here are some good resources:

5. Troubleshooting

Here are some common problems you may encounter:

Tags: python By Tiago M. D. Pereira
Published July 17, 2018 4:49 PM - Last modified May 29, 2024 2:02 PM