Using Python at ITA
Python is a widely used programming language for scientific data analysis, visualisation, and general programming. This article details how to use Python at ITA, how to install additional packages, and where to get more information.
- Using Python at ITA
- Installing additional Python packages
- Running Jupyter remotely
- Installing Python in your laptop
- Learning more Python
Using Python at ITA
Python is available on all Linux and Mac workstations at the Institute. However, the default (system) version is not recommended for scientific work because it is older and lacks many scientific libraries. The recommended way to use Python on Linux is to load its module:
module load python
This will load an up-to-date Python 3.7 distribution based on Anaconda. To load this version automatically, you can add the line above to your .bash_profile or .tcshrc files.
On a Mac, the module system is not available, so you will need to put the Anaconda directory first in your $PATH environment variable to use the Anaconda Python:
export PATH=/mn/stornext/d7/mac/anaconda3/bin:$PATH # For bash
setenv PATH /mn/stornext/d7/mac/anaconda3/bin:$PATH # For tcsh
All the commands above load Python version 3.7. It is strongly recommended that you use Python 3 rather than 2.7. Nearly all packages support Python 3, many packages have stopped adding new features to their 2.7-compatible versions, and most scientific software will drop support for version 2.7 before 2020 (see Python 3 statement). If someone (e.g. your supervisor) tells you to use Python 2.7, ask again and check whether you really need it.
For compatibility reasons, we still have a Python 2.7 distribution, but its packages are not updated as often and it's not recommended unless you really know what you're doing. To use it, load the module python/2.7:
module load python/2.7
And on a Mac:
export PATH=/mn/stornext/d7/mac/anaconda2/bin:$PATH # For bash
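As a quick check that the intended distribution is the one actually running, you can ask the interpreter itself. The snippet below is a minimal sketch and not part of the ITA setup: describe_interpreter is a hypothetical helper name, and it only reports the executable path and version so you can compare them with what you expect after loading the module or changing $PATH.

```python
import sys


def describe_interpreter():
    """Return the path and version of the running Python interpreter."""
    # describe_interpreter is a hypothetical helper, named for illustration only.
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {
        "executable": sys.executable,  # full path of the python binary in use
        "version": version,
        "is_python3": sys.version_info[0] >= 3,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print("Running:", info["executable"])
    print("Version:", info["version"])
    if not info["is_python3"]:
        print("Warning: this is Python 2; consider loading the Python 3 module.")
```

If the reported path is still the system Python (for example somewhere under /usr/bin), the module or $PATH change has probably not taken effect in the current shell.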
Installing additional Python packages
The Python distributions at ITA include many Python packages, but you may need additional packages that are not installed. There are several ways to install Python packages, and this section lists the different approaches in order of preference:
2.1 Packages available in conda
If a package is available through the conda package manager, installing it with conda is the best option. You can check whether your package is available by typing:
conda search mypackage
(Make sure you load the python module first.) If you see any hits, please send your installation request to Tiago Pereira (be sure to note if you need a specific version). This is necessary because users don't have permission to write in the anaconda directory (however, see the sections below for local installs). You can also check whether the package is already installed, and find the installed version, with conda:
conda list mypackage
Please note that even if a package is on conda, it may not be possible to install it (e.g. because of conflicts with other installed packages). In these cases, as well as for packages not available through conda, follow one of the approaches below.
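conda list answers the question from the package manager's side. From Python itself you can also check whether a module can be found at all by the interpreter you are running. This is a minimal sketch using only the standard library: is_importable is a hypothetical helper, and mypackage is the same placeholder as above. Keep in mind that the import name of a module can differ from the package name used by conda or pip.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be found by the current interpreter."""
    # find_spec looks the module up on sys.path without importing it.
    # Unusual names can raise ImportError or ValueError; treat those as absent.
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    for name in ["json", "mypackage"]:  # "mypackage" is a placeholder name
        status = "available" if is_importable(name) else "not found"
        print("{}: {}".format(name, status))
```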
2.2 Installing packages with pip
pip is another package manager for Python. It allows the installation of many packages from PyPI. For users, it has an advantage over conda installs: you can install packages in your own directory, without needing root permissions. The downside is that pip has more limited support for non-Python packages, and can lead to conflicts if the same package is later installed through conda. To install a package with pip, for your user only, enter the following:
pip install mypackage --user
This will install the package in your home directory, under ~/.local/lib/python3.7/site-packages.
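To confirm where user installs go for the interpreter you are running, and whether that directory is actually being searched, you can query the standard site module. The following is a sketch under the assumption that you run it with the same Python you used for pip; user_site_info is a hypothetical helper name.

```python
import site
import sys


def user_site_info():
    """Report where `pip install --user` packages go and if they are visible."""
    user_dir = site.getusersitepackages()  # per-user site-packages directory
    return {
        "user_site": user_dir,
        "enabled": bool(site.ENABLE_USER_SITE),  # False e.g. when run with -s
        "on_sys_path": user_dir in sys.path,  # only added if the directory exists
    }


if __name__ == "__main__":
    for key, value in user_site_info().items():
        print("{}: {}".format(key, value))
```

If on_sys_path is False even after a user install, the package was most likely installed with a different Python version than the one you are running.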
You can also uninstall with pip:
pip uninstall mypackage
If you want a package that is of general interest to others, please consider letting Tiago Pereira know so it can be installed globally.
2.3 Installing packages from source
If your package is not available through conda or pip, the last resort is to install it manually from source. Here you should refer to the installation instructions of the package, but typically what happens is the following:
tar zxvf mypackage.tar.gz # or similar
cd mypackage
python setup.py install
The last step by default will try to install in the global directory, to which you have no write permissions. Fortunately, you can add the option --user to install it locally in your home directory:
python setup.py install --user
Note that it is crucial to load the python module before doing this. Also, a manual install is the least flexible of all the options and normally does not provide an automatic way to uninstall. Some packages may require compiling other, non-Python, libraries.
2.4 Your own conda environment
If you want to keep using conda but need different versions of installed packages, or packages from different channels, another option is to create your own conda environment. This is mostly useful for experienced users, and doesn't help if the packages you want are not in conda. You can use the system conda to create a new environment:
conda create --name myenv
conda activate myenv
This will typically install the environment into ~/.conda/envs/myenv, but the location can be changed in your ~/.condarc file. Your new environment will have only a few basic packages, so after activating it you must install all the packages you need:
conda install package1 package2 ...
This might take up a significant amount of space in your home directory!
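To see roughly how much space an environment uses, you can total the file sizes under its directory. This is a minimal sketch using only the standard library: directory_size_bytes is a hypothetical helper, and the path is the typical location mentioned above, which may differ on your setup. conda often hard-links files from its package cache, so this sum can overestimate the real disk usage.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files below `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


if __name__ == "__main__":
    # Typical location of a user environment; adjust if your ~/.condarc differs.
    env_dir = os.path.expanduser("~/.conda/envs/myenv")
    if os.path.isdir(env_dir):
        size_mb = directory_size_bytes(env_dir) / 1e6
        print("{}: {:.1f} MB".format(env_dir, size_mb))
    else:
        print("No environment found at", env_dir)
```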
Running Jupyter remotely
Jupyter Notebook and JupyterLab are browser-based environments where you can run Python notebooks. One of their advantages is that you can run Python on a more powerful remote computer, but work responsively from the browser on your workstation or laptop.
Currently, the way to run Jupyter remotely is via an ssh tunnel. Suppose you want to run something on a computer called "beehive". First, ssh into beehive and then start Jupyter with the --no-browser option:
ssh beehive
jupyter lab --no-browser
When you start Jupyter, you'll see in the output something like http://localhost:8888/?token=5089680adec1040. Copy that URL for later pasting. The next step is to open an ssh tunnel to beehive from a terminal on your own computer, so that beehive's port 8888 (running Jupyter) maps to a port on your own computer, let's say port 9999:
ssh -L 9999:localhost:8888 beehive
This will open an ssh session that you can close when no longer using Jupyter. Now you are set! Open up your browser and paste the URL you previously copied, changing port 8888 to 9999. You should see the Jupyter session from beehive in your browser.
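Changing the port in the copied URL is easy to get wrong by hand, since the token must stay intact. The sketch below shows one way to do the substitution with the standard library: with_local_port is a hypothetical helper, and the ports 8888 and 9999 are the example values used above. It assumes a plain host name or IPv4 address in the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def with_local_port(url, local_port):
    """Return `url` with its port replaced by `local_port`; the rest is kept."""
    parts = urlsplit(url)
    # parts.hostname drops the port; rebuild the network location with the new one.
    new_netloc = "{}:{}".format(parts.hostname, local_port)
    return urlunsplit(parts._replace(netloc=new_netloc))


if __name__ == "__main__":
    remote_url = "http://localhost:8888/?token=5089680adec1040"
    print(with_local_port(remote_url, 9999))
    # prints http://localhost:9999/?token=5089680adec1040
```

The path and token are left untouched, so the result can be pasted directly into the browser once the tunnel is open.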
Note that for the above to work, you need to be able to ssh into beehive. If you are outside UiO, this will not be possible by default: you first need to update your ssh config to set up connection hops.
Installing Python in your laptop
If you want to install the scientific Python packages on your laptop, the recommended approach is to download and install the Anaconda distribution. If you are an experienced user and want to save disk space, you can instead install the Miniconda distribution, which contains only a minimal set of packages (later you can install just the ones you need via conda).
Learning more Python
Want to learn more Python for scientific data analysis? Here are some good resources: