Oslo Bioinformatics Workshop Week 2022

December 7th-13th 2022 will see the very first Oslo Bioinformatics Workshop Week at the University of Oslo, Norway. This event is organised by the Student Committee of the Centre for Bioinformatics at UiO, in collaboration with the ISCB Regional Student group in Norway. 

These workshops are open to the scientific community in Oslo and the surrounding area.

 

See the schedule below, and follow the links for more information.

 

Please note that all workshops will be in-person only; we will not offer online participation.

 

Any questions or registration modifications should be addressed to oslo-bioinfo-workshops@ifi.uio.no.

 

Registration is now closed!

 

Agenda

 

Wed 07.12.22
  • Full day (9:00-12:00 and 13:00-16:00): High-performance computing in bioinformatics with Numpy/bioNumpy
  • Morning (9:00-12:00): Bioinformatics beginner's course - an introduction to whole genome sequencing
  • Morning (9:00-12:00): Species occurrence data publication to GBIF
  • Afternoon (13:00-16:00): Introgression detection
  • Afternoon (13:00-16:00): Metagenomes meta-analysis: bioinformatic ecosystem from idea to data

Thu 08.12.22
  • Full day (9:00-12:00 and 13:00-16:00): Clustering and plotting of single-cell expression data
  • Morning (9:00-12:00): Responsible development in Machine learning
  • Afternoon (13:00-16:00): Workflows for multi-omic data for genome regulatory annotation
  • Afternoon (13:00-16:00): A hands-on introduction to NeLS and usegalaxy.no

Fri 09.12.22
  • Morning (9:00-12:00): An introduction to Snakemake and Snakemake workflows
  • Morning (9:00-12:00): Container technologies and their use in GWAS analysis
  • Afternoon (13:00-16:00): Multi-omics data integration analysis
  • Afternoon (13:00-16:00): Hands-on introduction to uniFAIR: a systematic and scalable approach to research data wrangling in Python

Mon 12.12.22
  • Full day (9:00-12:00 and 13:00-16:00): microRNA profiling and sequence analysis
  • Full day (9:00-12:00 and 13:00-16:00): Microscopy Image Processing with ImageJ
  • Morning (9:00-12:00): Create a data management plan with DSW
  • Afternoon (13:00-16:00): Introduction to Git and Development Cycle

Tue 13.12.22
  • Full day (9:00-12:00 and 13:00-16:00): Genome assembly, curation and validation
  • Morning (9:00-12:00): Introduction to gene expression regulation by transcription factors and its computational analysis
  • Morning (9:00-12:00): Statistical principles in machine learning for small biomedical data

microRNA profiling and sequence analysis

Date: Monday 12 December 2022 9:00-12:00 13:00-16:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

 

 

Description

This workshop teaches the basics of NGS microRNA analysis, from raw-data retrieval (from SRA) and processing, through microRNA profiling and differential expression (DE) analysis, to the creation of publication-grade plots in R.

We will also present the Galaxy-based tool MirMiner for de novo prediction of microRNAs.

 

Learning outcomes

  • Get to know important steps and programs for microRNA analyses.
  • Be able to fully reproduce figures from papers including raw-data recovery, processing and figure production (https://pubmed.ncbi.nlm.nih.gov/26976605/ Figures 2,B,C,D)

 

Target audience

PhD students and above

Pre-requisites

 

  • Command-line and R experience, and working installations of R and RStudio, are required. Conda experience is a benefit.
  • We will send a list of programs we will use a week before the course.
  • For the Galaxy part of the tutorial, we recommend this earlier course of the Bioinformatics week: A hands-on introduction to NeLS and usegalaxy.no

 

Equipment to bring

laptops

Instructor(s):
Bastian Fromm (main)


An introduction to Snakemake and Snakemake workflows

Date: Friday 09 December 2022 9:00-12:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

 

 

Description

In this workshop, I will give a short introduction to the Snakemake workflow management system. This covers installation, running simple scripts via Snakemake, using some of Snakemake's advanced features, and how users can apply Snakemake in their daily tasks.

 

Learning outcomes

How to install conda and Snakemake, and how to use Snakemake to design bioinformatics pipelines.

 

Target audience

Master's and PhD students who want to create bioinformatics pipelines

Pre-requisites

 

  • Basic programming skills, basic Linux commands

 

Equipment to bring

a laptop

Instructor(s):
Sinan U. Umu (main)


Genome assembly, curation and validation

Date: Tuesday 13 December 2022 9:00-12:00 13:00-16:00

Room: Perl (room 2453) in Ole-Johan Dahls hus (OJD)

 

 

Description

The advent of projects such as the Earth Biogenome Project (EBP) and the European Reference Genome Atlas (ERGA), and our own Earth Biogenome Project Norway (EBP-Nor), is partly due to the fact that creating high-quality chromosome-level genome assemblies for almost any species is currently possible. However, sequencing technologies are still not at the point where chromosomes are reconstructed from end to end without gaps, so the use of Hi-C to scaffold contigs made from long reads is essential. Validation and statistics are important parts of the assembly process. For instance, the input reads can be used to assess the heterozygosity of the sample, in addition to giving genome size and ploidy estimations. At different stages of the assembly process, investigating the presence of expected genes and comparing k-mer profiles of the assembly to those of the reads can be used to assess the quality of the assembly. In addition to the target species, contaminants/symbionts/cobionts can sometimes occur, and sequences from these need to be identified and separated. After scaffolding and removing possible contaminants, the genome assembly needs to be manually curated, a process that can fix assembly errors and improve the assembly overall. In this workshop we will touch upon these matters and discuss the tools often used to do these analyses.

 

Learning outcomes

After attending the workshop learners should:
  • know about most-used approaches for genome assembly
  • assess information inherent in sequencing reads
  • be able to validate genome assemblies
  • know about manual curation of assemblies

 

Target audience

Anyone interested in learning more about genome assemblies and how to make them

Pre-requisites

 

  • should have some familiarity with the command line (otherwise see https://swcarpentry.github.io/shell-novice/)
  • have a laptop to work on

 

Equipment to bring

Laptop with a working terminal

Instructor(s):
Ole Kristian Tørresen (main)
Benedicte Garmann-Johnsen


High-performance computing in bioinformatics with Numpy/bioNumpy

Date: Wednesday 07 December 2022 9:00-12:00 13:00-16:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

 

 

Description

Python is becoming an increasingly popular choice for computational biology and bioinformatics analysis, as it is easy to learn, flexible, and enables rapid data analysis and tool development. However, methods and workflows implemented in Python often don’t scale computationally to large data sets. In this workshop, we show through lectures, live-coding and exercises how the Python packages NumPy and BioNumPy can be used to write efficient Python programs that can handle the big datasets and problems that we currently encounter as bioinformaticians. The first part of the workshop will cover the basics of NumPy and BioNumPy, while the second part is devoted to working on real-world biological applications using data from sources such as fastq, fasta, bed, vcf, bdg, gtf, bam and sam files.
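To give a flavour of the scaling problem described above, here is a minimal sketch (not taken from the workshop material) that computes the GC content of a sequence twice: once with a plain Python loop and once with vectorised NumPy operations. The function names and the toy sequence are made up for illustration, and BioNumPy's own API is deliberately not used.

```python
import numpy as np


def gc_content_loop(sequence: str) -> float:
    """Fraction of G/C bases, computed one character at a time."""
    gc = 0
    for base in sequence:
        if base in "GC":
            gc += 1
    return gc / len(sequence)


def gc_content_numpy(sequence: str) -> float:
    """Same computation, vectorised over an array of ASCII codes."""
    # View the sequence as one unsigned byte per base.
    codes = np.frombuffer(sequence.encode("ascii"), dtype=np.uint8)
    # The comparisons run element-wise in compiled code, not in the interpreter.
    is_gc = (codes == ord("G")) | (codes == ord("C"))
    return float(is_gc.mean())


example = "ACGTGGCCAT"  # hypothetical toy sequence
print(gc_content_loop(example), gc_content_numpy(example))
```

Both functions return the same value; on sequences with millions of bases the array version is typically much faster, because the per-base work no longer passes through the Python interpreter.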

 

Learning outcomes

After attending this workshop, learners will be able to write high-performance Python code using NumPy, and use BioNumPy to apply it to biological data.

 

Target audience

All stages, fields and interests.

Pre-requisites

 

  • Familiarity with Python is assumed

 

Equipment to bring

Laptop with Python installed

Instructor(s):
Knut Dagestad Rand (main)
Ivar Gryten


Bioinformatics beginner's course - an introduction to whole genome sequencing

Date: Wednesday 07 December 2022 9:00-12:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

 

 

Description

After giving a short overview of Next Generation Sequencing, paired-end reads and the difficulties of whole genome assembly, we will address quality control of the raw data (FastQC), common file types and adapter/quality trimming. Differences between de novo assembly and mapping to an annotated reference genome will be explained, and we will present different software solutions and tools.

 

Learning outcomes

  • Perform quality checks of NGS raw data with FastQC
  • Adapter and quality trimming
  • Contig assembly and mapping to reference genome with SPAdes/Bowtie2

 

Target audience

Students, PhD students, postdocs, medical doctors, hospital employees

Pre-requisites

 

  • No prior knowledge is required. Participants need to install FastQC and Geneious on their laptops; we will provide educational licenses for Geneious.

 

Equipment to bring

Ordinary laptop (with administrator access, i.e. no Sykehuspartner PC)

Instructor(s):
Timo Lutter (main)


Responsible development in Machine learning

Date: Thursday 08 December 2022 9:00-12:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

 

 

Description

Have you ever heard the phrase “If you can’t reproduce your results, you can’t trust them”? Bioinformatics software, like any software product, is vulnerable to major bugs that invalidate scientific findings, and great tools can go unused if others fail to run them on their computers. The good news is that software engineering solutions have developed over the years so that we can implement good practices without in-depth knowledge of computer science. This is one of the two focus points of this workshop. The second focus is machine learning (ML). There is increasing interest in machine learning within the bioinformatics community, driven by spectacular results. Trust can be especially important with ML software, as the models will always predict something in which a keen eye may find something interesting, even if the initial code has a major bug and the predictions are nonsense. As most popular ML software has been implemented by others, it is common to import large software ecosystems with a fair number of dependencies. During this workshop we will code a small ML project to predict disease outcomes from publicly available patient data. While working on this mini project, we will place special emphasis on dependency management and software quality, with a focus on the reproducibility, maintainability, readability, and extensibility of your code.

 

Learning outcomes

Participants will learn how to set up a reproducible software environment, work in a reproducible Jupyter notebook following a template for best practices, and refactor scripts into testable code.

 

Target audience

Anyone interested in improving their coding is welcome. We hope to have a nice discussion involving the insights of senior professionals as well as early career scientists.

Pre-requisites

 

  • Participants are expected to be familiar with the command line, GitHub and Python.

 

Equipment to bring

laptop running macOS or Linux

Instructor(s):
Katalin Ferenc (main)


Create a data management plan with DSW

Date: Monday 12 December 2022 9:00-12:00

Room: Perl (room 2453) in Ole-Johan Dahls hus (OJD)

 

 

Description

This workshop provides a hands-on session on creating a data management plan (DMP) using the Data Stewardship Wizard (DSW), a dynamic web forms system provided by ELIXIR Europe. The Norwegian node of ELIXIR is curating a national instance of DSW (https://elixir-no.ds-wizard.org) and provides a DMP form customised for life scientists in Norway. In this workshop, the primary instructor will initially walk the audience through the features of DSW while explaining the information to be captured in a DMP. As part of this demonstration, the instructor will show how to create a DMP, starting either from scratch or from a set of recommendations based on specific scientific domains including, among others, sequencing, light microscopy and high-throughput screening. A demonstration of how to work collaboratively on DMPs and how to export them to a document compliant with national and international funding bodies' requirements will also be given. In the second part of the workshop, the participants will work, individually or in groups, on a DMP for projects they are involved in or for projects of interest. The two instructors will give advice and bring up interesting discussion points based on feedback from the participants. Depending on the audience, a discussion on how to modify a questionnaire, or on other more technical aspects, might take place as part of the workshop.

 

Learning outcomes

After attending the workshop, the learners will be able to:
  • Understand the relevant information to be captured in a DMP
  • Start a new DMP on DSW, from scratch or from a set of pre-filled recommendations
  • Share a DMP and work collaboratively on it
  • Assess good practices in research data management
  • Export a DMP according to funders' templates including the Science Europe template (adopted by the Research Council of Norway) and the one from Horizon Europe

 

Target audience

This workshop targets researchers at any career stage with responsibility for a certain project or the data generated within a project. Data specialists or technical personnel in the life sciences that want to learn more about DSW are also welcome.

Pre-requisites

 

  • Account on the Norwegian instance of DSW (https://elixir-no.ds-wizard.org)

 

Equipment to bring

Laptop, and an account on https://elixir-no.ds-wizard.org/

Instructor(s):
Federico Bianchini (main)
Nazeefa Fatima – Centre for Bioinformatics, UiO / ELIXIR Norway


Workflows for multi-omic data for genome regulatory annotation

Date: Thursday 08 December 2022 13:00-16:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

 

 

Description

This workshop will demonstrate workflows for turning raw sequence data from RNA-Seq, ATAC-Seq, and ChIP-Seq assays into descriptive results, and how to combine these multi-omic results into annotations of chromatin states and regulatory regions in the genome.

 

Learning outcomes

Learners will know how to run Nextflow pipelines from nf-core to process and quality-control data from multi-omic sequencing assays, and how to combine the results of these assays using chromatin segmentation methods to annotate regulatory regions of the genome.

 

Target audience

Researchers who wish to use workflows to process sequencing data from ATAC-Seq, ChIP-Seq, or RNA-Seq assays into results, and researchers interested in annotating chromatin states or regulatory regions of the genome.

Pre-requisites

 

  • Knowledge of running scripts in a command line environment. Requires a computer with access to the terminal and internet.

 

Equipment to bring

Laptop with internet access

Instructor(s):
Gareth Gillard (main)


Multi-omics data integration analysis

Date: Friday 09 December 2022 13:00-16:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

 

 

Description

In this workshop I will discuss challenges of working with high-dimensional biological data and introduce a machine-learning view of multi-omics data integration analysis. In particular, I will focus on what is conceptually meant by data integration and how it should be evaluated. Further, I will give an overview of a few main analytical techniques for multi-omics pre-processing (feature selection) and integration such as graph-based, matrix factorization and Bayesian methods. Finally, I will demonstrate a few practical examples for supervised and unsupervised integration of single cell multi-omics data.

 

Learning outcomes

After attending the workshop, participants should know the main principles of biological multi-omics data integration and be aware of the methodology and a few of the main integrative techniques.

 

Target audience

Master students, PhD students, postdocs, PIs

Pre-requisites

 

  • Beginner knowledge of R and/or Python and basic knowledge of mathematical statistics

 

Equipment to bring

A laptop would be helpful but not mandatory

Instructor(s):
Nikolay Oskolkov (main)


Microscopy Image Processing with ImageJ

Date: Monday 12 December 2022 9:00-12:00 13:00-16:00

Room: Hox (Computer lab, room 3205) Kristine Bonnevieshus

 

 

Description

Optical Image Processing with ImageJ.

 

Learning outcomes

After the workshop, learners should understand the principles of microscopic images and be able to write ImageJ macro scripts to perform automated image processing and analysis.

 

Target audience

Master's students, PhD students and postdocs

Pre-requisites

 

  • A basic understanding of molecular biology and some knowledge of fluorescence microscopy.

 

Equipment to bring

nothing

Instructor(s):
Xian Hu (main)
Felix Margadant, Kay Schink, Øyvind Fougner


Species occurrence data publication to GBIF

Date: Wednesday 07 December 2022 9:00-12:00

Room: Perl (room 2453) in Ole-Johan Dahls hus (OJD)

 

 

Description

This workshop shows how to publish species occurrence data to GBIF using the Integrated Publishing Toolkit (IPT). Biodiversity data published in this way follow Open data and FAIR data principles. We will use the Darwin Core data standard and give an overview of its structure. Participants are encouraged to bring their own data for publishing.

 

Learning outcomes

After attending this workshop, learners will know how to publish species occurrences to GBIF using IPT, and will have an overview of the Darwin Core data standard and of GBIF services.

 

Target audience

Anyone generating or using biodiversity data based on species (taxa or species concepts)

Pre-requisites

 

  • Data carpentry using any tool (R, Python, SQL, Excel, or other...)

 

Equipment to bring

Laptop, and any biodiversity data you want to publish

Instructor(s):
Michal Torma (main)


Introduction to gene expression regulation by transcription factors and its computational analysis

Date: Tuesday 13 December 2022 9:00-12:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

 

 

Description

Gene regulation by transcription factors (TFs) is a key process in cells, controlling when and where particular sets of genes will be transcribed. This precise control is important for cell growth and differentiation, but it also means that perturbation of such mechanisms can lead to diseases, e.g. cancer. Various experimental techniques have been developed to map transcription factor binding locations along the genome. The most widely used nowadays is ChIP-seq. The resulting experimental data require dedicated computational tools for analysis and interpretation. A bouquet of such tools is already available, ranging from peak callers and motif discovery tools to more complex deep learning algorithms. In this workshop we will introduce transcription factors and how they control gene expression. We will describe the properties of transcription factor binding, such as sequence binding motifs. Furthermore, we will discuss available in vivo and in vitro experimental techniques that help capture transcription regulation events. Finally, we will walk through the analysis of data produced by different techniques, focusing on TF motif discovery and enrichment and on TF binding site determination. We will show a few freely available resources that store TF binding profiles and sites, and demonstrate a selection of online tools for data analysis. The last part of the workshop will consist of Q&A (participants can bring their project-related questions) and a hands-on exercise.

 

Learning outcomes

After the workshop, the participants will have a comprehensive overview of gene expression regulation by transcription factors and of how these kinds of biological processes and experimental data can be investigated and interpreted using computational methods. The learners will also know some of the major resources storing relevant information and/or offering data analysis.

 

Target audience

Any career stage

Pre-requisites

 

  • No prerequisites, except interest in gene expression regulation by transcription factors and computational analysis of such data.

 

Equipment to bring

A laptop

Instructor(s):
Ieva Rauluseviciute (main)
Jaime A Castro-Mondragon


Introgression detection

Date: Wednesday 07 December 2022 13:00-16:00

Room: Perl (room 2453) in Ole-Johan Dahls hus (OJD)

 

 

Description

Genome sequencing has revealed that many species have exchanged DNA with other species due to past hybridization, a process called "introgression". A number of methods have been developed to detect such past hybridization and introgression. Among these is the "ABBA-BABA test" and its test statistic, the D-statistic, which was first applied to a comparison of human and Neanderthal genomes, and has since supported introgression in a large number of organisms. A fast implementation of this test and related statistics is the one in the program Dsuite, which will be taught in this workshop.
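As a rough illustration of the idea behind the ABBA-BABA test, the sketch below computes the D-statistic in its simplest count-based form, D = (ABBA - BABA) / (ABBA + BABA), from a handful of made-up site patterns. It is not how Dsuite is invoked or implemented: the function names and the toy data are hypothetical, and significance testing is not attempted here.

```python
def count_patterns(sites):
    """Count ABBA and BABA site patterns.

    Each site is a 4-character string giving the allele in P1, P2, P3 and
    the outgroup, where 'A' is the ancestral and 'B' the derived allele.
    """
    n_abba = sum(1 for s in sites if s == "ABBA")
    n_baba = sum(1 for s in sites if s == "BABA")
    return n_abba, n_baba


def d_statistic(n_abba, n_baba):
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (n_abba - n_baba) / total


# Hypothetical toy data: six biallelic sites.
sites = ["ABBA", "BABA", "ABBA", "BBAA", "ABBA", "AABA"]
abba, baba = count_patterns(sites)
print(abba, baba, d_statistic(abba, baba))
```

Without introgression, the two patterns are expected to be equally frequent, so D should be close to zero; an excess of ABBA sites (positive D) points to gene flow between P2 and P3, and an excess of BABA sites (negative D) to gene flow between P1 and P3.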

 

Learning outcomes

After attending this workshop, learners are able to detect the past occurrence of introgression between closely related species based on genomic data.

 

Target audience

Evolutionary biologists of any career stage

Pre-requisites

 

  • Experience with command-line tools is a prerequisite, and access to Saga would be useful.

 

Equipment to bring

A laptop, either with access to Saga, or with the program Dsuite (https://github.com/millanek/Dsuite) installed.

Instructor(s):
Michael Matschiner (main)


A hands-on introduction to NeLS and usegalaxy.no

Date: Thursday 08 December 2022 13:00-16:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

 

 

Description

This workshop gives an introduction to the Norwegian e-Infrastructure for Life Sciences (NeLS) and usegalaxy.no. These services, both administered by ELIXIR Norway, provide national users with data storage and the possibility to run analyses and define workflows in a reproducible way. The workshop will cover a general introduction to NeLS and its 3-layer tiered storage system, with a specific focus on data transfer. A general introduction to transferring files between NeLS and usegalaxy will also be provided, followed by a dedicated session about the standard Galaxy tools and "histories" for reproducible analysis. The final session will cover how to create and run workflows on usegalaxy and how to share them with other users. Each session will start with a short presentation, followed by a detailed hands-on session.

 

Learning outcomes

  • Transfer files from local machines to NeLS
  • Transfer files between usegalaxy and NeLS
  • Understand how to use Galaxy tools for data analysis
  • Create Galaxy "histories" for reproducible analysis
  • Create, run and share workflows on usegalaxy

 

Target audience

Researchers in life science/bioinformatics

Pre-requisites

 

  • Feide login credentials are required to log in. Participants who do not have access to such credentials should apply in advance for a NeLS identity (see nels.bioinfo.no)

 

Equipment to bring

laptop, Feide credentials or NeLS identity

Instructor(s):
Jon Lærdahl (main)
Jeanne Cheneby, Federico Bianchini


Statistical principles in machine learning for small biomedical data

Date: Tuesday 13 December 2022 9:00-12:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

 

 

Description

A central problem in machine learning is how to make an algorithm perform well not just on the training data, but also on new inputs. Many strategies in machine learning are explicitly designed to reduce this test error, possibly at the expense of increased training error. These strategies are collectively known as regularisation and they are instrumental for good performance of any kind of prediction or classification model, especially in the context of small data (many features, few samples). We will discuss basic connected concepts of generalisation, overfitting, bias-variance trade-off and regularisation and will illustrate the principles with penalised (generalised) linear regression models, with ridge, lasso and elastic net penalties as prominent examples. Finally, we will present the idea of structured penalties and priors, which can be tailored to account for structures present in the data, e.g. multi-modality or complex correlation structures. We will use examples from large-scale cancer pharmacogenomic screens, where penalised regression and alternative Bayesian approaches are used for predicting drug sensitivity and synergy based on the genomic characterisation of tumour samples. In the hands-on tutorial we will use R to perform an integrated analysis of multi-omics data with penalised regression.
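To make the effect of a penalty concrete, here is a minimal sketch of ridge regression in its closed form, beta = (X'X + lambda*I)^-1 X'y, on simulated "small data" with more features than samples. It is written in Python with NumPy purely for illustration (the workshop itself uses R); the function name, the simulated data and the penalty values are all assumptions, and no intercept or feature scaling is handled.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (no intercept): solve (X'X + lam*I) beta = X'y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Simulated data: 20 samples, 50 features, only three of which matter.
rng = np.random.default_rng(0)
n_samples, n_features = 20, 50
X = rng.normal(size=(n_samples, n_features))
true_beta = np.zeros(n_features)
true_beta[:3] = [2.0, -1.0, 0.5]
y = X @ true_beta + rng.normal(scale=0.1, size=n_samples)

# A larger penalty shrinks the coefficient vector towards zero.
for lam in (0.1, 1.0, 10.0, 100.0):
    beta = ridge_fit(X, y, lam)
    print(f"lambda={lam:6.1f}  ||beta|| = {np.linalg.norm(beta):.3f}")
```

With fewer samples than features, ordinary least squares has no unique solution; adding the penalty term makes the system solvable, and increasing lambda trades a worse fit to the training data for smaller, more stable coefficients.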

 

Learning outcomes

After attending this workshop, learners will understand key concepts for training machine learning models such as regularisation and how to incorporate data structure in the regularisation process.

 

Target audience

Graduate, post-graduate students and researchers at any level who are interested in applying machine learning methods to small data (few examples but potentially many features) or noisy data (e.g. biomedical data).

Pre-requisites

 

  • Basic familiarity with R. Introductory level statistics including regression.

 

Equipment to bring

Laptop with a recent version of R and RStudio installed.

Instructor(s):
Manuela Zucknick (main)
Theophilus Asenso (Oslo Centre for Biostatistics and Epidemiology, UiO); Chi Zhang (FHI)


Container technologies and their use in GWAS analysis

Date: Friday 09 December 2022 9:00-12:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

 

 

Description

In this workshop, we will start with basic concepts of container technologies like Docker and Singularity followed by instructions on how to use them. Then, we will briefly touch on the basics of Genome-wide Association Studies (GWAS) and then we will examine how and why we can use containers in GWAS analysis.

 

Learning outcomes

  • Use Docker and Singularity containers
  • Learn the basics of Genome-wide Association Studies (GWAS)
  • Perform GWAS analysis with containers

 

Target audience

Anyone, at any level, who wants to learn how to perform GWAS analysis using containers.

Pre-requisites

 

  • Basic knowledge of shell scripting
  • Laptop (a UNIX-based operating system is preferred but not necessary)
  • PREPARATION: Get a Docker account and set up Docker on your system or in your browser so that you can run the commands in the workshop. I strongly encourage you to complete these steps before the workshop.
    • 0.1 Get a (free) account from Docker Hub: https://hub.docker.com/
    • 0.2 Download Docker to your machine via: https://docs.docker.com/desktop/
    • (Alternative to 0.2) If you have already installed Docker, you may skip this step. If you are not able to install Docker on your machine, you can run Docker in a browser using your Docker Hub account via https://labs.play-with-docker.com/ (you can also connect to https://labs.play-with-docker.com/ from your terminal using ssh). However, installing Docker on your machine is strongly recommended, as running Docker via the link above may sometimes crash.
    • 0.3 In both cases, type "docker --version" to check that you have Docker installed

 

Equipment to bring

laptop (see the details in the prerequisites section)

Instructor(s):
Bayram Cevdet Akdeniz (main)


Introduction to Git and Development Cycle

Date: Monday 12 December 2022 13:00-16:00

Room: Python (room 2269) in Ole-Johan Dahls hus (OJD)

 

 

Description

When collaborating with other developers on software, version control is essential to ensure that the codebase is maintained properly. This workshop aims to introduce Git in the context of a practical development cycle.

 

Learning outcomes

After attending this workshop, learners will understand basic Git commands, branch management, remote management, forks, pull requests and GitFlow.

 

Target audience

Anyone who would like to develop software.

Pre-requisites

 

  • Basic knowledge of the Linux operating system and of text editors or an integrated development environment (IDE).

 

Equipment to bring

a laptop with Git (and optionally Gitflow) installed.

Instructor(s):
Ping-Han Hsieh (main)
Tatiana Belova


Clustering and plotting of single-cell expression data

Date: Thursday 08 December 2022 9:00-12:00 13:00-16:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

 

 

Description

In this workshop we will explore the clustering and plotting of single-cell data based on gene expression. We will start with a theoretical introduction to the stages upstream of generating a Seurat object, looking into model systems vs non-model systems, QC, cut-offs etc. This is followed by a hands-on RStudio session exploring the Seurat object, focusing on data filtering, cell clustering based on gene expression, cell cycle regression, cluster annotation and data integration. Please note that this course will NOT cover analysis of multi-modal/multi-omics single-cell data.

 

Learning outcomes

After attending this workshop you will be familiar with the Seurat analysis pipeline for single-cell data in R. You should be able to perform a basic cell clustering based on gene expression, know what to look for in terms of data fit vs analysis strategy, and be aware of the limitations as well as possibilities of such analyses and data.

 

Target audience

This workshop is for anybody interested in single-cell expression analysis, regardless of career stage, who is relatively comfortable with using R as a tool. The level will be introductory with respect to analysis, but discussion around the various strategies demonstrated is encouraged as single-cell analysis follows no "standard recipe". You are welcome to discuss your own data/single-cell plans at the end of the workshop.

Pre-requisites

 

  • We do not expect you to be an expert R user for this workshop. However, you are expected to be familiar with R/RStudio at a level where you have explored the use of common R packages such as ggplot2 and dplyr, are comfortable with using Bioconductor for package download/install, and are comfortable with file import/export in the RStudio environment. Please note that there will be NO introduction to R during this workshop, and that you are expected to have the environment set up before the workshop starts.

 

Equipment to bring

Laptop of any kind, but 16 GB of RAM is needed for optimal performance. Base R and RStudio are required, with R at version 4 or later (preferably 4.2 and up). Admin privileges will ease installation and environment setup.

Instructor(s):
Monica Hongrø Solbakken (main)
Anders Krabberød


Hands-on introduction to uniFAIR: a systematic and scalable approach to research data wrangling in Python

Date: Friday 09 December 2022 13:00-16:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

 

 

Description

Life science researchers often need to extract, manipulate and integrate data and/or metadata from different sources, such as repositories, databases or flat files. Much research time is spent on trivial and not-so-trivial details of data wrangling: to reformat data structures; clean up errors; remove duplicate data; or map and integrate dataset fields. Software for data wrangling and analysis, such as Pandas, R or Frictionless, is useful, but researchers still regularly end up with hard-to-reuse scripts, often with manual steps. uniFAIR is a new Python library with a systematic and scalable approach to research data wrangling. With uniFAIR, researchers can import (meta)data in almost any shape or form: nested JSON; tabular (relational) data; binary streams; or other data structures. Through a step-by-step process, data is continuously parsed and reshaped according to a series of data model transformations. uniFAIR provides a catalog of generic task and subflow templates that the researcher can refine and apply to carry out the transformations needed to wrangle data into the required shape. For large datasets, uniFAIR allows local test jobs on sample-sized data to be seamlessly scaled up to the full datasets and offloaded to external compute resources. Persistent access to the state of the data is available at every step. This workshop will introduce you to the technical and conceptual background needed to make use of uniFAIR, including the new type hints in Python. Participants will follow hands-on tutorials that are based on a series of use cases from genomics, proteomics, and machine learning.

 

Learning outcomes

  • Use type hints in Python in general and to define data models in uniFAIR/Pydantic
  • Understand the ideas behind the slogan "parse, don't validate"
  • Know the architecture of uniFAIR and its main classes, and have an overview of the different modules and their usage
  • Define, refine, apply and revise tasks and flows in uniFAIR
  • Import data from external REST APIs and flat files
  • Develop data transformation flows to solve a selection of use cases
  • Inspect data after each transformation step and make informed choices on how to configure the next tasks
  • Transform nested JSON output into normalized tables (without duplicate data)
  • Map (meta)data fields from the input data model to the user-defined output model

Due to time constraints, the following outcomes will be demonstrated rather than practised hands-on:
  • Scale up the data import from a representative sample to a large dataset and deploy the flow on external compute resources (NIRD service platform)
  • Orchestrate flow runs using the Prefect web-based GUI and inspect data output from external runs
  • Get started with contributing to the Open Source catalog of uniFAIR modules
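
As an illustration of the outcome "transform nested JSON output into normalized tables (without duplicate data)", the sketch below splits nested records into two flat tables linked by a key. It is plain Python, not uniFAIR code; the function name normalize, the field names, and the example records are assumptions made up for this example.

```python
from typing import Any


def normalize(samples: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Split nested sample records into a 'studies' table and a 'samples' table."""
    studies: dict[str, dict[str, Any]] = {}
    sample_rows: list[dict[str, Any]] = []
    for sample in samples:
        study = sample["study"]
        # Each study is stored once, keyed by its id, however often it recurs.
        studies.setdefault(
            study["id"], {"study_id": study["id"], "title": study["title"]}
        )
        # The sample row keeps only a foreign key pointing at its study.
        sample_rows.append(
            {"sample_id": sample["id"], "study_id": study["id"], "tissue": sample["tissue"]}
        )
    return list(studies.values()), sample_rows


# Made-up nested input in which the same study is repeated inside every sample.
nested = [
    {"id": "s1", "tissue": "liver", "study": {"id": "st1", "title": "Example study"}},
    {"id": "s2", "tissue": "kidney", "study": {"id": "st1", "title": "Example study"}},
]
study_table, sample_table = normalize(nested)
print(study_table)
print(sample_table)
```

The repeated study details end up in a single row of the study table, and each sample row refers to it through study_id.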

 

Target audience

PhD students, postdocs and technical personnel with an interest in, and experience with, programming in an academic setting. Several of the use cases assume bioinformatics experience, so a background in bioinformatics will help, and most of the databases and ontologies in the use cases are from the biological domain. However, Python programming experience is more important than a background in bioinformatics.

Pre-requisites

 

  • The participant should have at least an intermediate level of experience with Python programming. Experience with type hints in Python is useful, but not required.

 

Equipment to bring

Laptop. The participant should have installed an Integrated Development Environment (IDE). We recommend PyCharm, as this is what will be used in the demonstrations, but other IDEs are fine as long as they support Python. Installation instructions will be provided.

Instructor(s):
Sveinung Gundersen (main)
Federico Bianchini, Jeanne Cheneby



Metagenomes meta-analysis: bioinformatic ecosystem from idea to data

Date: Wednesday 07 December 2022 13:00-16:00

Room: Prolog (room 2465) in Ole-Johan Dahls hus (OJD)

 

 

Description

In recent years, various large-scale research projects have delivered increasingly well-standardised metagenomic sequencing data to the public domain. Such data is often generated for pioneering work or to address broad research questions, leaving rich, untapped information for meta-analyses and in-depth studies of more specific questions. First, we will explore such projects and see that many data features can be exploited. Smaller projects also yield data that can be combined for insightful meta-analyses, which consist of collecting samples from independent sources and processing them together to address new questions, usually to test reproducibility or to gain statistical power from larger sample sizes. For example, one might want to understand universal mechanisms by integrating soil and ocean samples from a given pH range, or to assess where genome lineages from an air mass branch in a phylogeny containing genomes from the underlying surface ocean. For this to be possible, meta-analyses both rely on and serve the quintessential purpose of standardization and the FAIR principles. It is important to interrogate metadata to identify relevant samples before collecting all the sequence data files of a study. Thus, we will learn how to query public data repositories using different entry points (online and programmatically), with an emphasis on metadata quality and content, using Python and visualization tools to measure the potential for a useful meta-analysis. The rest is simple engineering: we will use the NCBI SRA Toolkit to download and format raw sequence data for bioinformatic processing and briefly discuss the technical challenges and avenues for metagenomics.
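
The download step with the SRA Toolkit might look roughly like the sketch below, which builds the two usual commands (prefetch to fetch a run, fasterq-dump to convert it to FASTQ) from Python. This is an illustration, not workshop material: the helper names are invented, the accession is a placeholder rather than a real run, and the commands are only printed unless dry_run is set to False on a machine where the toolkit is installed.

```python
import shlex
import subprocess
from pathlib import Path


def sra_commands(accession: str, outdir: Path) -> list[list[str]]:
    """Return the commands to fetch one run and convert it to FASTQ files."""
    return [
        # Download the run in SRA format.
        ["prefetch", accession],
        # Convert to FASTQ, writing paired reads to separate files in outdir.
        ["fasterq-dump", accession, "--split-files", "--outdir", str(outdir)],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# "SRR0000000" is a placeholder; replace it with an accession selected from the metadata.
commands = sra_commands("SRR0000000", Path("fastq"))
run_all(commands)
```

Keeping the command construction separate from execution makes it easy to review what will be downloaded before committing disk space and bandwidth to it.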

 

Learning outcomes

After this half-day workshop, participants will know:
  • what the major metagenomic projects and public data repositories are;
  • how to interrogate these repositories (NCBI's SRA, EBI's ENA, and Qiita);
  • how to collect sample metadata and select relevant samples;
  • how to merge metadata from different studies and assess the potential for meta-analysis;
  • how to retrieve sequencing data using the SRA Toolkit (sra-tools);
  • what the challenges for a robust meta-analysis are and how to proceed with the sequence analysis.
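
Merging metadata from different studies and selecting relevant samples can be sketched as follows. The example uses only the Python standard library so that it runs anywhere; the two metadata tables, their column names, and the pH range are invented for illustration and do not come from any real study.

```python
import csv
import io

# Made-up metadata for two hypothetical studies that use different column names.
STUDY_A = "sample,ph,biome\nA1,6.5,soil\nA2,8.1,soil\n"
STUDY_B = "run_id,pH_value,environment\nB1,7.9,ocean\nB2,,ocean\n"

# Map each study's column names onto one shared schema.
COLUMN_MAPS = {
    "study_a": {"sample": "sample_id", "ph": "ph", "biome": "biome"},
    "study_b": {"run_id": "sample_id", "pH_value": "ph", "environment": "biome"},
}


def load(study: str, text: str) -> list[dict[str, str]]:
    """Read one study's table and rename its columns to the shared schema."""
    mapping = COLUMN_MAPS[study]
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        harmonised = {mapping[col]: value for col, value in row.items()}
        harmonised["study"] = study
        rows.append(harmonised)
    return rows


def select_by_ph(rows: list[dict[str, str]], low: float, high: float) -> list[dict[str, str]]:
    """Keep samples whose pH is recorded and lies within [low, high]."""
    return [row for row in rows if row["ph"] and low <= float(row["ph"]) <= high]


merged = load("study_a", STUDY_A) + load("study_b", STUDY_B)
selected = select_by_ph(merged, 7.5, 8.5)
missing_ph = sum(1 for row in merged if not row["ph"])
print([row["sample_id"] for row in selected], "samples without pH:", missing_ph)
```

Counting samples with missing values, as done for pH here, is one simple way to gauge whether the combined metadata is complete enough to support a meta-analysis.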

 

Target audience

This workshop is not intended for experienced computational scientists but rather for Master's- and PhD-level biologists, as well as senior biologists with little experience with R/Python, who want to learn about accessing and collecting public metagenomic data.

Pre-requisites

 

Ideally, workshop participants already know how to:
  • navigate their filesystem using basic Unix commands (cd, ls, mkdir, ...)
  • open a Jupyter notebook (installation guidelines will be provided)
  • use basic Python (load libraries, incl. pandas)
  • use basic R (RStudio)

 

Equipment to bring

Laptop with Jupyter Notebook installed and ca. 20 GB of storage

Instructor(s):
Franck Lejzerowicz (main)


Published Oct. 10, 2022 2:35 PM - Last modified Dec. 1, 2022 1:58 PM