Diffusion Models: From Image to Biological Sequence Generation

January 30 & 31, 2024
9 AM to 5 PM
G9.102

Diffusion generative models such as Stable Diffusion have achieved remarkable results in generating images, videos, and other data modalities. This two-day course explores the key principles behind diffusion models.

During the first part of the course, participants will gain a theoretical understanding of the original diffusion models and become familiar with score-based generative models based on stochastic differential equations. Participants will have an opportunity to implement the first diffusion model and generate images. During the second part, we will move from image generation to biological sequence generation by studying several state-of-the-art diffusion models.

As a result of the course, participants will learn how to implement diffusion models and how to generate various data modalities including images and DNA sequences.

Day 1: diffusion model - DDPM, score-based stochastic differential equation model, image generation, diffusion model - EDM
Day 2: bit diffusion, Dirichlet diffusion score model, DNA sequence generation, latest developments in diffusion generative models

Course outcomes & objectives:

Learn the main ideas behind diffusion models
Learn best practices to achieve state-of-the-art performance in image and DNA sequence generation
Gain a practical understanding of various choices in designing diffusion models

This nanocourse requires basic knowledge of deep learning and the ability to develop and train your own models using PyTorch on a GPU.

For UTSW graduate students and PostDocs, academic credit (1 credit hour) is available. Grad students: BME 5096-05, PostDocs: PDRT 5095-02

Course Director: Jian Zhou

Instructor: Pavel Avdeyev

Spring 2024

Single Cell Genomics

January 22 & 23, 2024
9 AM to 5 PM
G9.102

This course covers the basics of single-cell technologies and computational analysis. We will provide overviews and key algorithms for single-cell RNA-Seq, single-cell ATAC-Seq, and multiome analysis. This course includes hands-on practice in performing analyses from raw data through quality control, clustering, visualization, and trajectory inference. It also covers more advanced topics such as multiome analysis, spatial transcriptomics, and single-cell perturbation.

This course requires proficiency with R and Python.

For UTSW graduate students and PostDocs, academic credit (1 credit hour) is available. Grad students: BME 5096-04, PostDocs: PDRT 5095-01

Course Director: Genevieve Konopka

Instructors: Yihan Wang, Xiongbing Kang, Gözde Büyükkahraman
Other instructors: Gary Hon, Tao Wang, Ashwinikumar Kulkarni

Multiplexed NGS Assays and Analysis: From FASTQ to Fitness

February 27 & 28, 2024
9 AM to 5 PM
G9.102

Multiplexed assays of variant effects (MAVEs), e.g., deep mutational scanning, are increasingly becoming the standard for quantifying the phenotypic effects of genotypic perturbations. Applications include protein mutagenesis, gene expression regulation, protein-protein interactions, and molecular evolution. In light of the expanding application of machine learning to modeling these interactions, it is important to be able to design, implement, and analyze experiments that can produce large volumes of high-quality data. In this course, students will 1) receive an overview of how to plan and implement a massively parallel reporter assay (MPRA), 2) learn how to interpret the results of an Illumina-based NGS run in the context of a deep mutational scanning experiment, and 3) build a pipeline to process these raw results into variant fitness and error estimates using either their own data or provided examples. Students do not need to have an existing MAVE experiment or a strong background in coding.

For UTSW graduate students and PostDocs, academic credit (1 credit hour) is available. Grad students: BME 5096-06, PostDocs: PDRT 5095-03

Course Director: Kimberly Reynolds

Instructors: James McCormick, Ryan M. Otto, Philip M. Brown

Deep Learning for Beginners

Since this nanocourse has been rescheduled from Fall 2023, we will not be putting out a call for new applications.
Please note that our course description, outcomes, and objectives have been comprehensively updated.

February 13 & 14, 2024
9 AM to 5 PM
G9.250A

This course is intended to provide a theoretical as well as practical introduction to Deep Learning. This is not a boilerplate presentation of Deep Learning of the kind widely accessible through online courses. Instead, we hope attendees will take away a deeper understanding of the motivation for using neural networks for data modeling and the resulting complexities in formulating the underlying optimization problem. We will then take the critical step towards convolutional neural networks (CNNs), which permit a multiscale analysis of data. We will also offer a balanced discussion of the strengths and weaknesses of Deep Learning vis-à-vis conventional Machine Learning approaches. We will first introduce the intuition and computational underpinnings of Deep Learning, followed by hands-on sessions that train attendees in practical approaches to implementing Deep Learning in PyTorch. The entire course revolves around the conceptually simple problem of two-class data classification. See the syllabus for a preview of the course content. The course is targeted at biomedical researchers with no prior machine learning experience but a keen curiosity about the mathematical and computational foundations of Deep Learning.

Competence in Python programming is required.

Course outcomes & objectives:
1. Understand the core elements of data modeling with neural networks.
2. Understand the power of learning convolution kernels for data modeling.
3. Learn how to implement a deep learning pipeline in Python.
4. Understand why deep learning methods are able to perform so well and identify situations where they are likely to outperform (or underperform!) classical machine learning approaches.
5. Gain a practical understanding of various choices in designing and validating a deep learning model.

For UTSW grad students & PostDocs, academic credit (1 credit hour) is available. Grad students: BME 5096-10, PostDocs: PDRT 5095-05

Course Director: Satwik Rajaram, PhD
Instructors: Gaudenz Danuser, PhD, Thuong Nguyen, PhD, Hongqing Han, PhD

Scientific Reproducibility with Containers

March 26 & 27, 2024
9 AM to 5 PM
G9.102

This is a certification course. The participants will be awarded certification upon completion of both days of the course. There is no academic credit.

Reproducibility is essential in the course of scientific research and development. It serves as proof that an established and documented work can be verified and reproduced, thus improving the confidence of others in the quality of your work. In the realm of software and computational methods, software containers are an important and increasingly ubiquitous approach for reproducible computing. They allow a given process, tool, or workflow to be packaged, shared, and reliably run in a variety of different locations, enabling easier sharing of novel code or tools.

This two-day workshop will provide comprehensive instruction and hands-on training in various topics including:

Fundamentals of Containers - what they are, how to use them, and how to create them.
Git and GitLab as tools to streamline the development of containers.
Scientific workflows using Astrocyte. Aimed at bioinformaticians, this segment introduces the structure of Astrocyte packages, Nextflow as a workflow language, and other related concepts.
Building advanced machine learning applications in a simple and portable way.

Participants will engage in hands-on exercises covering container operations on BioHPC, developing containers using Git and GitLab, and crafting Astrocyte pipelines using containers.

This course requires familiarity with BioHPC usage and a fundamental understanding of Linux.

Course Director: Liqiang Wang

Instructors: Merve Apalak, Ramcharan Chandrashekhar, Felix Perez, Yunhui Fu, Devin O'Kelly, Peng Lian, Xueyan Li and Suresh Pannerselvam.

Programming for Beginners (with MATLAB)

April 10 & 11, 2024
9 AM to 5 PM
G9.250A

This course will be useful for students with an interest in learning the most elementary steps in software programming. The course will use MATLAB as the programming platform, but the coding elements taught are fully agnostic to the programming language. The goal of the course is not to teach MATLAB, but to demystify coding for the novice and to illustrate the basic thinking behind structuring a set of instructions to produce something intelligible. Students will learn how to write and read simple code and how to evaluate the progression of a program sequence, both numerically and through graphical representations of intermediate and final results. As a final project, students will have a choice of programming a classic algorithm for data clustering or a classic algorithm for the simulation of biochemical reactions.
Day 1: elementary set of commands (operations on arrays/matrices; loops; decisions); programming interface, including debugging; scripts vs. functions; variable namespace; benchmark test: ability to read a piece of code
Day 2: plotting, including dynamic plots; random number generation; example problem: calculate pi using a randomized 'droplet fall' on a circular area; benchmark tests for programming: k-means or Gillespie algorithms
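
The Day 2 example problem can be previewed with a short sketch. The course itself uses MATLAB; the version below is written in Python purely for illustration, and the function name estimate_pi, its parameters, and the choice of a square with side 2 around a unit circle are assumptions, not course material. Droplets land uniformly at random on the square, and the fraction that falls inside the circle approximates pi/4.

```python
import random


def estimate_pi(n_droplets, seed=None):
    """Estimate pi from random 'droplets' falling on a square.

    Hypothetical helper for illustration: droplets land uniformly on the
    square [-1, 1] x [-1, 1]; the unit circle inside it covers a fraction
    pi/4 of the square's area.
    """
    rng = random.Random(seed)  # private generator so results are repeatable
    inside = 0
    for _ in range(n_droplets):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # A droplet is inside the circle if its distance to the origin is <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # inside / n_droplets approximates pi/4, so scale by 4.
    return 4.0 * inside / n_droplets


if __name__ == "__main__":
    # More droplets give a tighter estimate; the error shrinks roughly
    # with the square root of the number of droplets.
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_pi(n, seed=0))
```

The loop, the decision, and the random number generation in this sketch correspond to the elements listed for Day 1 and Day 2.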

For UTSW graduate students and PostDocs, academic credit (1 credit hour) is available. Grad students: BME 5096-07, PostDocs: PDRT 5095-04

Course Director: Qiongjing (Jenny) Zou

Instructors: Gaudenz Danuser, Hanieh Mazloom-Farsibaf
