BioC Summer School

Name	BioC Summer School
Status	Active
Genre	Workshop
Frequency	Annual
Location	Varies
First	2010s
Organizer	Bioconductor Project

Contents

Overview
History and Development
Curriculum and Topics
Organizers and Partners
Participants and Eligibility
Format and Locations
Impact and Outcomes

BioC Summer School is an annual intensive training program for computational biology, bioinformatics, and biostatistics professionals, held in association with academic and research institutions such as University of California, Berkeley, Stanford University, University of Cambridge, Harvard University, and University of Oxford. It brings together developers, researchers, and students from organizations including European Molecular Biology Laboratory, National Institutes of Health, Wellcome Trust Sanger Institute, Broad Institute, and European Bioinformatics Institute to teach reproducible analysis using the Bioconductor project, R, and interoperable software stacks like Bioconda and Docker. The program emphasizes open science, reproducible workflows, and community-driven package development, and has connections to conferences such as RECOMB, ISMB, Gordon Research Conferences, RSNA, and NeurIPS.

Overview

BioC Summer School trains attendees in high-throughput data analysis, leveraging tools developed by projects such as Bioconductor (its core team and package authors), the R Foundation, and Biopython, and by institutions like Cold Spring Harbor Laboratory, Max Planck Institute, Institut Pasteur, Centers for Disease Control and Prevention, and European Molecular Biology Organization. Courses cover work with datasets generated on platforms from companies including Illumina, Thermo Fisher Scientific, and 10x Genomics, and with data released by consortia including the Human Genome Project, ENCODE Project Consortium, 1000 Genomes Project, and The Cancer Genome Atlas. Instruction integrates version control and collaboration tools such as GitHub, GitLab, and Bitbucket, with Zenodo and its communities for data citation, alongside containerization via Docker, orchestration via Kubernetes, and workflow tools like Nextflow, Snakemake, CWL, and Galaxy.

History and Development

The program emerged from community workshops around the Bioconductor ecosystem, influenced by earlier training hubs like EMBL-EBI Training, Cold Spring Harbor Laboratory courses, Carnegie Mellon University summer schools, and the MIT Summer Research Program, and by initiatives from funders such as the National Science Foundation, National Human Genome Research Institute, European Research Council, Wellcome Trust, and Gates Foundation. Founding contributors included developers associated with R, key authors of packages like edgeR, limma, DESeq2, and GenomicRanges, and representatives from the Bioconductor core team and research groups at Johns Hopkins University, University of Washington, Fred Hutchinson Cancer Research Center, Salk Institute, and McGill University. Over time the school incorporated practices from Software Carpentry, Data Carpentry, Mozilla Science Lab, and professional societies such as the American Society for Biochemistry and Molecular Biology and the American Medical Informatics Association.

Curriculum and Topics

Sessions address statistical and computational methods used in high-throughput biology, referencing packages and concepts connected to limma, DESeq2, edgeR, GenomicRanges, SingleCellExperiment, Seurat, Scanpy, Monocle, scran, and BiocGenerics. Lectures link to experimental technologies like RNA-Seq, ChIP-seq, ATAC-seq, Hi-C, mass spectrometry, single-cell RNA-seq, spatial transcriptomics, and CRISPR genome editing, and to consortia such as ENCODE Project Consortium and GTEx. Practical modules use tools and infrastructures like RStudio, Bioconductor, BiocManager, Bioconductor workflows, biocViews, and Bioconductor package development, together with continuous integration services including Travis CI and GitHub Actions. Advanced topics reference machine learning and statistical frameworks including TensorFlow, PyTorch, scikit-learn, caret, lme4, and Stan, and integrative resources like KEGG, Reactome, Gene Ontology, UniProt, and Ensembl.

Organizers and Partners

Organizing teams typically include members from the Bioconductor project and from academic hosts such as University of Pennsylvania, University of California, San Diego, University of Melbourne, and University of British Columbia. Funding or partnership comes from organizations like National Institutes of Health, European Molecular Biology Laboratory, Wellcome Trust, Chan Zuckerberg Initiative, Gordon and Betty Moore Foundation, and Alan Turing Institute, and from industry partners including Illumina, 10x Genomics, Thermo Fisher Scientific, Amazon Web Services, and Microsoft Research. Collaborations extend to training initiatives like Software Carpentry, Data Carpentry, The Carpentries, and Mozilla Science Lab, and to community events such as hackathons hosted at venues including European Bioinformatics Institute and Broad Institute.

Participants and Eligibility

Attendees range from graduate students enrolled at institutions like University of Toronto, University College London, Imperial College London, Peking University, Tsinghua University, Seoul National University, and National University of Singapore to postdoctoral researchers and professionals from research centers such as Memorial Sloan Kettering Cancer Center, Mayo Clinic, Karolinska Institutet, and Ragon Institute, and from biotech firms like Genentech and Biogen. Selection criteria often emphasize experience with R, familiarity with sequencing data from platforms by Illumina and 10x Genomics, and goals aligned with the reproducible research promoted by The Carpentries and Open Science Framework. Scholarships and travel awards are sometimes supported by funders such as National Institutes of Health, Wellcome Trust, and European Research Council.

Format and Locations

The summer school runs as week-long intensive sessions combining lectures, hands-on labs, coding sprints, and capstone projects, hosted at universities, research institutes, and conference centers across regions including North America, Europe, Asia, Australia, and Africa. Venues have included University of California, Berkeley, European Molecular Biology Laboratory–European Bioinformatics Institute, Harvard Medical School, University of Cambridge, Wellcome Sanger Institute, and Max Planck Institute for Molecular Genetics. Hybrid and remote formats use communication platforms such as Zoom, Slack, and Mattermost, and notebook tools like R Markdown, Jupyter Notebook, JupyterLab, and Google Colaboratory.

Impact and Outcomes

Outcomes include trained contributors to the Bioconductor package repository, wider adoption of reproducible workflows by projects such as ENCODE Project Consortium and GTEx, and collaborations leading to publications in journals such as Nature, Science, Nature Methods, Genome Research, Bioinformatics, and Nucleic Acids Research. Alumni have advanced work at institutions like Broad Institute, European Bioinformatics Institute, National Institutes of Health, Fred Hutchinson Cancer Research Center, and Scripps Research, and at companies including Illumina and 10x Genomics, and have influenced data-sharing standards at platforms like ArrayExpress, Gene Expression Omnibus (GEO), Zenodo, and Figshare. The program has contributed community resources including packages, training materials, and curriculum adopted by networks such as The Carpentries and regional training hubs like EMBL-EBI Training.
