| Scanpy | |
|---|---|
| Name | Scanpy |
| Programming language | Python |
| Operating system | Cross-platform |
Scanpy is a Python-based toolkit for large-scale single-cell gene expression analysis, used in computational biology, bioinformatics, and genomics. It integrates methods from statistical learning, matrix computation, and visualization to process single-cell RNA sequencing datasets, supporting workflows from preprocessing to clustering and trajectory inference. Scanpy is widely adopted alongside tools from the Broad Institute, European Bioinformatics Institute, Wellcome Sanger Institute, and Allen Institute for Brain Science, and by research groups at institutions such as MIT, Stanford University, Harvard University, University of Cambridge, Max Planck Society, ETH Zurich, University of California, Berkeley, Princeton University, Yale University, University of Oxford, Karolinska Institutet, University of Toronto, Cold Spring Harbor Laboratory, University of Washington, University of California, San Diego, Columbia University, University of Chicago, Johns Hopkins University, University of Pennsylvania, McGill University, Imperial College London, University College London, Duke University, UCLA, University of Michigan, University of British Columbia, University of Freiburg, Heidelberg University, University of Edinburgh, University of Copenhagen, National Institutes of Health, European Molecular Biology Laboratory, Institut Pasteur, Barcelona Supercomputing Center, Ragon Institute, Salk Institute, Washington University in St. Louis, University of Zurich, University of Basel, Monash University, University of Melbourne, Seoul National University, Peking University, Tsinghua University, University of Tokyo, RIKEN, Osaka University, KAUST, Technical University of Munich, CERN, NASA, Google, Amazon Web Services, Microsoft Research, NVIDIA, Intel, IBM Research, Facebook AI Research, DeepMind, OpenAI, KTH Royal Institute of Technology, Ludwig Maximilian University of Munich, University of Leiden, University of Amsterdam, University of Groningen, KU Leuven, Ghent University, University of Antwerp, University of Milan, University of Bologna, Politecnico di Milano, École Polytechnique, Sorbonne University, CNRS, University of Paris Saclay, University of Barcelona, University of Valencia, Technical University of Denmark, Chalmers University of Technology, Aarhus University, Trinity College Dublin, National University of Singapore, Nanyang Technological University, and University of Hong Kong.
Scanpy provides scalable algorithms for single-cell transcriptomics that support datasets ranging from hundreds of thousands to millions of cells, and it interoperates with visualization, clustering, and trajectory tools developed at research centers such as the Broad Institute, Wellcome Sanger Institute, and European Bioinformatics Institute. It builds on the scientific Python libraries NumPy, SciPy, pandas, Matplotlib, Seaborn, and scikit-learn, and on the HDF5 and Zarr storage formats. The surrounding ecosystem is distributed through PyPI and conda-forge, developed on code hosting platforms such as GitHub, Bitbucket, and GitLab, and tested with continuous integration services such as Travis CI and GitHub Actions. Prominent applications include analyses published in journals from Nature Publishing Group and Cell Press and in Science, PNAS, eLife, and Genome Research, as well as work with datasets from initiatives such as the Human Cell Atlas, ENCODE Project, GTEx Project, 1000 Genomes Project, Cancer Genome Atlas, NIH Roadmap Epigenomics Project, and Allen Brain Atlas.
Scanpy implements preprocessing steps such as normalization, log-transformation, and highly variable gene selection, often used in studies from the Broad Institute, Harvard Medical School, Stanford Medicine, Dana-Farber Cancer Institute, and Memorial Sloan Kettering Cancer Center. It offers dimensionality reduction methods including principal component analysis, UMAP, and t-SNE (the method of van der Maaten and Hinton), comparable to implementations from the UMAP community, scikit-learn, and teams at Facebook AI Research. It integrates graph-based clustering with the Leiden algorithm, developed at Leiden University, and the Louvain method, named after the Université catholique de Louvain where it originated. Visualization modules produce embeddings, heatmaps, and dot plots used in collaborative efforts with groups at the Broad Institute, Sanger Institute, Harvard, and MIT. Scanpy supports neighborhood graph construction, differential expression testing, and batch correction routines inspired by methods such as Seurat, Harmony, and MNN Correct, and it aligns with trajectory inference approaches such as Monocle, Slingshot, PAGA, and STREAM.
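The preprocessing and dimensionality reduction steps named above can be illustrated without Scanpy itself. The following is a minimal NumPy sketch, not Scanpy's implementation: the toy count matrix, the helper names, the target sum of 10,000 counts per cell, and the use of plain variance to rank genes are all assumptions made for illustration (Scanpy's own highly variable gene selection uses dispersion-based criteria).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy count matrix: 200 cells (rows) x 50 genes (columns).
gene_rates = rng.gamma(2.0, 1.0, size=50)
counts = rng.poisson(lam=gene_rates, size=(200, 50)).astype(float)


def normalize_total(x, target_sum=1e4):
    """Scale each cell (row) so that its counts sum to target_sum."""
    totals = x.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty cells as all zeros
    return x / totals * target_sum


def top_variable_genes(x, n_top):
    """Return sorted column indices of the n_top highest-variance genes."""
    variances = x.var(axis=0)
    return np.sort(np.argsort(variances)[-n_top:])


def pca(x, n_components):
    """Project mean-centered data onto its leading principal components."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


normalized = normalize_total(counts)          # library-size normalization
logged = np.log1p(normalized)                 # log(1 + x) transform
hvg_idx = top_variable_genes(logged, n_top=20)
embedding = pca(logged[:, hvg_idx], n_components=5)
```

Each step takes a cells-by-genes matrix and returns either a transformed matrix or a set of gene indices, which is why the steps can be chained in this order.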
Scanpy centers on the AnnData data structure, which interoperates with the HDF5, Zarr, and Loom (loompy) formats and with Bioconductor projects that use SingleCellExperiment conventions. AnnData stores expression matrices, cell and gene annotations, and reduced-dimension representations; it is designed to work with NumPy arrays, scipy.sparse matrices, and pandas DataFrames. File serialization aligns with community standards influenced by consortia such as the Human Cell Atlas and by infrastructures such as the European Genome-phenome Archive and NCBI, facilitating exchange with platforms such as the Galaxy Project, Terra, and DNAnexus, and with cloud providers including Google Cloud Platform, Amazon Web Services, and Microsoft Azure.
A typical Scanpy pipeline mirrors workflows described in publications by the Human Cell Atlas, Broad Institute, Wellcome Sanger Institute, and academic groups at Stanford, Harvard, and MIT. Its steps are quality control and filtering, normalization, highly variable gene selection, dimensionality reduction, neighborhood graph construction, clustering, marker gene identification, and visualization. Integrative analyses combine batch correction methods such as Seurat (Satija Lab) and Harmony (Korsunsky Lab) with alignment techniques used by researchers at the Sanger Institute and by EGA contributors. Downstream steps often include trajectory analysis with tools such as Monocle (Trapnell Lab) or PAGA (Wolf Lab), or integration into pipelines on the Galaxy Project and collaborative platforms such as Bioconductor.
Scanpy is extended via plugins and interoperable packages that are shared through GitHub, PyPI, and conda-forge and developed by research labs including CZI-funded groups, by contributors from EMBL-EBI and the Wellcome Trust, and by vendors such as 10x Genomics, Illumina, Pacific Biosciences, and Oxford Nanopore Technologies. Notable companion tools and integrations include wrappers for algorithms from Seurat, Monocle, and PAGA, visualization tools such as Plotly and Bokeh, and single-cell atlasing efforts from the Human Cell Atlas, Allen Institute for Brain Science, Cancer Research UK, and Stanford Medicine.
Scanpy development is coordinated on platforms such as GitHub, with contributions from academic groups at the Max Planck Society, EMBL, Broad Institute, Wellcome Sanger Institute, Harvard University, MIT, Stanford University, and ETH Zurich, and with support from funding bodies such as the European Research Council, Wellcome Trust, NIH, and NSF and from philanthropic organizations including the Chan Zuckerberg Initiative. Licensing and governance follow open-source norms prevalent on GitHub and PyPI, with community engagement via forums, presentations at meetings such as ISMB, RECOMB, Gordon Research Conferences, and Cold Spring Harbor Laboratory meetings, and workshops at EMBO and Keystone Symposia. Developers and users interact on mailing lists, issue trackers, and community channels associated with projects such as Bioconductor and the Galaxy Project and with consortia such as the Human Cell Atlas.
Category:Bioinformatics software