PLINK (software) — LLMpedia

PLINK (software)
Name	PLINK
Developer	Shaun Purcell; Christopher Chang; Mark Daly; Benjamin Neale
Released	2007
Latest release version	1.90 / 2.0
Programming language	C / C++
Operating system	Linux / macOS / Windows
License	GNU General Public License (version 2 for PLINK 1.07; version 3 for PLINK 1.9 and 2.0)

Contents

Overview
Features and Functionality
File Formats and Data Input
Algorithms and Statistical Methods
Performance and Scalability
Development, Licensing, and Community
Applications and Use Cases

PLINK is a widely used open-source toolset for whole-genome association and population-based linkage analysis. It was created to support large-scale analysis of genotype and phenotype data from genome-wide association studies (GWAS) and from reference projects such as the International HapMap Project; later versions are also routinely applied to data from the 1000 Genomes Project, which began after PLINK's first release. Such studies have been conducted or funded by institutions like the Broad Institute, the Wellcome Trust, and the National Institutes of Health. PLINK is commonly cited alongside tools and resources such as GATK, BCFtools, PLATO, and EIGENSOFT in the genetics and bioinformatics literature.

Overview

PLINK originated in a collaboration among researchers at the Broad Institute, the Massachusetts Institute of Technology, and Harvard University. The original release was written by Shaun Purcell with colleagues including Mark Daly and Benjamin Neale; the later 1.9 and 2.0 rewrites have been led by Christopher Chang. It was developed to meet the computational needs of efforts such as the International HapMap Project and the Wellcome Trust Case Control Consortium, and it has since been used widely with 1000 Genomes Project data. The software occupies a central role in pipelines that also integrate BEAGLE and IMPUTE2 for genotype imputation, SHAPEIT for phasing, and SAMtools for variant calling, as used by groups such as the UK Biobank, the European Bioinformatics Institute, and the National Human Genome Research Institute.

Features and Functionality

PLINK provides functions for data management, summary statistics, association testing, detection of population stratification, and identity-by-descent estimation. Typical workflows combine PLINK with GCTA for mixed-model association, EIGENSOFT for principal component analysis, METAL for meta-analysis, and RVTESTS for rare-variant tests, as in studies by consortia such as the Psychiatric Genomics Consortium, the Alzheimer’s Disease Sequencing Project, and The Cancer Genome Atlas. These capabilities cover the routine analysis steps of genetic studies at institutions such as the Broad Institute, the Wellcome Trust Sanger Institute, Stanford University, and Johns Hopkins University.

File Formats and Data Input

PLINK supports the binary PED fileset (.bed/.bim/.fam), the standard PED/MAP text formats, and conversion from VCF files produced by callers such as GATK, BCFtools, and FreeBayes. In the binary fileset, the .bed file holds the packed genotype matrix, the .bim file lists the variants and their alleles, and the .fam file lists the samples. PLINK interoperates with data from sequencing centers like the Broad Institute's Genomics Platform, the Sanger Institute Sequencing Facility, and EMBL-EBI resources. PLINK input is often prepared in pipelines that include Picard and SAMtools, frequently orchestrated with workflow managers such as Nextflow.
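The packed layout of a PLINK 1 .bed file can be illustrated with a short Python sketch. It assumes the variant-major layout described in the PLINK file format documentation: three header bytes, then for each variant a block of ceil(N/4) bytes holding one 2-bit code per sample, lowest-order bits first. The function name `decode_bed`, the lookup table, and the toy byte string are hypothetical and exist only for illustration; real analyses would normally rely on PLINK itself or an established reader library.

```python
import math

# Header of a variant-major PLINK 1 .bed file: two magic bytes, then 0x01.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])

# 2-bit genotype code -> count of the .bim file's first allele (A1).
# None marks a missing call.
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_bed(data: bytes, n_samples: int, n_variants: int):
    """Decode .bed bytes into one list of A1 counts per variant."""
    if data[:3] != BED_MAGIC:
        raise ValueError("not a variant-major PLINK 1 .bed stream")
    bytes_per_variant = math.ceil(n_samples / 4)
    expected = 3 + bytes_per_variant * n_variants
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")

    genotypes = []
    for v in range(n_variants):
        start = 3 + v * bytes_per_variant
        block = data[start:start + bytes_per_variant]
        row = []
        for s in range(n_samples):
            # Four samples per byte; the first sample sits in the lowest bits.
            code = (block[s // 4] >> (2 * (s % 4))) & 0b11
            row.append(CODE_TO_A1_COUNT[code])
        genotypes.append(row)
    return genotypes


# Hypothetical toy data: 5 samples and 2 variants (2 bytes per variant).
toy = BED_MAGIC + bytes([0b11100100, 0b00000000, 0b01010101, 0b00000011])
print(decode_bed(toy, n_samples=5, n_variants=2))
# [[2, None, 1, 0, 2], [None, None, None, None, 0]]
```

The sample and variant counts are passed in explicitly because the .bed file does not store them; in practice they come from the line counts of the accompanying .fam and .bim files.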

Algorithms and Statistical Methods

PLINK implements statistical tests such as the case-control chi-square test, logistic regression, linear regression, and the transmission disequilibrium test, together with identity-by-descent estimation, drawing on methods from the statistical genetics literature by authors like David Balding, Peter Donnelly, and Neil Risch. It supports principal component analysis for population stratification, akin to the methods in EIGENSOFT developed by Nick Patterson and David Reich, and it can compute genetic relationship matrices of the kind used by mixed-model tools such as EMMA and GCTA, the latter created by Jian Yang and colleagues. PLINK’s LD pruning, haplotype frequency estimation, and Hardy–Weinberg equilibrium tests are standard in studies performed by the Wellcome Trust Case Control Consortium, the International HapMap Consortium, and the 1000 Genomes Project.
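As an illustration of the simplest of these tests, the following Python sketch computes a 1-degree-of-freedom Pearson chi-square statistic on a 2x2 table of allele counts in cases and controls. It is a textbook formulation written for this article, not PLINK's source code; the function name `allelic_chi_square` and the example counts are hypothetical, and no continuity correction is applied.

```python
import math


def allelic_chi_square(case_a1, case_a2, control_a1, control_a2):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the two alleles.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_a1, case_a2], [control_a1, control_a2]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a1 + control_a1, case_a2 + control_a2]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("test is undefined when a row or column total is zero")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 60/40 alleles in cases, 40/60 in controls.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(f"chi2 = {chi2:.3f}, p = {p:.5f}")
```

With these counts every expected cell is 50, so the statistic is 8.0. Regression-based tests additionally allow covariates such as principal components, which a plain contingency-table test cannot accommodate.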

Performance and Scalability

Designed for large cohort studies such as the UK Biobank, the Million Veteran Program, and international consortia, PLINK emphasizes memory efficiency and speed. Its C/C++ implementation allows it to be deployed in high-performance computing environments at institutions like the Broad Institute, NIH HPC, and national supercomputing centers. The compact binary formats and multithreaded routines enable processing of millions of variants across hundreds of thousands of samples, typically as batch jobs on SLURM-managed clusters or alongside Hadoop-based workflows at facilities such as the European Grid Infrastructure and XSEDE centers.
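The storage saving from the binary format can be made concrete with a little arithmetic. The sketch below assumes the PLINK 1 .bed layout of a 3-byte header followed by 2 bits per genotype, with each variant's block padded to a whole number of bytes; the function name and the cohort dimensions are hypothetical round numbers, not figures from any specific study.

```python
def bed_size_bytes(n_samples: int, n_variants: int) -> int:
    """Size of a variant-major PLINK 1 .bed file in bytes."""
    bytes_per_variant = (n_samples + 3) // 4  # 4 genotypes per byte, rounded up
    return 3 + bytes_per_variant * n_variants  # 3-byte header plus the matrix


# Hypothetical cohort: 500,000 samples genotyped at 1,000,000 variants.
n_samples, n_variants = 500_000, 1_000_000
packed = bed_size_bytes(n_samples, n_variants)
one_byte_each = n_samples * n_variants  # naive layout: 1 byte per genotype

print(f"packed .bed:     {packed / 1e9:.1f} GB")
print(f"1 byte/genotype: {one_byte_each / 1e9:.1f} GB")
```

For these dimensions the packed file is about 125 GB, a quarter of the naive one-byte-per-genotype layout. Text PED files are larger still, since they store each allele as a character plus a separator.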

Development, Licensing, and Community

PLINK’s development has been led by academic groups at the Broad Institute, Massachusetts General Hospital, and Harvard Medical School. Its versions are distributed under GNU licenses and have drawn on contributions from researchers affiliated with institutions such as Stanford University, the University of Oxford, and the University of Cambridge. The software’s user community is supported through mailing lists and is active at conferences like those of the American Society of Human Genetics and the European Society of Human Genetics, as well as at workshops organized by EMBL-EBI and the Wellcome Genome Campus.

Applications and Use Cases

PLINK is applied extensively in genome-wide association studies conducted by the Psychiatric Genomics Consortium, Alzheimer’s disease consortia, cardiovascular genetics consortia drawing on cohorts such as the Framingham Heart Study, and cancer genomics studies linked to The Cancer Genome Atlas. It is also used in population genetics by groups studying human migrations, in ancient DNA projects affiliated with the Max Planck Institute for the Science of Human History, and in evolutionary studies involving the Smithsonian Institution and the Natural History Museum. Typical use cases include quality control workflows, principal component analysis for stratification correction, single-variant association testing for complex traits, and preparation of inputs for meta-analysis with METAL or fine-mapping with CAVIAR and FINEMAP.
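A quality control step of the kind mentioned above is usually a single PLINK invocation with several filters. The Python sketch below only assembles such a command line and does not run it. The flags (`--bfile`, `--maf`, `--geno`, `--mind`, `--hwe`, `--make-bed`, `--out`) are PLINK 1.9 options, but the threshold values, the fileset names, and the helper `build_qc_command` are illustrative assumptions; appropriate thresholds depend on the study.

```python
import shlex


def build_qc_command(bfile, out, maf=0.01, geno=0.05, mind=0.05,
                     hwe=1e-6, plink_bin="plink"):
    """Assemble (without running) a PLINK 1.9 command for basic QC filtering."""
    return [
        plink_bin,
        "--bfile", bfile,     # input binary fileset prefix (.bed/.bim/.fam)
        "--maf", str(maf),    # drop variants below this minor allele frequency
        "--geno", str(geno),  # drop variants above this missing-call rate
        "--mind", str(mind),  # drop samples above this missing-call rate
        "--hwe", str(hwe),    # drop variants failing the Hardy-Weinberg test at this p-value
        "--make-bed",         # write the filtered data as a new binary fileset
        "--out", out,         # output prefix
    ]


# Hypothetical fileset prefixes, used only for illustration.
cmd = build_qc_command("raw_cohort", "qc_cohort")
print(shlex.join(cmd))
# To execute for real: subprocess.run(cmd, check=True), with PLINK installed.
```

Building the argument list separately from execution makes it easy to log or review the exact command before it is submitted to a cluster scheduler.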

Category:Bioinformatics software Category:Genetics software Category:Open-source software