LLMpedia: The first transparent, open encyclopedia generated by LLMs

The Carpentries

Generated by GPT-5-mini
Note: This article was automatically generated by a large language model (LLM) from purely parametric knowledge (no retrieval). It may contain inaccuracies or hallucinations. This encyclopedia is part of a research project currently under review.
Name: The Carpentries
Formation: 2018 (merger of Software Carpentry, founded 1998, and Data Carpentry)
Type: Non-profit organization
Region served: International

The Carpentries is an international nonprofit organization that develops and teaches data and software skills for researchers and professionals. Founded to address reproducibility and efficiency needs in scientific research, the organization delivers workshops, instructor training, and open curricula focused on computational tools. Its activities engage communities across academia, government laboratories, and industry, promoting best practices in data management and coding.

History

The organization's origins trace to computational literacy initiatives of the late 1990s and early 2000s at institutions such as the University of California, Berkeley, to the Carpentry Community Project, and to workshops inspired by efforts at the National Institutes of Health, the European Bioinformatics Institute, and Lawrence Berkeley National Laboratory. Its predecessor projects, Software Carpentry and Data Carpentry, together with the Mozilla Science Lab, responded to needs expressed by groups at Stanford University, the Massachusetts Institute of Technology, the University of Cambridge, Harvard University, the California Institute of Technology, and Princeton University. Key milestones involved collaborations with funders such as the Wellcome Trust, the Gordon and Betty Moore Foundation, the National Science Foundation, and the Howard Hughes Medical Institute, which supported expansions in pedagogy and reach. Curricula and governance structures were later consolidated, influenced by practices at Carnegie Mellon University, the University of Oxford, Yale University, ETH Zurich, the Max Planck Society, and the University of Toronto. International growth included partnerships with organizations in Australia, Canada, Germany, France, Japan, South Africa, India, Brazil, and China.

Organization and Governance

The organization operates under a distributed governance model that combines a central executive team with volunteer governance bodies, modeled on examples from the Apache Software Foundation, Wikipedia, and the Linux Foundation. A Board of Directors provides fiduciary oversight and strategic direction, drawing expertise from leaders affiliated with the National Institutes of Health, the European Research Council, the Wellcome Trust, Amazon Web Services, and Google. Operational teams manage curriculum, training, community engagement, and infrastructure, with roles similar to those at the Open Knowledge Foundation, Creative Commons, and the Open Science Framework. Local and regional groups coordinate workshops and events, mirroring the federated chapter models used by the Society for Industrial and Applied Mathematics, the American Statistical Association, and the Association for Computing Machinery. Advisory committees include representatives from the University of Washington, Columbia University, Duke University, Imperial College London, McGill University, and research centers such as Los Alamos National Laboratory.

Curriculum and Instructional Model

The curriculum comprises modular lessons on tools and practices drawn from computational workflows in fields such as genomics, astronomy, ecology, neuroscience, and social science research. Core lessons teach skills in Python, R, SQL, and the Bash shell, along with version control using Git and hosting platforms such as GitHub. Pedagogical approaches reflect methods developed at Carpentry Wheelhouse, applications of Bloom's taxonomy in higher education, and active learning techniques used in MIT OpenCourseWare and HarvardX. Lessons are maintained as openly licensed repositories, following practices similar to those at GitLab and Zenodo, which enables contributions from community members affiliated with the European Molecular Biology Laboratory, the Sanger Institute, the Max Planck Institutes, CSIRO, and NASA. Instructional materials emphasize reproducible research workflows akin to initiatives by reproducible research advocates at Stanford Medicine, PLOS, and Nature Research.

Instructor Training and Community

Instructor training follows a workshop-based credentialing model influenced by Teach the Teachers initiatives and by professional development models used by the Association of American Universities and the Society for Neuroscience. Prospective instructors attend training that covers pedagogy, inclusive teaching, and technical facilitation, drawing on evidence from studies at University College London, the University of Edinburgh, and the University of Michigan. The network of certified instructors includes volunteers from the National Institutes of Health, the European Space Agency, the Wellcome Sanger Institute, the Broad Institute, the Fred Hutchinson Cancer Research Center, and universities such as the University of California, San Diego and Imperial College London. Community governance relies on mailing lists, discussion forums, and contribution workflows similar to those at Stack Overflow, Discourse, and GitHub Issues, supporting coordination among maintainers from the University of British Columbia, ETH Zurich, Seoul National University, the University of Cape Town, and the Pontifical Catholic University of Chile.

Impact and Adoption

Adoption spans research laboratories, libraries, and campus units at institutions including Harvard University, the University of California, Berkeley, the University of Toronto, the University of Minnesota, and Monash University, as well as national and international research organizations such as the National Aeronautics and Space Administration, the European Space Agency, and the Canadian Institute for Advanced Research. Evaluations of workshop outcomes draw on assessment methods from Cochrane and the RAND Corporation and on field studies published in venues such as PLOS ONE, Nature Communications, and Science Advances. Impact metrics include the number of workshops held, the number of instructors trained, and community contributions tracked on platforms such as Zenodo and Figshare. Case studies cite improvements in reproducibility and productivity in projects at the Sanger Institute, the Broad Institute, the Max Planck Institute for Biophysical Chemistry, Los Alamos National Laboratory, and CERN.

Funding and Partnerships

Funding sources include philanthropic foundations, government grant programs, and institutional subscriptions, following partnership models that involve the Wellcome Trust, the Gordon and Betty Moore Foundation, the National Science Foundation, the European Commission, and corporate sponsors such as Microsoft, Google, and Amazon Web Services. Collaborative partnerships and memoranda of understanding have been established with libraries and research support bodies such as the Association of Research Libraries, Digital Science, Jisc, DataCite, and ORCID. Strategic alliances with infrastructure projects and consortia, including ELIXIR, the Global Biodata Coalition, HathiTrust, and the Research Data Alliance, support interoperability and community growth.
