Challenge to the reader: Pick one tool from each of the 19 sections below that you haven’t used before. By the end of this post, you should have a list of 19 new tools to explore. For each one, write down one research question it could help you answer.
Modern cancer research runs on open-source software. TCGA alone produced roughly 2.5 petabytes of multi-omics data from more than 20,000 primary cancer and matched normal samples across 33 cancer types, and much of the tooling used to make sense of it is developed in the open on GitHub. But the ecosystem is vast, fragmented, and hard to navigate, even for experienced researchers.
This post is a curated map of approximately 200 of the most important GitHub repositories actively used in modern cancer research pipelines, organized by real research workflow layers: genomics, transcriptomics, single-cell biology, immuno-oncology, AI drug discovery, and clinical informatics. It reflects how actual cancer research pipelines are structured, not a random dump of links.
1. Core Cancer Genomics Pipelines (Foundational)
These are the backbone tools used in TCGA-style pipelines.
Variant Calling (DNA Mutations)
- broadinstitute/gatk
- gatk-workflows/gatk4-germline-snps-indels
- broadinstitute/mutect2
- Illumina/strelka
- Illumina/manta
- samtools/bcftools
- samtools/samtools
- freebayes/freebayes
- jts/samtools
- pezmaster31/bamtools
Structural Variant Detection
Copy Number Variation
2. RNA-Seq Cancer Transcriptomics
Used for tumor expression profiling.
- alexdobin/STAR
- DaehwanKimLab/hisat2
- COMBINE-lab/salmon
- pachterlab/kallisto
- thelovelab/DESeq2
- OliverVoogd/edgeR
- limma-dev/limma
- nf-core/rnaseq
- bcbio/bcbio-nextgen
- alyssafrazee/ballgown
Challenge: Which RNA-seq aligner — STAR or HISAT2 — would you choose for a 10,000-sample tumor cohort, and why? Consider both accuracy and computational cost.
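One way to make the computational-cost half of this challenge concrete is a back-of-the-envelope estimate. The sketch below assumes you have measured CPU-hours per sample for each candidate aligner on a small pilot batch; the function name, the candidate labels, and every number in it are hypothetical placeholders, not benchmarks of STAR or HISAT2.

```python
def compare_aligner_cost(n_samples, cpu_hours_per_sample, cost_per_cpu_hour):
    """Scale per-sample pilot measurements up to a whole cohort.

    cpu_hours_per_sample maps a candidate label to the CPU-hours one
    sample took in your own pilot run (placeholder values below).
    """
    summary = {}
    for name, hours in cpu_hours_per_sample.items():
        total_hours = n_samples * hours
        summary[name] = {
            "cpu_hours": total_hours,
            "cost": total_hours * cost_per_cpu_hour,
        }
    return summary


# Placeholder inputs only: replace with numbers measured on your own data.
pilot = {"candidate_1": 2.0, "candidate_2": 0.5}
estimate = compare_aligner_cost(
    n_samples=10_000,
    cpu_hours_per_sample=pilot,
    cost_per_cpu_hour=0.05,  # hypothetical price per CPU-hour
)
for name, row in estimate.items():
    print(f"{name}: {row['cpu_hours']:.0f} CPU-hours, cost {row['cost']:.2f}")
```

Cost is only one axis. Accuracy on your own samples still has to be judged separately, for example by comparing mapping rates or downstream expression estimates on the same pilot batch.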
3. Single-Cell Cancer Biology
A huge frontier in tumor heterogeneity research.
- satijalab/seurat
- scverse/scanpy
- cole-trapnell-lab/monocle3
- broadinstitute/infercnv
- velocyto-team/velocyto.py
- immunogenomics/harmony
- Teichlab/cellphonedb
- YosefLab/scVI
- kstreet13/slingshot
- liulab-dfci/CytoTRACE
4. Tumor Microenvironment / Immuno-Oncology
Core tools for immune infiltration analysis.
- cibersortx/cibersortx
- LiLabAtVT/TIMER
- dviraran/xCell
- GfellerLab/EPIC
- ImmuneCellAI/ImmuneCellAI
- ImmunoEngine/ImmunoEngine
- omnideconv/immunedeconv
- bioconductor/ImmuCC
- welch-lab/liger
- IOBR/IOBR
5. Cancer Multi-Omics Integration
Combining genomics + transcriptomics + proteomics.
- cbioportal/cbioportal
- PoisonAlien/maftools
- kimlaborg/iClusterPlus
- waldronlab/MultiAssayExperiment
- omicX/OmicIntegrator
- netZoo/netZooR
- MOMA-AI/moma
- bioFAM/MOFA2
- mixOmicsTeam/mixOmics
- CMSCNV/CMSCNV
6. Cancer Pathway & Network Analysis
- cytoscape/cytoscape
- YuLab-SMU/ReactomePA
- YuLab-SMU/clusterProfiler
- PathwayCommons/pc2
- STRING-db/stringApp
- wikipathways/wikipathways
- ndexbio/ndex
- pantherdb/pantherdb
- IPAanalysis/IPAtools
- gsea-msigdb/gsea
7. AI & Deep Learning for Cancer
A rapidly growing frontier.
- DeepChem/deepchem
- microsoft/BioGPT
- facebookresearch/esm
- google-deepmind/alphafold
- ProteinNet/ProteinNet
- drugai/DTA
- kexinhuang12345/DeepPurpose
- TencentAILabHealthcare/Drug-Target-Interaction
- Chemprop/chemprop
- NVIDIA/DeepLearningExamples
8. Cancer Imaging & Radiomics
- AIM-Harvard/pyradiomics
- Project-MONAI/MONAI
- MIC-DKFZ/nnUNet
- NifTK/NiftyNet
- DeepRadiology/DeepRadiology
- QTIM-Lab/DeepNeuro
- ImagingGenomics/ImagingGenomics
- medical-imaging-network/MIDeepSeg
- VoxelMorph/VoxelMorph
- TorchIO-project/torchio
9. Drug Discovery & Precision Oncology
- open-targets/platform
- chembl/chembl_webresource_client
- drugbank/drugbank
- RDKit/rdkit
- OpenChem/OpenChem
- MoleculeNet/MoleculeNet
- DeepDrug3D/DeepDrug3D
- BioSolveIT/FlexX
- docking-org/zinc
- pharmgkb/pharmgkb
10. Clinical Bioinformatics & Translational Tools
- OHDSI/ATLAS
- i2b2/i2b2-core-server
- tranSMART/tranSMART
- clinical-genomics/clinical-genomics
- FHIR/fhir
- REDCap/redcap
- cBioPortal/cbioportal-frontend
- genomic-cancer/GENIE
- SEERstat/seerstat
- TCGA-Assembler/TCGA-Assembler2
11. Microbiome & Cancer Research
- qiime2/qiime2
- biobakery/metaphlan
- biobakery/humann
- DerrickWood/kraken2
- mothur/mothur
- MetaBAT/MetaBAT
- merenlab/anvio
- MGnify/mgnify
- benjjneb/dada2
- joey711/phyloseq
12. Epigenomics & Cancer Regulation
- deepTools/deepTools
- FelixKrueger/Bismark
- macs3-project/MACS
- HOMER/HOMER
- GreenleafLab/chromVAR
- YuLab-SMU/ChIPseeker
- ATACseqQC/ATACseqQC
- al2na/methylKit
- epigen/RnBeads
- eFORGE/eFORGE
Challenge: Epigenomics data is often noisier than genomic data. Which two tools from this section would you combine to build a high-confidence regulatory map for a set of tumor samples?
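One possible reading of "high-confidence" here is to keep only the regions supported by two independent signals, for instance peak calls from two different assays or callers. The sketch below intersects two peak lists on a single chromosome in plain Python. The function name and the coordinates are made up for illustration, and it assumes each list is sorted, non-overlapping, and uses half-open (start, end) intervals.

```python
def shared_regions(peaks_a, peaks_b):
    """Return the intervals covered by both peak lists.

    Both inputs must be sorted lists of (start, end) tuples on the same
    chromosome, with no overlaps inside a single list.
    """
    i = j = 0
    shared = []
    while i < len(peaks_a) and j < len(peaks_b):
        start = max(peaks_a[i][0], peaks_b[j][0])
        end = min(peaks_a[i][1], peaks_b[j][1])
        if start < end:  # the two current peaks overlap
            shared.append((start, end))
        # Advance whichever peak ends first; it cannot overlap anything later.
        if peaks_a[i][1] < peaks_b[j][1]:
            i += 1
        else:
            j += 1
    return shared


# Hypothetical coordinates, not real peak calls.
assay_one = [(100, 200), (300, 400), (500, 600)]
assay_two = [(150, 250), (390, 520)]
print(shared_regions(assay_one, assay_two))
```

In a real analysis you would do this per chromosome and probably with a dedicated interval library. The point is only that agreement between two tools is a simple, explicit filter.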
13. Proteomics in Cancer
- maxquant/maxquant
- OpenMS/OpenMS
- Skyline/Skyline
- Nesvilab/MSFragger
- Nesvilab/FragPipe
- ProteoWizard/pwiz
- Perseus/Perseus
- vdemichev/DiaNN
- pFind/pFind
- UWPR/Comet
14. Text Mining Cancer Literature
- dmis-lab/biobert
- allenai/scispacy
- PubTator/PubTator
- EuropePMC/europepmc
- BELMiner/BELMiner
- SemRep/SemRep
- HazyResearch/deepdive
- LitVar/LitVar
- ncbi-nlp/BioWordVec
- BioNLP/BioNLP
15. Data Science Frameworks Used in Cancer Research
- numpy/numpy
- pandas-dev/pandas
- scikit-learn/scikit-learn
- pytorch/pytorch
- tensorflow/tensorflow
- rapidsai/rapids
- dask/dask
- ray-project/ray
- mwaskom/seaborn
- matplotlib/matplotlib
16. Reproducible Research & Pipelines
- nextflow-io/nextflow
- snakemake/snakemake
- nf-core/tools
- common-workflow-language/cwltool
- dockstore/dockstore
- Terra/terra
- openwdl/wdl
- apache/airflow
- prefecthq/prefect
- Pachyderm/pachyderm
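What workflow managers like these share is a model of a pipeline as a graph of steps with dependencies, which the engine then resolves into a valid run order. The sketch below illustrates only that idea, using Python's standard-library `graphlib` (Python 3.9+). The step names are hypothetical, and it is not how any of the listed tools is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each key depends on the steps in its value set.
steps = {
    "align_reads": {"fetch_fastq"},
    "call_variants": {"align_reads"},
    "quantify_expression": {"fetch_fastq"},
    "summary_report": {"call_variants", "quantify_expression"},
}


def execution_order(dependencies):
    """Return one run order in which every step follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(steps)
print(" -> ".join(order))
```

Declaring dependencies instead of hard-coding an order is what lets a workflow engine rerun only the steps whose inputs changed and run independent branches in parallel.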
17. Public Cancer Data Access Tools
- NCI-GDC/gdc-client
- BioinformaticsFMRP/TCGAbiolinks
- LieberInstitute/recount3
- UCSCXena/Xena
- seandavi/GEOquery
- BioMart/BioMart
- Ensembl/ensembl
- biopython/biopython
- openvax/pyensembl
- BioJulia/BioJulia
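Several of these tools wrap the NCI Genomic Data Commons (GDC) REST API. As a rough sketch of what such a request looks like underneath, the code below builds a query URL for the GDC `files` endpoint without sending it. The helper name is hypothetical, and the field names and filter layout reflect my understanding of the GDC API, so check them against the official API documentation before relying on them.

```python
import json
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_files_query(project_id, data_type, size=10):
    """Build (but do not send) a GDC file-search URL for one project."""
    # Filter layout assumed from the GDC API: nested "op"/"content" objects.
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id",
                                     "value": [project_id]}},
            {"op": "in", "content": {"field": "data_type",
                                     "value": [data_type]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


url = build_gdc_files_query("TCGA-BRCA", "Gene Expression Quantification")
print(url)
```

To actually run the query, you could pass the URL to `urllib.request.urlopen` and parse the JSON response. For bulk downloads, the dedicated clients in the list above are the more practical route.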
18. Experimental Biology Automation
- opentrons/opentrons
- OpenLabware/OpenLabware
- PyLabRobot/PyLabRobot
- LabAutomation/LabAutomation
- Benchling/benchling-api
- Aquarium/aquarium
- Autoprotocol/autoprotocol
- Antha/antha
- Opentrons/Protocols
- LabThings/labthings
19. Knowledge Graphs for Cancer Research
- hetio/hetionet
- BioKG/BioKG
- MonarchInitiative/monarch-app
- ROBOKOP/ROBOKOP
- SPOKE/SPOKE
- RTXteam/RTX
- Bio2RDF/Bio2RDF
- sorgerlab/indra
- Neo4j-Genomics/Neo4jGenomics
- KnowledgeGraph-Bio/KG-Bio
The Modern Computational Oncology Stack
In practice, many frontier cancer research pipelines combine:
Nextflow/Snakemake pipelines
+ GATK / RNA-seq tools
+ Seurat single-cell analysis
+ Immune deconvolution tools
+ AI drug discovery frameworks
This ecosystem forms the modern computational oncology stack. If you’re entering cancer bioinformatics, the fastest path to productivity is mastering one tool from each layer of this stack — not trying to learn all 200 at once.
Final challenge: Design a minimal cancer research pipeline for a new lab studying triple-negative breast cancer. Choose exactly five tools from the list above — one for variant calling, one for expression, one for single-cell, one for immune profiling, and one for pathway analysis. Justify each choice in one sentence. Your pipeline should be reproducible, computationally efficient, and produce results publishable in a high-impact journal.
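If you want to hold yourself to the rules of this challenge, it can help to write your answer down as data and check it mechanically. The sketch below is one hypothetical way to do that: the layer names, the `validate_pipeline` helper, and the draft entries are illustrative assumptions, and the justifications are deliberately left as placeholders for you to fill in.

```python
REQUIRED_LAYERS = (
    "variant_calling",
    "expression",
    "single_cell",
    "immune_profiling",
    "pathway_analysis",
)


def validate_pipeline(choices):
    """Return a list of problems with a draft pipeline; empty means it passes.

    choices maps each layer name to a (tool, justification) pair.
    """
    problems = []
    for layer in REQUIRED_LAYERS:
        if layer not in choices:
            problems.append(f"missing layer: {layer}")
    for layer in choices:
        if layer not in REQUIRED_LAYERS:
            problems.append(f"unexpected layer: {layer}")
    for layer, (tool, justification) in choices.items():
        if not tool.strip():
            problems.append(f"no tool named for {layer}")
        if not justification.strip():
            problems.append(f"no justification for {layer}")
    return problems


# Example draft only; swap in your own tools and one-sentence rationales.
draft = {
    "variant_calling": ("GATK", "placeholder rationale"),
    "expression": ("Salmon", "placeholder rationale"),
    "single_cell": ("Scanpy", "placeholder rationale"),
    "immune_profiling": ("EPIC", "placeholder rationale"),
    "pathway_analysis": ("clusterProfiler", "placeholder rationale"),
}
print(validate_pipeline(draft) or "draft covers all five layers")
```

A check like this only enforces the shape of the answer. Whether each choice is reproducible, efficient, and defensible to reviewers is still the part you have to argue.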