Challenge to the reader: Pick one tool from each of the 19 sections below that you haven’t used before. By the end of this post, you should have a list of 19 new tools to explore. For each one, write down one research question it could help you answer.


Modern cancer research runs on open-source software. TCGA alone produced roughly 2.5 petabytes of multi-omics data from more than 20,000 primary cancer and matched normal samples across 33 cancer types, and much of the analysis that made sense of it relied on open-source tools, many of them hosted on GitHub. But the ecosystem is vast, fragmented, and hard to navigate, even for experienced researchers.

This post is a curated map of approximately 200 of the most important GitHub repositories actively used in modern cancer research pipelines, organized by real research workflow layers: genomics, transcriptomics, single-cell biology, immuno-oncology, AI drug discovery, and clinical informatics. It reflects how actual cancer research pipelines are structured, not a random dump of links.


1. Core Cancer Genomics Pipelines (Foundational)

These are the backbone tools used in TCGA-style pipelines.

Variant Calling (DNA Mutations)

  1. broadinstitute/gatk
  2. gatk-workflows/gatk4-germline-snps-indels
  3. broadinstitute/mutect (legacy MuTect; Mutect2 now ships inside broadinstitute/gatk)
  4. Illumina/strelka
  5. Illumina/manta
  6. samtools/bcftools
  7. samtools/samtools
  8. freebayes/freebayes
  9. samtools/htslib
  10. pezmaster31/bamtools

Structural Variant Detection

  1. arq5x/lumpy-sv
  2. Illumina/canvas
  3. hall-lab/svtools
  4. genome/breakdancer
  5. eldariont/svim

Copy Number Variation

  1. gatk-workflows/gatk4-somatic-cnvs
  2. etal/cnvkit
  3. BoevaLab/FREEC
  4. caravagnalab/CNAqc
  5. broadinstitute/gistic2

2. RNA-Seq Cancer Transcriptomics

Used for tumor expression profiling.

  1. alexdobin/STAR
  2. DaehwanKimLab/hisat2
  3. COMBINE-lab/salmon
  4. pachterlab/kallisto
  5. thelovelab/DESeq2
  6. OliverVoogd/edgeR
  7. limma-dev/limma
  8. nf-core/rnaseq
  9. bcbio/bcbio-nextgen
  10. alyssafrazee/ballgown

Challenge: Which RNA-seq aligner — STAR or HISAT2 — would you choose for a 10,000-sample tumor cohort, and why? Consider both accuracy and computational cost.
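To put numbers on the cost side of this challenge, you can scale a per-sample measurement up to the whole cohort. The sketch below is only an illustration: the function name, the aligner labels, the per-sample hours, and the price are all hypothetical placeholders, not benchmark results. Replace them with values you measure on a pilot run of your own data.

```python
def cohort_cost(n_samples, cpu_hours_per_sample, price_per_cpu_hour):
    """Return (total CPU-hours, total compute cost) for aligning a cohort."""
    total_hours = n_samples * cpu_hours_per_sample
    return total_hours, total_hours * price_per_cpu_hour


# Placeholder inputs only: these are NOT measured benchmarks.
# Time a handful of your own samples with each aligner and fill these in.
measured_hours = {"aligner_a": 2.0, "aligner_b": 0.5}
PRICE_PER_CPU_HOUR = 0.05  # hypothetical price; use your cluster or cloud rate

estimates = {
    name: cohort_cost(10_000, hours, PRICE_PER_CPU_HOUR)
    for name, hours in measured_hours.items()
}

for name, (hours, cost) in estimates.items():
    print(f"{name}: {hours:,.0f} CPU-hours, about {cost:,.2f} in compute")
```

A linear estimate like this ignores memory limits, index building, and queue time, so treat it as a first pass and weigh it against the accuracy you measure on the same pilot samples.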


3. Single-Cell Cancer Biology

A huge frontier in tumor heterogeneity research.

  1. satijalab/seurat
  2. scverse/scanpy
  3. cole-trapnell-lab/monocle3
  4. broadinstitute/infercnv
  5. velocyto-team/velocyto.py
  6. immunogenomics/harmony
  7. Teichlab/cellphonedb
  8. scverse/scvi-tools
  9. kstreet13/slingshot
  10. digitalcytometry/cytotrace2

4. Tumor Microenvironment / Immuno-Oncology

Core tools for immune infiltration analysis.

  1. cibersortx/cibersortx
  2. hanfeisun/TIMER
  3. dviraran/xCell
  4. GfellerLab/EPIC
  5. lydiaMyr/ImmuCellAI
  6. ImmunoEngine/ImmunoEngine
  7. omnideconv/immunedeconv
  8. wuaipinglab/ImmuCC
  9. welch-lab/liger
  10. IOBR/IOBR

5. Cancer Multi-Omics Integration

Combining genomics + transcriptomics + proteomics.

  1. cbioportal/cbioportal
  2. PoisonAlien/maftools
  3. kimlaborg/iClusterPlus
  4. waldronlab/MultiAssayExperiment
  5. fraenkel-lab/OmicsIntegrator
  6. netZoo/netZooR
  7. MOMA-AI/moma
  8. bioFAM/MOFA2
  9. mixOmicsTeam/mixOmics
  10. CMSCNV/CMSCNV

6. Cancer Pathway & Network Analysis

  1. cytoscape/cytoscape
  2. YuLab-SMU/ReactomePA
  3. YuLab-SMU/clusterProfiler
  4. PathwayCommons/cpath2
  5. RBVI/stringApp
  6. wikipathways/wikipathways
  7. ndexbio/ndex
  8. pantherdb/pantherdb
  9. IPAanalysis/IPAtools
  10. GSEA-MSigDB/gsea-desktop

7. AI & Deep Learning for Cancer

A rapidly growing frontier.

  1. DeepChem/deepchem
  2. microsoft/BioGPT
  3. facebookresearch/esm
  4. google-deepmind/alphafold
  5. aqlaboratory/proteinnet
  6. drugai/DTA
  7. kexinhuang12345/DeepPurpose
  8. TencentAILabHealthcare/Drug-Target-Interaction
  9. Chemprop/chemprop
  10. NVIDIA/DeepLearningExamples

8. Cancer Imaging & Radiomics

  1. AIM-Harvard/pyradiomics
  2. Project-MONAI/MONAI
  3. MIC-DKFZ/nnUNet
  4. NifTK/NiftyNet
  5. DeepRadiology/DeepRadiology
  6. QTIM-Lab/DeepNeuro
  7. ImagingGenomics/ImagingGenomics
  8. HiLab-git/MIDeepSeg
  9. VoxelMorph/VoxelMorph
  10. TorchIO-project/torchio

9. Drug Discovery & Precision Oncology

  1. open-targets/platform
  2. chembl/chembl_webresource_client
  3. drugbank/drugbank
  4. RDKit/rdkit
  5. Mariewelt/OpenChem
  6. MoleculeNet/MoleculeNet
  7. pulimeng/DeepDrug3D
  8. BioSolveIT/FlexX
  9. docking-org/zinc
  10. pharmgkb/pharmgkb

10. Clinical Bioinformatics & Translational Tools

  1. OHDSI/ATLAS
  2. i2b2/i2b2-core-server
  3. tranSMART/tranSMART
  4. clinical-genomics/clinical-genomics
  5. HL7/fhir
  6. REDCap/redcap
  7. cBioPortal/cbioportal-frontend
  8. Sage-Bionetworks/Genie
  9. SEERstat/seerstat
  10. compgenome365/TCGA-Assembler-2

11. Microbiome & Cancer Research

  1. qiime2/qiime2
  2. biobakery/metaphlan
  3. biobakery/humann
  4. DerrickWood/kraken2
  5. mothur/mothur
  6. MetaBAT/MetaBAT
  7. merenlab/anvio
  8. MGnify/mgnify
  9. benjjneb/dada2
  10. joey711/phyloseq

12. Epigenomics & Cancer Regulation

  1. deepTools/deepTools
  2. FelixKrueger/Bismark
  3. macs3-project/MACS
  4. HOMER/HOMER
  5. GreenleafLab/chromVAR
  6. YuLab-SMU/ChIPseeker
  7. jianhong/ATACseqQC
  8. al2na/methylKit
  9. epigen/RnBeads
  10. eFORGE/eFORGE

Challenge: Epigenomic data is inherently noisier than genomic data. Which two tools from this section would you combine to build a high-confidence regulatory map for a set of tumor samples?


13. Proteomics in Cancer

  1. maxquant/maxquant
  2. OpenMS/OpenMS
  3. Skyline/Skyline
  4. Nesvilab/MSFragger
  5. Nesvilab/FragPipe
  6. ProteoWizard/pwiz
  7. Perseus/Perseus
  8. vdemichev/DiaNN
  9. pFind/pFind
  10. UWPR/Comet

14. Text Mining Cancer Literature

  1. dmis-lab/biobert
  2. allenai/scispacy
  3. PubTator/PubTator
  4. EuropePMC/europepmc
  5. BELMiner/BELMiner
  6. SemRep/SemRep
  7. HazyResearch/deepdive
  8. LitVar/LitVar
  9. ncbi-nlp/BioWordVec
  10. BioNLP/BioNLP

15. Data Science Frameworks Used in Cancer Research

  1. numpy/numpy
  2. pandas-dev/pandas
  3. scikit-learn/scikit-learn
  4. pytorch/pytorch
  5. tensorflow/tensorflow
  6. rapidsai/rapids
  7. dask/dask
  8. ray-project/ray
  9. mwaskom/seaborn
  10. matplotlib/matplotlib

16. Reproducible Research & Pipelines

  1. nextflow-io/nextflow
  2. snakemake/snakemake
  3. nf-core/tools
  4. common-workflow-language/cwltool
  5. dockstore/dockstore
  6. Terra/terra
  7. openwdl/wdl
  8. apache/airflow
  9. prefecthq/prefect
  10. Pachyderm/pachyderm

17. Public Cancer Data Access Tools

  1. NCI-GDC/gdc-client
  2. BioinformaticsFMRP/TCGAbiolinks
  3. LieberInstitute/recount3
  4. UCSCXena/Xena
  5. seandavi/GEOquery
  6. grimbough/biomaRt
  7. Ensembl/ensembl
  8. BioPython/biopython
  9. openvax/pyensembl
  10. BioJulia/BioJulia

18. Experimental Biology Automation

  1. opentrons/opentrons
  2. OpenLabware/OpenLabware
  3. PyLabRobot/PyLabRobot
  4. LabAutomation/LabAutomation
  5. Benchling/benchling-api
  6. Aquarium/aquarium
  7. Autoprotocol/autoprotocol
  8. antha-lang/antha
  9. Opentrons/Protocols
  10. LabThings/labthings

19. Knowledge Graphs for Cancer Research

  1. hetio/hetionet
  2. BioKG/BioKG
  3. monarch-initiative/monarch-app
  4. ROBOKOP/ROBOKOP
  5. SPOKE/SPOKE
  6. RTXteam/RTX
  7. Bio2RDF/Bio2RDF
  8. sorgerlab/indra
  9. Neo4j-Genomics/Neo4jGenomics
  10. KnowledgeGraph-Bio/KG-Bio

The Modern Computational Oncology Stack

In real frontier cancer research, many pipelines combine several of these layers:

Nextflow/Snakemake pipelines
+ GATK / RNA-seq tools
+ Seurat single-cell analysis
+ Immune deconvolution tools
+ AI drug discovery frameworks

This ecosystem forms the modern computational oncology stack. If you’re entering cancer bioinformatics, the fastest path to productivity is mastering one tool from each layer of this stack — not trying to learn all 200 at once.

Final challenge: Design a minimal cancer research pipeline for a new lab studying triple-negative breast cancer. Choose exactly five tools from the list above — one for variant calling, one for expression, one for single-cell, one for immune profiling, and one for pathway analysis. Justify each choice in one sentence. Your pipeline should be reproducible, computationally efficient, and produce results publishable in a high-impact journal.
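One way to keep yourself honest on this final challenge is to write your selection down as data and check it mechanically. The sketch below is a minimal illustration, not a standard: the role names and the check_pipeline helper are hypothetical, and the example selection is just one possible answer using tools named in this post.

```python
# Hypothetical role names for the five slots in the final challenge.
REQUIRED_ROLES = (
    "variant_calling",
    "expression",
    "single_cell",
    "immune_profiling",
    "pathway_analysis",
)


def check_pipeline(selection):
    """Return a list of problems with a role -> tool mapping (empty if valid)."""
    problems = []
    for role in REQUIRED_ROLES:
        # A role must be present and map to a non-empty tool name.
        if not selection.get(role):
            problems.append(f"missing tool for role: {role}")
    for role in selection:
        # Anything outside the five slots breaks the "exactly five" rule.
        if role not in REQUIRED_ROLES:
            problems.append(f"unexpected role: {role}")
    return problems


# One possible answer, for illustration only; swap in your own choices.
example = {
    "variant_calling": "GATK",
    "expression": "Salmon",
    "single_cell": "Scanpy",
    "immune_profiling": "IOBR",
    "pathway_analysis": "clusterProfiler",
}

print(check_pipeline(example) or "selection covers all five roles")
```

Keeping the selection in a small, version-controlled file like this also gives you a natural place to record the one-sentence justification for each choice.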