Cancer@Home v2

A distributed computing platform for cancer genomics research, combining BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.

🚀 Quick Start (5 minutes)

Prerequisites

  • Python 3.8+
  • Docker Desktop
  • 8GB RAM minimum

Installation

  1. Clone and set up
cd CancerAtHome2
python -m venv venv
venv\Scripts\activate  # Windows (on macOS/Linux: source venv/bin/activate)
pip install -r requirements.txt
  2. Start the Neo4j database
docker-compose up -d
  3. Run the application
python run.py
  4. Open your browser

🎯 Features

1. Distributed Computing (BOINC Integration)

  • Submit cancer research computational tasks
  • Monitor distributed workload processing
  • Real-time task status tracking

2. GDC Data Integration

  • Download cancer genomics data from GDC Portal
  • Support for various cancer types (TCGA, TARGET projects)
  • Automatic data parsing and normalization
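
As a rough illustration of the download step, the sketch below builds a query URL for the public GDC API `files` endpoint, filtered by project and data format. The function name, its defaults, and the chosen fields are assumptions made for this example; they are not taken from the Cancer@Home code base.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_files_query(project_ids, data_format="MAF", size=10):
    """Return a URL listing files for the given GDC projects.

    Hypothetical helper: the name, defaults, and field selection are
    illustrative assumptions, not part of the project's actual API.
    """
    # GDC filters are nested JSON objects combined with an operator.
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": list(project_ids),
                },
            },
            {
                "op": "in",
                "content": {"field": "files.data_format", "value": [data_format]},
            },
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_format",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"
```

Calling `build_gdc_files_query(["TCGA-BRCA"])` yields a URL that can be fetched with any HTTP client. Keeping URL construction separate from the network call makes the filter logic easy to check offline.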

3. Sequence Analysis Pipeline

  • FASTQ file processing
  • BLAST sequence alignment
  • Variant calling and annotation
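
To make the FASTQ processing step concrete, here is a minimal parser sketch for the common four-line-per-record layout. It is not the project's own FASTQ parser; the names `FastqRecord`, `parse_fastq`, and `mean_phred` are hypothetical, and the quality calculation assumes Phred+33 encoding.

```python
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Drop the leading "@" from the identifier line.
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(record: FastqRecord, offset: int = 33) -> float:
    """Mean base quality of a record, assuming Phred+33 encoding."""
    if not record.quality:
        return 0.0
    return sum(ord(ch) - offset for ch in record.quality) / len(record.quality)
```

Because `parse_fastq` is a generator, large files can be streamed record by record, for example to drop reads whose `mean_phred` falls below a chosen threshold before alignment.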

4. Neo4j Graph Database

  • Graph-based cancer data modeling
  • Relationships: Gene → Mutation → Patient → Cancer Type
  • Interactive graph visualization
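
The relationship chain above could be traversed with a parameterized Cypher query along the following lines. The node labels, relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`), and property names are assumptions chosen for illustration; the actual schema may differ.

```python
from typing import Any, Dict, Tuple


def mutations_for_gene_query(gene_symbol: str, limit: int = 25) -> Tuple[str, Dict[str, Any]]:
    """Build a Cypher query walking Gene -> Mutation -> Patient -> CancerType.

    Hypothetical helper: labels, relationship types, and property names
    are illustrative guesses, not the project's confirmed schema.
    """
    cypher = (
        "MATCH (g:Gene {symbol: $symbol})-[:HAS_MUTATION]->(m:Mutation)"
        "-[:FOUND_IN]->(p:Patient)-[:DIAGNOSED_WITH]->(c:CancerType) "
        "RETURN m.id AS mutation, p.id AS patient, c.name AS cancer_type "
        "LIMIT $limit"
    )
    # Values travel as query parameters rather than being interpolated
    # into the query text, which avoids injection and quoting problems.
    return cypher, {"symbol": gene_symbol, "limit": limit}
```

With the official `neo4j` Python driver, the returned pair can be passed to `session.run(cypher, params)` inside a session opened from `GraphDatabase.driver(...)`.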

5. GraphQL API

  • Query cancer data flexibly
  • Filter by gene, mutation, patient cohort
  • Aggregate statistics

6. Interactive Dashboard

  • Real-time data visualization
  • Network graphs for gene interactions
  • Mutation frequency charts
  • Patient cohort analysis

📊 Architecture

Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator

🗂️ Project Structure

CancerAtHome2/
├── backend/
│   ├── api/              # FastAPI routes
│   ├── boinc/            # BOINC integration
│   ├── gdc/              # GDC data fetcher
│   ├── neo4j/            # Neo4j database layer
│   ├── pipeline/         # Bioinformatics pipeline
│   └── graphql/          # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/   # React components
│       ├── views/        # Page views
│       └── api/          # API client
├── data/                 # Downloaded datasets
├── docker-compose.yml    # Neo4j container
├── requirements.txt      # Python dependencies
└── run.py                # Main entry point

🧬 Data Flow

  1. Data Ingestion: Download cancer genomics data from GDC Portal
  2. Processing: Run FASTQ/BLAST analysis on distributed BOINC network
  3. Storage: Store results in Neo4j graph database
  4. Visualization: Query and visualize via web dashboard

🔧 Configuration

Edit config.yml to customize:

  • Neo4j connection settings
  • GDC API parameters
  • BOINC project URL
  • Analysis pipeline options

📖 Usage Examples

Query Mutations by Gene

query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
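
The same query can be sent from Python using only the standard library, as sketched below. The endpoint URL is an assumption (the README does not state the host, port, or path), as is the `String!` type given to the `$gene` variable; both would need to match the actual schema.

```python
import json
import urllib.request

# Assumed endpoint; adjust to wherever the backend serves GraphQL.
GRAPHQL_URL = "http://localhost:8000/graphql"

# Same fields as the query above, with the gene passed as a variable.
MUTATIONS_BY_GENE = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
"""


def build_graphql_request(gene: str, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for the mutations query."""
    body = json.dumps(
        {"query": MUTATIONS_BY_GENE, "variables": {"gene": gene}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_mutations(gene: str, url: str = GRAPHQL_URL) -> list:
    """Send the query and return the list under data.mutations."""
    request = build_graphql_request(gene, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]["mutations"]
```

With the backend running, `fetch_mutations("TP53")` would return the matching mutation objects. Passing the gene as a GraphQL variable avoids building query strings by hand.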

Submit Analysis Task

from backend.boinc import BOINCClient

client = BOINCClient()
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq"
)

🀝 Inspired By

📄 License

MIT License

🛟 Support

For issues or questions, please open an issue on Hugging Face or GitHub.
