Cancer@Home v2
A distributed computing platform for cancer genomics research, combining BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.
Quick Start (5 minutes)
Prerequisites
- Python 3.8+
- Docker Desktop
- 8GB RAM minimum
Installation
- Clone the repository and set up a virtual environment
cd CancerAtHome2
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # macOS/Linux
pip install -r requirements.txt
- Start the Neo4j database
docker-compose up -d
- Run the application
python run.py
- Open your browser
- Application: http://localhost:5000
- Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)
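Before opening the browser, it can help to confirm that both services are actually listening. The following is a minimal stdlib sketch that probes the two ports from the URLs above. Port 7687 is Neo4j's default Bolt port and is an assumption here; check docker-compose.yml for the actual mapping.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    # 5000 and 7474 come from the Quick Start URLs; 7687 (Bolt) is an assumed default.
    services = [("Application", 5000), ("Neo4j Browser", 7474), ("Neo4j Bolt", 7687)]
    for name, port in services:
        status = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name:<14} localhost:{port:<5} {status}")
```

If the Neo4j ports report "not reachable", `docker-compose ps` shows whether the container started.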
Features
1. Distributed Computing (BOINC Integration)
- Submit cancer research computational tasks
- Monitor distributed workload processing
- Real-time task status tracking
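Task status tracking boils down to a small state machine. The sketch below is a hypothetical model: the four states and the `TaskTracker` class are illustrative and not taken from the project's BOINC integration, which may use different state names (BOINC itself tracks workunits and results in more detail).

```python
from __future__ import annotations

from enum import Enum


class TaskStatus(Enum):
    # Hypothetical lifecycle; the real BOINC integration may define other states.
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Allowed transitions: tasks only move forward, and terminal states have no exits.
ALLOWED = {
    TaskStatus.QUEUED: {TaskStatus.RUNNING, TaskStatus.FAILED},
    TaskStatus.RUNNING: {TaskStatus.COMPLETED, TaskStatus.FAILED},
    TaskStatus.COMPLETED: set(),
    TaskStatus.FAILED: set(),
}


class TaskTracker:
    """In-memory tracker recording each task's current status and its history."""

    def __init__(self) -> None:
        self._history: dict[str, list[TaskStatus]] = {}

    def submit(self, task_id: str) -> None:
        if task_id in self._history:
            raise ValueError(f"task {task_id!r} already submitted")
        self._history[task_id] = [TaskStatus.QUEUED]

    def update(self, task_id: str, new_status: TaskStatus) -> None:
        current = self.status(task_id)
        if new_status not in ALLOWED[current]:
            raise ValueError(f"illegal transition {current.value} -> {new_status.value}")
        self._history[task_id].append(new_status)

    def status(self, task_id: str) -> TaskStatus:
        return self._history[task_id][-1]

    def history(self, task_id: str) -> list[TaskStatus]:
        return list(self._history[task_id])
```

Rejecting illegal transitions (for example, a completed task going back to running) keeps the dashboard from showing contradictory states when late or duplicate updates arrive.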
2. GDC Data Integration
- Download cancer genomics data from GDC Portal
- Support for multiple GDC programs (e.g., TCGA, TARGET) covering many cancer types
- Automatic data parsing and normalization
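GDC downloads start with a search against the public GDC REST API, whose `filters` parameter is a nested JSON expression. The sketch below builds such a search URL for the `/files` endpoint using the API's documented field names; it is independent of the project's `gdc/` module, whose interface is not shown here, and the chosen `fields` list is only an example.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

GDC_API = "https://api.gdc.cancer.gov"


def build_files_query(project_ids: list[str], data_category: str, size: int = 10) -> str:
    """Build a GDC /files search URL restricted to the given projects and data category."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": project_ids}},
            {"op": "in", "content": {"field": "data_category", "value": [data_category]}},
            # Open-access files can be downloaded without a GDC authentication token
            {"op": "in", "content": {"field": "access", "value": ["open"]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_type,cases.project.project_id",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API}/files?{urlencode(params)}"


if __name__ == "__main__":
    url = build_files_query(["TCGA-BRCA", "TARGET-AML"], "Simple Nucleotide Variation")
    print(url)
    # Running the search requires network access:
    # import urllib.request
    # with urllib.request.urlopen(url) as resp:
    #     hits = json.load(resp)["data"]["hits"]
```

Each hit's `file_id` can then be passed to the GDC `/data` endpoint to download the file itself.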
3. Sequence Analysis Pipeline
- FASTQ file processing
- BLAST sequence alignment
- Variant calling and annotation
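FASTQ processing usually begins with reading 4-line records and decoding quality characters. This is a minimal stdlib parser, assuming unwrapped records and Phred+33 encoding (Sanger / Illumina 1.8+); the project's own FASTQ parser may handle more variants of the format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class FastqRecord:
    name: str
    sequence: str
    quality: str

    def phred_scores(self, offset: int = 33) -> list[int]:
        """Decode ASCII quality characters into Phred scores."""
        return [ord(ch) - offset for ch in self.quality]

    def mean_quality(self) -> float:
        scores = self.phred_scores()
        return sum(scores) / len(scores) if scores else 0.0


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


if __name__ == "__main__":
    import io

    sample = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n")
    for rec in parse_fastq(sample):
        print(rec.name, rec.mean_quality())  # read1 40.0, then read2 0.0
```

Because `parse_fastq` is a generator, large files are streamed record by record rather than loaded into memory.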
4. Neo4j Graph Database
- Graph-based cancer data modeling
- Relationships: Gene → Mutation → Patient → Cancer Type
- Interactive graph visualization
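The Gene, Mutation, Patient, Cancer Type chain maps naturally onto Cypher patterns. The sketch below shows one way to write and query it; the labels, relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`) and property names are hypothetical and not the project's actual schema, and the Bolt URI in the usage comment is an assumed default.

```python
from __future__ import annotations

import re

# Hypothetical schema:
# (:Gene)-[:HAS_MUTATION]->(:Mutation)-[:FOUND_IN]->(:Patient)-[:DIAGNOSED_WITH]->(:CancerType)

MERGE_MUTATION = """
MERGE (g:Gene {symbol: $gene})
MERGE (m:Mutation {id: $mutation_id})
  ON CREATE SET m.position = $position, m.consequence = $consequence
MERGE (p:Patient {id: $patient_id})
MERGE (c:CancerType {name: $cancer_type})
MERGE (g)-[:HAS_MUTATION]->(m)
MERGE (m)-[:FOUND_IN]->(p)
MERGE (p)-[:DIAGNOSED_WITH]->(c)
"""

PATIENTS_BY_CANCER_TYPE = """
MATCH (:Gene {symbol: $gene})-[:HAS_MUTATION]->(:Mutation)-[:FOUND_IN]->(p:Patient)
MATCH (p)-[:DIAGNOSED_WITH]->(c:CancerType)
RETURN c.name AS cancer_type, count(DISTINCT p) AS patients
ORDER BY patients DESC
"""


def mutation_params(gene: str, mutation_id: str, position: int, consequence: str,
                    patient_id: str, cancer_type: str) -> dict:
    """Bundle one observation (gene, mutation, patient, cancer type) as query parameters."""
    return {"gene": gene, "mutation_id": mutation_id, "position": position,
            "consequence": consequence, "patient_id": patient_id, "cancer_type": cancer_type}


def placeholders(query: str) -> set[str]:
    """Return the $parameters a Cypher query expects, useful for catching typos early."""
    return set(re.findall(r"\$(\w+)", query))


# Usage with the official driver (pip install neo4j), assuming Bolt on port 7687:
# from neo4j import GraphDatabase
# with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "cancer123")) as driver:
#     with driver.session() as session:
#         session.run(MERGE_MUTATION, params)
```

Using `MERGE` rather than `CREATE` keeps re-ingestion idempotent: loading the same observation twice does not duplicate nodes or relationships.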
5. GraphQL API
- Query cancer data flexibly
- Filter by gene, mutation, patient cohort
- Aggregate statistics
6. Interactive Dashboard
- Real-time data visualization
- Network graphs for gene interactions
- Mutation frequency charts
- Patient cohort analysis
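A mutation frequency chart needs, for each gene, how many patients in the cohort carry at least one mutation in it. This is a minimal sketch, assuming records shaped like `{"gene": ..., "patient_id": ...}`; the record shape and function name are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def mutation_frequency(records: list[dict], cohort_size: int) -> list[tuple[str, int, float]]:
    """Return (gene, mutated_patients, fraction_of_cohort), sorted by descending count.

    A patient with several mutations in the same gene is counted once for that gene.
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    patients_per_gene: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        patients_per_gene[rec["gene"]].add(rec["patient_id"])
    counts = Counter({gene: len(patients) for gene, patients in patients_per_gene.items()})
    return [(gene, n, n / cohort_size) for gene, n in counts.most_common()]
```

Counting distinct patients rather than raw mutation rows prevents hypermutated samples from inflating a gene's bar.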
Architecture
Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator
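A BLAST wrapper typically builds a BLAST+ command line and parses its tabular output. The sketch below assumes NCBI BLAST+ is installed and uses `-outfmt 6`, whose default 12 columns are listed in `OUTFMT6_FIELDS`; the function names are hypothetical and the project's own BLAST Wrapper may be structured differently.

```python
from __future__ import annotations

import shutil
import subprocess
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class BlastHit:
    qseqid: str
    sseqid: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def blastn_command(query: str, db: str, evalue: float = 1e-5) -> list[str]:
    """Build a blastn command line that writes tabular hits to stdout."""
    return ["blastn", "-query", query, "-db", db, "-evalue", str(evalue), "-outfmt", "6"]


def parse_outfmt6(text: str) -> list[BlastHit]:
    """Parse tab-separated outfmt 6 lines into BlastHit objects, skipping blanks and comments."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        hits.append(BlastHit(row["qseqid"], row["sseqid"], float(row["pident"]),
                             int(row["length"]), float(row["evalue"]), float(row["bitscore"])))
    return hits


def run_blastn(query: str, db: str) -> list[BlastHit]:
    """Run blastn locally and return parsed hits (requires BLAST+ on PATH)."""
    if shutil.which("blastn") is None:
        raise RuntimeError("blastn not found on PATH; install NCBI BLAST+")
    result = subprocess.run(blastn_command(query, db), capture_output=True, text=True, check=True)
    return parse_outfmt6(result.stdout)
```

Keeping command construction and parsing separate from execution lets the parser be tested on saved output, which matters when the actual search runs on a remote BOINC host.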
Project Structure
CancerAtHome2/
├── backend/
│   ├── api/            # FastAPI routes
│   ├── boinc/          # BOINC integration
│   ├── gdc/            # GDC data fetcher
│   ├── neo4j/          # Neo4j database layer
│   ├── pipeline/       # Bioinformatics pipeline
│   └── graphql/        # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/ # React components
│       ├── views/      # Page views
│       └── api/        # API client
├── data/               # Downloaded datasets
├── docker-compose.yml  # Neo4j container
├── requirements.txt    # Python dependencies
└── run.py              # Main entry point
𧬠Data Flow
- Data Ingestion: Download cancer genomics data from GDC Portal
- Processing: Run FASTQ/BLAST analysis on distributed BOINC network
- Storage: Store results in Neo4j graph database
- Visualization: Query and visualize via web dashboard
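The four steps above form a linear pipeline in which each stage consumes the previous stage's output. The sketch below shows that shape with placeholder stages; every stage function is hypothetical and only stands in for the real GDC, BOINC, Neo4j and dashboard code.

```python
from __future__ import annotations

from typing import Any, Callable

Stage = Callable[[Any], Any]


def run_pipeline(data: Any, stages: list[tuple[str, Stage]]) -> Any:
    """Pass data through each named stage in order, reporting progress."""
    for name, stage in stages:
        print(f"[{name}] starting")
        data = stage(data)
    return data


def ingest(project_id: str) -> list[dict]:
    # Placeholder for Data Ingestion: would download files from the GDC Portal
    return [{"project": project_id, "file": "sample.fastq"}]


def process(files: list[dict]) -> list[dict]:
    # Placeholder for Processing: would run FASTQ/BLAST workunits on BOINC
    return [{**f, "status": "processed"} for f in files]


def store(results: list[dict]) -> dict:
    # Placeholder for Storage: would write results into Neo4j
    return {"stored": len(results), "results": results}


def visualize(summary: dict) -> str:
    # Placeholder for Visualization: would refresh the web dashboard
    return f"{summary['stored']} result set(s) ready for the dashboard"


if __name__ == "__main__":
    stages = [("ingest", ingest), ("process", process), ("store", store), ("visualize", visualize)]
    print(run_pipeline("TCGA-BRCA", stages))
```

Treating stages as plain functions makes it easy to swap a slow stage (such as distributed processing) for a local stub during development.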
Configuration
Edit config.yml to customize:
- Neo4j connection settings
- GDC API parameters
- BOINC project URL
- Analysis pipeline options
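The structure of config.yml is not documented here, so the sketch below assumes a hypothetical layout with a `neo4j` section plus top-level GDC, BOINC and pipeline keys. It validates the parsed mapping into typed settings and falls back to the Quick Start defaults; all key names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class Neo4jSettings:
    # Defaults mirror the Quick Start credentials; 7687 is Neo4j's default Bolt port.
    uri: str = "bolt://localhost:7687"
    user: str = "neo4j"
    password: str = "cancer123"


@dataclass
class AppConfig:
    # Hypothetical keys; the real config.yml may use different names.
    neo4j: Neo4jSettings = field(default_factory=Neo4jSettings)
    gdc_api_url: str = "https://api.gdc.cancer.gov"
    boinc_project_url: str = ""
    pipeline: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict) -> AppConfig:
        """Build a config from a parsed mapping, rejecting unknown top-level keys."""
        unknown = set(raw) - {f.name for f in fields(cls)}
        if unknown:
            raise KeyError(f"unknown config keys: {sorted(unknown)}")
        neo4j = Neo4jSettings(**(raw.get("neo4j") or {}))
        rest = {k: v for k, v in raw.items() if k != "neo4j"}
        return cls(neo4j=neo4j, **rest)


# Typical use, assuming PyYAML is installed (pip install pyyaml):
# import yaml
# with open("config.yml") as fh:
#     config = AppConfig.from_dict(yaml.safe_load(fh) or {})
```

Failing on unknown keys turns a misspelled option into an immediate error instead of a silently ignored setting.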
Usage Examples
Query Mutations by Gene
query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
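To run the same query from Python, the query text and its variables are sent as a JSON POST body, the common GraphQL-over-HTTP convention. This is a stdlib sketch; the `/graphql` path on port 5000 is an assumption, as is the `String!` type of the `gene` argument.

```python
from __future__ import annotations

import json
import urllib.request

# Assumed endpoint; adjust to wherever the backend mounts its GraphQL route.
GRAPHQL_URL = "http://localhost:5000/graphql"

MUTATIONS_BY_GENE = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
    patients { cancerType stage }
  }
}
"""


def build_graphql_request(query: str, variables: dict,
                          url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Package a query and its variables as a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def run_query(query: str, variables: dict) -> dict:
    """Send the request and return the `data` field, raising if the server reports errors."""
    with urllib.request.urlopen(build_graphql_request(query, variables), timeout=30) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]


# Example (requires the app to be running):
# print(run_query(MUTATIONS_BY_GENE, {"gene": "TP53"}))
```

Passing the gene as a variable instead of splicing it into the query string avoids quoting problems and lets the server cache the parsed query.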
Submit Analysis Task
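from backend.boinc import BOINCClient

client = BOINCClient()
# Submit a variant-calling workunit for a local FASTQ file; returns a task ID
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq",
)
print(f"Submitted task {task_id}")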
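After submission, a caller usually polls until the task finishes. The helper below is a sketch that accepts any status function, so it does not depend on the client's API; the status strings and the `client.get_task_status` method in the usage comment are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable

TERMINAL = {"completed", "failed"}  # hypothetical terminal status strings


def wait_for_task(get_status: Callable[[str], str], task_id: str,
                  interval: float = 5.0, timeout: float = 600.0) -> str:
    """Poll get_status(task_id) until it reports a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task_id)
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout} s")
        time.sleep(interval)


# Hypothetical usage, assuming the client exposes a status method:
# final = wait_for_task(client.get_task_status, task_id, interval=10)
```

Using `time.monotonic()` keeps the timeout correct even if the system clock changes while waiting; the dashboard's WebSocket updates avoid polling entirely for interactive use.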
Inspired By
- Cancer@Home v1 - Distributed cancer research
- Neo4j Cancer Visualization - Graph-based cancer data modeling
License
MIT License
Support
For issues or questions, please open an issue on Hugging Face or GitHub.