---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
---
# Cancer@Home v2
<div align="center">
<img src="https://img.shields.io/badge/version-2.0.0-blue.svg" alt="Version">
<img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License">
<img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python">
<img src="https://img.shields.io/badge/neo4j-5.13-brightgreen.svg" alt="Neo4j">
</div>
## Overview
Cancer@Home v2 is a comprehensive distributed computing platform for cancer genomics research that combines **BOINC distributed computing**, **GDC cancer data analysis**, **sequence processing (FASTQ/BLAST)**, and **Neo4j graph visualization** into a unified, easy-to-use system.
Inspired by [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) and [Andrew Kamal's Neo4j Dashboard](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4), this platform makes cancer genomics research accessible, distributed, and visual.
## Key Features
- **Interactive Web Dashboard** - Modern UI with real-time visualizations
- **Neo4j Graph Database** - Model complex gene-mutation-patient relationships
- **BOINC Integration** - Distributed computing for intensive analyses
- **GraphQL API** - Flexible data querying
- **Bioinformatics Pipeline** - FASTQ processing, BLAST alignment, variant calling
- **GDC Portal Integration** - Access TCGA/TARGET cancer datasets
- **Quick Setup** - Up and running in under 5 minutes
## Architecture
```
┌────────────────────────────────────────────┐
│      Web Dashboard (D3.js + Chart.js)      │
├────────────────────────────────────────────┤
│      FastAPI Backend (REST + GraphQL)      │
├──────┬──────┬──────┬──────┬────────────────┤
│Neo4j │BOINC │ GDC  │FASTQ │ BLAST/Variant  │
│Graph │Client│ API  │  QC  │    Calling     │
└──────┴──────┴──────┴──────┴────────────────┘
```
## Installation
### Prerequisites
- Python 3.8+
- Docker Desktop
- 8GB RAM (16GB recommended)
### Quick Start
**Windows:**
```powershell
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
.\setup.ps1
python run.py
```
**Linux/macOS:**
```bash
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
chmod +x setup.sh
./setup.sh
python run.py
```
Then open: **http://localhost:5000**
## Usage
### Web Dashboard
Access the interactive dashboard at http://localhost:5000 with:
- **Dashboard Tab**: Overview statistics and mutation charts
- **Neo4j Visualization**: Interactive graph of cancer relationships
- **BOINC Tasks**: Submit and monitor distributed computing tasks
- **GDC Data**: Browse and download cancer datasets
- **Pipeline Tools**: Run FASTQ QC, BLAST, and variant calling
### GraphQL API
Query cancer data at http://localhost:5000/graphql
**Example: Get mutations in TP53 gene**
```graphql
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
```
**Example: Get patient statistics**
```graphql
query {
  cancerStatistics(cancer_type_id: "BRCA") {
    total_patients
    total_mutations
    avg_mutations_per_patient
  }
}
```
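The queries above can also be sent from a script. The following is a minimal sketch using only the Python standard library; it assumes the endpoint accepts the conventional GraphQL-over-HTTP JSON body (`{"query": ...}`), which this README does not state explicitly. The helper names `build_graphql_request` and `run_query` are illustrative and not part of the project.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:5000/graphql"  # endpoint listed in this README

TP53_QUERY = """
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
"""


def build_graphql_request(query, url=GRAPHQL_URL):
    """Wrap a GraphQL query string in a JSON POST request (hypothetical helper)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, url=GRAPHQL_URL, timeout=10):
    """Send the query and return the decoded JSON response.

    Requires the application to be running locally.
    """
    request = build_graphql_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `run_query(TP53_QUERY)` should return the decoded response; the exact response shape depends on the project's schema.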
### REST API
**Database Summary:**
```bash
curl http://localhost:5000/api/neo4j/summary
```
**Submit BOINC Task:**
```bash
curl -X POST http://localhost:5000/api/boinc/submit \
-H "Content-Type: application/json" \
-d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
```
### Python API
**FASTQ Processing:**
```python
from backend.pipeline import FASTQProcessor

processor = FASTQProcessor()
# Summarize the reads in the file
stats = processor.calculate_statistics("input.fastq")
# Keep only the reads that pass the quality filter
filtered = processor.quality_filter("input.fastq")
```
**Variant Calling:**
```python
from backend.pipeline import VariantCaller, VariantAnalyzer

# Call variants from an alignment against a reference, then filter them
caller = VariantCaller()
vcf_file = caller.call_variants("alignment.bam", "reference.fa")
variants = caller.filter_variants(vcf_file)

# Identify cancer-associated variants and compute tumor mutation burden (TMB)
analyzer = VariantAnalyzer()
cancer_variants = analyzer.identify_cancer_variants(variants)
tmb = analyzer.calculate_mutation_burden(variants)
```
**Neo4j Queries:**
```python
from backend.neo4j import DatabaseManager

db = DatabaseManager()

# Position and consequence of every mutation linked to TP53
query = """
MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
RETURN m.position, m.consequence
"""
results = db.execute_query(query)
db.close()  # close the database connection when done
```
## Data Model
### Neo4j Graph Schema
**Nodes:**
- **Gene**: Genes with mutations (TP53, BRCA1, KRAS, etc.)
- **Mutation**: Genetic variants with position and consequence
- **Patient**: Individual cases with demographics
- **CancerType**: Cancer classifications (BRCA, LUAD, COAD, GBM)
**Relationships:**
- `Gene ← AFFECTS ← Mutation`
- `Patient → HAS_MUTATION → Mutation`
- `Patient → DIAGNOSED_WITH → CancerType`
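As a rough illustration of how one record maps onto this schema, the sketch below builds a parameterized Cypher statement that creates the four node types and three relationships. The `AFFECTS` direction follows the Cypher example in the Python API section (`Mutation` to `Gene`). Only `symbol` and `mutation_id` appear elsewhere in this README; the property names `patient_id` and `cancer_type_id`, the function name, and the example identifiers are assumptions made for illustration.

```python
def build_link_query(patient_id, mutation_id, gene_symbol, cancer_type_id):
    """Return (cypher, params) linking one patient, mutation, gene, and cancer type.

    Hypothetical helper: property names other than `symbol` and
    `mutation_id` are assumed, not taken from the project.
    """
    query = (
        "MERGE (g:Gene {symbol: $gene_symbol})\n"
        "MERGE (m:Mutation {mutation_id: $mutation_id})\n"
        "MERGE (p:Patient {patient_id: $patient_id})\n"
        "MERGE (c:CancerType {cancer_type_id: $cancer_type_id})\n"
        "MERGE (m)-[:AFFECTS]->(g)\n"
        "MERGE (p)-[:HAS_MUTATION]->(m)\n"
        "MERGE (p)-[:DIAGNOSED_WITH]->(c)"
    )
    params = {
        "patient_id": patient_id,
        "mutation_id": mutation_id,
        "gene_symbol": gene_symbol,
        "cancer_type_id": cancer_type_id,
    }
    return query, params


# Made-up identifiers, for illustration only
cypher, params = build_link_query("PATIENT-0001", "MUT-0001", "TP53", "BRCA")
print(cypher)
```

Using `MERGE` with parameters keeps the statement idempotent and avoids string interpolation of values into the query text.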
### Sample Data Included
- **7 Genes**: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
- **5 Mutations**: Cancer-associated variants
- **5 Patients**: Representative TCGA cases
- **4 Cancer Types**: BRCA, LUAD, COAD, GBM
## Technology Stack
- **Backend**: FastAPI, Python 3.8+
- **Database**: Neo4j 5.13 (Graph Database)
- **API**: GraphQL (Strawberry), REST
- **Frontend**: HTML5, CSS3, JavaScript, D3.js, Chart.js
- **Bioinformatics**: Biopython, BLAST+
- **Data Source**: GDC Portal API (TCGA/TARGET)
- **Infrastructure**: Docker, Docker Compose
- **Distributed Computing**: BOINC Framework
## Documentation
- [README.md](README.md) - Complete project overview
- [QUICKSTART.md](QUICKSTART.md) - 5-minute setup guide
- [USER_GUIDE.md](USER_GUIDE.md) - Detailed usage documentation
- [GRAPHQL_EXAMPLES.md](GRAPHQL_EXAMPLES.md) - Query examples
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Feature overview
## Use Cases
1. **Cancer Research**: Analyze genomics data with distributed computing
2. **Education**: Learn cancer genetics and bioinformatics
3. **Data Visualization**: Explore gene-mutation-patient relationships
4. **Pipeline Development**: Test bioinformatics workflows
5. **Graph Analytics**: Query complex biological networks
## Supported Cancer Projects
- **TCGA-BRCA**: Breast Cancer (1,098 cases)
- **TCGA-LUAD**: Lung Adenocarcinoma (585 cases)
- **TCGA-COAD**: Colon Adenocarcinoma (461 cases)
- **TCGA-GBM**: Glioblastoma (617 cases)
- **TARGET-AML**: Acute Myeloid Leukemia (238 cases)
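Case counts for these projects can be looked up through the public GDC API. The sketch below only builds the request URL for the `projects` endpoint; the filter structure and the `summary.case_count` field reflect my understanding of the GDC API and should be checked against the GDC documentation. The function name is illustrative and not part of this project.

```python
import json
from urllib.parse import urlencode

GDC_PROJECTS_ENDPOINT = "https://api.gdc.cancer.gov/projects"


def build_projects_url(project_ids):
    """Build a GDC `projects` query URL for the given project IDs (hypothetical helper)."""
    project_ids = list(project_ids)
    # GDC filters are JSON objects of the form {"op": ..., "content": {...}}
    filters = {
        "op": "in",
        "content": {"field": "project_id", "value": project_ids},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "project_id,name,summary.case_count",  # assumed field names
        "format": "JSON",
        "size": str(len(project_ids)),
    }
    return f"{GDC_PROJECTS_ENDPOINT}?{urlencode(params)}"


url = build_projects_url(["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"])
print(url)
```

The resulting URL can be fetched with any HTTP client. Counts returned by the live API may differ from the figures listed above as GDC data releases change.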
## Bioinformatics Pipeline
### FASTQ Processing
- Quality control and filtering
- Adapter trimming
- Statistics calculation
- QC report generation
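To make the quality-filtering step concrete, here is a minimal standard-library sketch. It is not the project's `FASTQProcessor`; it assumes four-line FASTQ records, Phred+33 quality encoding, and a filter on mean read quality and read length, using the `quality_threshold: 20` and `min_length: 50` values from the configuration example as defaults.

```python
import io


def parse_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file (or a blank line) stops parsing
            return
        sequence = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\r\n")
        yield header, sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def quality_filter(records, quality_threshold=20, min_length=50):
    """Keep reads that are long enough and whose mean quality meets the threshold."""
    for header, sequence, quality in records:
        if len(sequence) >= min_length and mean_phred(quality) >= quality_threshold:
            yield header, sequence, quality


# Two made-up reads: one high quality ('I' = Phred 40), one low quality ('#' = Phred 2)
example = (
    "@good\n" + "A" * 60 + "\n+\n" + "I" * 60 + "\n"
    "@bad\n" + "C" * 60 + "\n+\n" + "#" * 60 + "\n"
)
for header, _, _ in quality_filter(parse_fastq(io.StringIO(example))):
    print(header)
```

Filtering on mean quality is only one common choice; trimming low-quality ends or filtering on a per-base minimum are alternatives the project may use instead.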
### BLAST Alignment
- BLASTN for nucleotide sequences
- BLASTP for protein sequences
- Hit filtering by identity/e-value
- Homology detection
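Hit filtering by identity and e-value can be sketched against BLAST+ tabular output (`-outfmt 6`), whose twelve default columns are listed in the code. This is an illustration, not the project's implementation: the function name and the 90% identity cutoff are assumptions, while the `0.001` e-value default matches the configuration example.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, min_identity=90.0, max_evalue=0.001):
    """Return hits with percent identity >= min_identity and e-value <= max_evalue."""
    reader = csv.DictReader(handle, fieldnames=OUTFMT6_COLUMNS, delimiter="\t")
    hits = []
    for row in reader:
        if float(row["pident"]) >= min_identity and float(row["evalue"]) <= max_evalue:
            hits.append(row)
    return hits


# Two made-up hits, for illustration only
example = (
    "query1\tsubjectA\t98.5\t200\t3\t0\t1\t200\t101\t300\t1e-50\t350\n"
    "query1\tsubjectB\t75.0\t120\t30\t2\t5\t124\t40\t159\t0.5\t40.1\n"
)
for hit in filter_hits(io.StringIO(example)):
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```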
### Variant Calling
- VCF generation from alignments
- Quality filtering
- Cancer variant identification
- Tumor mutation burden (TMB) calculation
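Tumor mutation burden is conventionally reported as mutations per megabase of sequenced territory. The sketch below shows that arithmetic only; which variants the project's `calculate_mutation_burden` counts, and which region size it uses, are not documented here, so the 30 Mb figure in the example is purely illustrative.

```python
def tumor_mutation_burden(num_mutations, region_size_bp):
    """Return mutations per megabase for a sequenced region of `region_size_bp` bases."""
    if region_size_bp <= 0:
        raise ValueError("region_size_bp must be positive")
    return num_mutations / (region_size_bp / 1_000_000)


# Example: 300 mutations over an assumed 30 Mb of covered sequence
print(tumor_mutation_burden(300, 30_000_000))  # 10.0
```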
## Access Points
- **Application**: http://localhost:5000
- **API Docs**: http://localhost:5000/docs (Swagger UI)
- **GraphQL**: http://localhost:5000/graphql
- **Neo4j Browser**: http://localhost:7474 (username `neo4j`, password `cancer123`)
## Configuration
Edit `config.yml` to customize:
```yaml
neo4j:
  uri: "bolt://localhost:7687"
  password: "cancer123"
gdc:
  download_dir: "./data/gdc"
  projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]
pipeline:
  fastq:
    quality_threshold: 20
    min_length: 50
  blast:
    evalue: 0.001
    num_threads: 4
```
## Contributing
Contributions are welcome! This project is open source under the MIT License.
### Development Setup
```bash
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
pip install -r requirements.txt
pytest test_cancer_at_home.py
```
## License
MIT License - See [LICENSE](LICENSE) file
Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal
## Acknowledgments
### Inspiration
- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - HeroX DCx Challenge
- [Andrew Kamal's Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4)
### Data Sources
- [Genomic Data Commons (GDC) Portal](https://portal.gdc.cancer.gov/)
- The Cancer Genome Atlas (TCGA) Program
- Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
### Technologies
- Neo4j Graph Database
- BOINC Distributed Computing Project
- Biopython Community
- FastAPI Framework
## Authors
- **OpenPeer AI** - Core development and architecture
- **Riemann Computing Inc.** - Distributed computing integration
- **Bleunomics** - Bioinformatics pipeline and genomics expertise
- **Andrew Magdy Kamal** - Graph database design and visualization
## Support
- **Documentation**: See project documentation files
- **Issues**: Check logs in `logs/cancer_at_home.log`
- **Configuration**: Review `config.yml`
- **Health Check**: http://localhost:5000/api/health
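The health check endpoint can be polled from a script, for example to wait until the application has started. This is a minimal sketch that treats any HTTP 200 response as healthy; the response body format is not documented here, so it is not inspected. The function name is illustrative.

```python
import urllib.request


def is_healthy(url="http://localhost:5000/api/health", timeout=3):
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError both derive from OSError).
        return False


print("healthy" if is_healthy() else "not reachable")
```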
## Roadmap
### Planned Features
- Machine learning for mutation prediction
- Multi-omics data integration (RNA-seq, proteomics)
- Survival analysis and clinical outcomes
- Advanced graph algorithms (PageRank, community detection)
- Cloud deployment support (AWS, Azure, GCP)
- Mobile-responsive design
- User authentication and authorization
## Statistics
- **Lines of Code**: 5,000+
- **Modules**: 9 Python modules
- **API Endpoints**: 15+ REST + GraphQL
- **Documentation**: 2,500+ lines
- **Setup Time**: < 5 minutes
- **Sample Data**: 7 genes, 5 mutations, 5 patients
## Citation
If you use Cancer@Home v2 in your research, please cite:
```bibtex
@software{cancer_at_home_v2,
  title = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
  author = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal},
  year = {2025},
  url = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
  license = {MIT}
}
```
## Tags
`cancer-genomics` `bioinformatics` `neo4j` `graph-database` `distributed-computing` `boinc` `fastq` `blast` `variant-calling` `gdc-portal` `tcga` `target` `graphql` `fastapi` `python` `docker` `healthcare` `precision-medicine` `computational-biology`
---
**Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal**
**For cancer research, by researchers, accessible to all.**