---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
---

# Cancer@Home v2

<div align="center">
  <img src="https://img.shields.io/badge/version-2.0.0-blue.svg" alt="Version">
  <img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License">
  <img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python">
  <img src="https://img.shields.io/badge/neo4j-5.13-brightgreen.svg" alt="Neo4j">
</div>

## 🧬 Overview

Cancer@Home v2 is a comprehensive distributed computing platform for cancer genomics research that combines **BOINC distributed computing**, **GDC cancer data analysis**, **sequence processing (FASTQ/BLAST)**, and **Neo4j graph visualization** into a unified, easy-to-use system.

Inspired by [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) and [Andrew Kamal's Neo4j Dashboard](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4), this platform makes cancer genomics research accessible, distributed, and visual.

## 🎯 Key Features

- 🌐 **Interactive Web Dashboard** - Modern UI with real-time visualizations
- 🔍 **Neo4j Graph Database** - Model complex gene-mutation-patient relationships
- ⚡ **BOINC Integration** - Distributed computing for intensive analyses
- 📊 **GraphQL API** - Flexible data querying
- 🧪 **Bioinformatics Pipeline** - FASTQ processing, BLAST alignment, variant calling
- 📚 **GDC Portal Integration** - Access TCGA/TARGET cancer datasets
- 🚀 **Quick Setup** - Running in under 5 minutes

## 🏗️ Architecture

```
┌─────────────────────────────────────────────┐
│     Web Dashboard (D3.js + Chart.js)        │
├─────────────────────────────────────────────┤
│     FastAPI Backend (REST + GraphQL)        │
├──────┬──────┬──────┬──────┬─────────────────┤
│Neo4j │BOINC │ GDC  │FASTQ │  BLAST/Variant  │
│Graph │Client│ API  │  QC  │     Calling     │
└──────┴──────┴──────┴──────┴─────────────────┘
```

## 📦 Installation

### Prerequisites
- Python 3.8+
- Docker Desktop
- 8GB RAM (16GB recommended)

### Quick Start

**Windows:**
```powershell
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
.\setup.ps1
python run.py
```

**Linux/Mac:**
```bash
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
chmod +x setup.sh
./setup.sh
python run.py
```

Then open: **http://localhost:5000**

## 🚀 Usage

### Web Dashboard
Access the interactive dashboard at http://localhost:5000 with:
- **Dashboard Tab**: Overview statistics and mutation charts
- **Neo4j Visualization**: Interactive graph of cancer relationships
- **BOINC Tasks**: Submit and monitor distributed computing tasks
- **GDC Data**: Browse and download cancer datasets
- **Pipeline Tools**: Run FASTQ QC, BLAST, and variant calling

### GraphQL API

Query cancer data at http://localhost:5000/graphql

**Example: Get mutations in TP53 gene**
```graphql
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
```

**Example: Get patient statistics**
```graphql
query {
  cancerStatistics(cancer_type_id: "BRCA") {
    total_patients
    total_mutations
    avg_mutations_per_patient
  }
}
```
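
The same queries can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint accepts the conventional GraphQL-over-HTTP JSON body (`query` plus `variables`), and the helper names `build_graphql_request` and `run_query` are illustrative, not part of the project.

```python
import json
import urllib.request

# Endpoint as listed in this README; adjust if the server runs elsewhere.
GRAPHQL_URL = "http://localhost:5000/graphql"

TP53_QUERY = """
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
"""


def build_graphql_request(query, variables=None, url=GRAPHQL_URL):
    """Build a POST request carrying a GraphQL query as a JSON body."""
    payload = {"query": query, "variables": variables or {}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables=None, timeout=10):
    """Send the query and return the decoded JSON response (requires a running server)."""
    request = build_graphql_request(query, variables)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `run_query(TP53_QUERY)` should return the decoded JSON response for the TP53 query shown above.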

### REST API

**Database Summary:**
```bash
curl http://localhost:5000/api/neo4j/summary
```

**Submit BOINC Task:**
```bash
curl -X POST http://localhost:5000/api/boinc/submit \
  -H "Content-Type: application/json" \
  -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
```

### Python API

**FASTQ Processing:**
```python
from backend.pipeline import FASTQProcessor

processor = FASTQProcessor()
stats = processor.calculate_statistics("input.fastq")
filtered = processor.quality_filter("input.fastq")
```

**Variant Calling:**
```python
from backend.pipeline import VariantCaller, VariantAnalyzer

caller = VariantCaller()
vcf_file = caller.call_variants("alignment.bam", "reference.fa")
variants = caller.filter_variants(vcf_file)

analyzer = VariantAnalyzer()
cancer_variants = analyzer.identify_cancer_variants(variants)
tmb = analyzer.calculate_mutation_burden(variants)
```

**Neo4j Queries:**
```python
from backend.neo4j import DatabaseManager

db = DatabaseManager()
query = """
MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
RETURN m.position, m.consequence
"""
results = db.execute_query(query)
db.close()
```

## 📊 Data Model

### Neo4j Graph Schema

**Nodes:**
- **Gene**: Genes with mutations (TP53, BRCA1, KRAS, etc.)
- **Mutation**: Genetic variants with position and consequence
- **Patient**: Individual cases with demographics
- **CancerType**: Cancer classifications (BRCA, LUAD, COAD, GBM)

**Relationships:**
- `Gene ← AFFECTS ← Mutation`
- `Patient → HAS_MUTATION → Mutation`
- `Patient → DIAGNOSED_WITH → CancerType`
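
These three relationships can be traversed in a single Cypher query. The sketch below is not taken from the project code: the function name `mutated_genes_query` and the `name` property on `CancerType` are assumptions for illustration (only `Gene.symbol` appears elsewhere in this README), and whether `DatabaseManager.execute_query` accepts a parameter dictionary is not documented here.

```python
from typing import Dict, Tuple


def mutated_genes_query(cancer_type: str, limit: int = 10) -> Tuple[str, Dict[str, object]]:
    """Return a Cypher query and parameters ranking genes by number of affected patients."""
    # Assumption: CancerType nodes expose a `name` property such as "BRCA".
    query = """
    MATCH (p:Patient)-[:DIAGNOSED_WITH]->(c:CancerType {name: $cancer_type})
    MATCH (p)-[:HAS_MUTATION]->(m:Mutation)-[:AFFECTS]->(g:Gene)
    RETURN g.symbol AS gene, count(DISTINCT p) AS patients
    ORDER BY patients DESC
    LIMIT $limit
    """
    # Values travel as parameters rather than being interpolated into the query text.
    return query, {"cancer_type": cancer_type, "limit": limit}
```

Passing values as parameters keeps user input out of the query string, which avoids Cypher injection and lets Neo4j reuse the query plan.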

### Sample Data Included

- **7 Genes**: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
- **5 Mutations**: Cancer-associated variants
- **5 Patients**: Representative TCGA cases
- **4 Cancer Types**: BRCA, LUAD, COAD, GBM

## 🔧 Technology Stack

- **Backend**: FastAPI, Python 3.8+
- **Database**: Neo4j 5.13 (Graph Database)
- **API**: GraphQL (Strawberry), REST
- **Frontend**: HTML5, CSS3, JavaScript, D3.js, Chart.js
- **Bioinformatics**: Biopython, BLAST+
- **Data Source**: GDC Portal API (TCGA/TARGET)
- **Infrastructure**: Docker, Docker Compose
- **Distributed Computing**: BOINC Framework

## 📚 Documentation

- [README.md](README.md) - Complete project overview
- [QUICKSTART.md](QUICKSTART.md) - 5-minute setup guide
- [USER_GUIDE.md](USER_GUIDE.md) - Detailed usage documentation
- [GRAPHQL_EXAMPLES.md](GRAPHQL_EXAMPLES.md) - Query examples
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Feature overview

## 🎓 Use Cases

1. **Cancer Research**: Analyze genomics data with distributed computing
2. **Education**: Learn cancer genetics and bioinformatics
3. **Data Visualization**: Explore gene-mutation-patient relationships
4. **Pipeline Development**: Test bioinformatics workflows
5. **Graph Analytics**: Query complex biological networks

## 🔬 Supported Cancer Projects

- **TCGA-BRCA**: Breast Cancer (1,098 cases)
- **TCGA-LUAD**: Lung Adenocarcinoma (585 cases)
- **TCGA-COAD**: Colon Adenocarcinoma (461 cases)
- **TCGA-GBM**: Glioblastoma (617 cases)
- **TARGET-AML**: Acute Myeloid Leukemia (238 cases)

## 📈 Bioinformatics Pipeline

### FASTQ Processing
- Quality control and filtering
- Adapter trimming
- Statistics calculation
- QC report generation

### BLAST Alignment
- BLASTN for nucleotide sequences
- BLASTP for protein sequences
- Hit filtering by identity/e-value
- Homology detection
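
Hit filtering by identity and e-value can be sketched as follows. This example assumes BLAST+ tabular output (`-outfmt 6`) with its default twelve columns; the `BlastHit` type, the function names, and the 90% identity cutoff are assumptions for illustration, while the e-value cutoff mirrors `evalue: 0.001` in `config.yml`.

```python
from typing import Iterable, List, NamedTuple


class BlastHit(NamedTuple):
    query_id: str
    subject_id: str
    percent_identity: float
    evalue: float
    bit_score: float


def parse_tabular_hits(lines: Iterable[str]) -> List[BlastHit]:
    """Parse default outfmt 6 rows: qseqid sseqid pident ... evalue bitscore."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        hits.append(
            BlastHit(
                query_id=fields[0],
                subject_id=fields[1],
                percent_identity=float(fields[2]),
                evalue=float(fields[10]),
                bit_score=float(fields[11]),
            )
        )
    return hits


def filter_hits(hits, min_identity=90.0, max_evalue=0.001):
    """Keep hits at or above the identity cutoff and at or below the e-value cutoff."""
    return [
        hit for hit in hits
        if hit.percent_identity >= min_identity and hit.evalue <= max_evalue
    ]
```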

### Variant Calling
- VCF generation from alignments
- Quality filtering
- Cancer variant identification
- Tumor mutation burden (TMB) calculation
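
Tumor mutation burden is commonly reported as mutations per megabase of sequenced territory. The sketch below shows that arithmetic only; it is not the project's `calculate_mutation_burden`, and both the function name and the choice of which variants to count (for example somatic, nonsynonymous) are assumptions that vary between assays.

```python
def tumor_mutation_burden(variant_count: int, covered_bases: int) -> float:
    """Return mutations per megabase given a variant count and the covered region size."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    megabases = covered_bases / 1_000_000
    return variant_count / megabases
```

For example, 300 qualifying variants across a 30 Mb exome capture give `tumor_mutation_burden(300, 30_000_000)`, which is 10.0 mutations per megabase.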

## 🌐 Access Points

- **Application**: http://localhost:5000
- **API Docs**: http://localhost:5000/docs (Swagger UI)
- **GraphQL**: http://localhost:5000/graphql
- **Neo4j Browser**: http://localhost:7474 (neo4j/cancer123)

## 🛠️ Configuration

Edit `config.yml` to customize:

```yaml
neo4j:
  uri: "bolt://localhost:7687"
  password: "cancer123"

gdc:
  download_dir: "./data/gdc"
  projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]

pipeline:
  fastq:
    quality_threshold: 20
    min_length: 50
  blast:
    evalue: 0.001
    num_threads: 4
```

## 🤝 Contributing

Contributions are welcome! This project is open source under the MIT License.

### Development Setup
```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
pytest test_cancer_at_home.py
```

## 📄 License

MIT License - See [LICENSE](LICENSE) file

Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal

## 🙏 Acknowledgments

### Inspiration
- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - HeroX DCx Challenge
- [Andrew Kamal's Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4)

### Data Sources
- [Genomic Data Commons (GDC) Portal](https://portal.gdc.cancer.gov/)
- The Cancer Genome Atlas (TCGA) Program
- Therapeutically Applicable Research to Generate Effective Treatments (TARGET)

### Technologies
- Neo4j Graph Database
- BOINC Distributed Computing Project
- Biopython Community
- FastAPI Framework

## 👥 Authors

- **OpenPeer AI** - Core development and architecture
- **Riemann Computing Inc.** - Distributed computing integration
- **Bleunomics** - Bioinformatics pipeline and genomics expertise
- **Andrew Magdy Kamal** - Graph database design and visualization

## 📞 Support

- **Documentation**: See project documentation files
- **Issues**: Check logs in `logs/cancer_at_home.log`
- **Configuration**: Review `config.yml`
- **Health Check**: http://localhost:5000/api/health

## 🔮 Roadmap

### Planned Features
- Machine learning for mutation prediction
- Multi-omics data integration (RNA-seq, proteomics)
- Survival analysis and clinical outcomes
- Advanced graph algorithms (PageRank, community detection)
- Cloud deployment support (AWS, Azure, GCP)
- Mobile-responsive design
- User authentication and authorization

## 📊 Statistics

- **Lines of Code**: 5,000+
- **Modules**: 9 Python modules
- **API Endpoints**: 15+ REST + GraphQL
- **Documentation**: 2,500+ lines
- **Setup Time**: < 5 minutes
- **Sample Data**: 7 genes, 5 mutations, 5 patients

## 🎯 Citation

If you use Cancer@Home v2 in your research, please cite:

```bibtex
@software{cancer_at_home_v2,
  title = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
  author = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal},
  year = {2025},
  url = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
  license = {MIT}
}
```

## 🏷️ Tags

`cancer-genomics` `bioinformatics` `neo4j` `graph-database` `distributed-computing` `boinc` `fastq` `blast` `variant-calling` `gdc-portal` `tcga` `target` `graphql` `fastapi` `python` `docker` `healthcare` `precision-medicine` `computational-biology`

---

**Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal**

**For cancer research, by researchers, accessible to all.**