---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
metrics:
- accuracy
- bleu
- bleurt
---

# Cancer@Home v2

A distributed computing platform for cancer genomics research. It combines BOINC task distribution, analysis of cancer data from the NCI Genomic Data Commons (GDC), sequence processing (FASTQ/BLAST), and Neo4j graph visualization.

## 🚀 Quick Start (5 minutes)

### Prerequisites
- Python 3.8+
- Docker Desktop
- 8 GB RAM minimum

### Installation

1. **Clone and set up**
```bash
cd CancerAtHome2
python -m venv venv
venv\Scripts\activate  # Windows
# macOS/Linux: source venv/bin/activate
pip install -r requirements.txt
```

2. **Start Neo4j Database**
```bash
docker-compose up -d
```

3. **Run the application**
```bash
python run.py
```

4. **Open your browser**
- Application: http://localhost:5000
- Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)
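
If either page fails to load, it can help to check whether the ports are accepting connections at all. The following is a minimal sketch using only the Python standard library; the helper name `port_open` is hypothetical and not part of the project, and the ports are the ones from the URLs above.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the URLs listed above.
    for name, port in (("Application", 5000), ("Neo4j Browser", 7474)):
        status = "up" if port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {status}")
```

A closed port only shows that nothing is listening yet; Neo4j in particular may need some time after `docker-compose up -d` before it accepts connections.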

## 🎯 Features

### 1. **Distributed Computing (BOINC Integration)**
- Submit cancer research computational tasks
- Monitor distributed workload processing
- Real-time task status tracking

### 2. **GDC Data Integration**
- Download cancer genomics data from GDC Portal
- Support for various cancer types (TCGA, TARGET projects)
- Automatic data parsing and normalization
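
As a rough illustration of what a GDC download starts with, the sketch below builds a file-listing URL for the public GDC API. It is not the project's `backend/gdc` code: the function name `build_gdc_files_url` and the chosen fields are assumptions, and the request is only constructed, not sent.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_files_url(project_id: str, size: int = 10) -> str:
    """Build a URL that lists open-access files for one project (e.g. TCGA-BRCA)."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in", "content": {"field": "access", "value": ["open"]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),  # GDC expects the filter as a JSON string
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


print(build_gdc_files_url("TCGA-BRCA", size=5))
```

Restricting the filter to open-access files avoids needing an authentication token; controlled-access data would require one.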

### 3. **Sequence Analysis Pipeline**
- FASTQ file processing
- BLAST sequence alignment
- Variant calling and annotation
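
To make the FASTQ step concrete, here is a minimal, standard-library sketch of reading four-line FASTQ records and computing a mean quality score. It is an illustration rather than the project's parser: the names `FastqRecord`, `parse_fastq`, and `mean_phred` are hypothetical, and Phred+33 quality encoding is assumed.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].strip(), sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a non-empty quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


sample = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n")
for rec in parse_fastq(sample):
    print(rec.read_id, rec.sequence, mean_phred(rec.quality))
```

The parser assumes unwrapped records (one sequence line per read), which is the common layout for modern sequencer output.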

### 4. **Neo4j Graph Database**
- Graph-based cancer data modeling
- Relationships: Gene → Mutation → Patient → Cancer Type
- Interactive graph visualization
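
The relationship chain above could be traversed with a single Cypher pattern. The sketch below builds such a parameterized query in Python. The node labels, relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`), and property names are assumptions made for illustration; the project's actual schema may differ.

```python
from typing import Any, Dict, Tuple


def patients_by_gene_query(gene_symbol: str, limit: int = 25) -> Tuple[str, Dict[str, Any]]:
    """Return a parameterized Cypher query and its parameters.

    Labels, relationship types, and properties are hypothetical.
    """
    query = (
        "MATCH (g:Gene {symbol: $symbol})-[:HAS_MUTATION]->(m:Mutation)"
        "-[:FOUND_IN]->(p:Patient)-[:DIAGNOSED_WITH]->(c:CancerType) "
        "RETURN m.id AS mutation, p.id AS patient, c.name AS cancer_type "
        "LIMIT $limit"
    )
    # Values travel as parameters, never interpolated into the query text.
    return query, {"symbol": gene_symbol, "limit": limit}


query, params = patients_by_gene_query("TP53", limit=5)
print(query)
print(params)
```

With the official Neo4j Python driver, a pair like this would typically be passed to `session.run(query, params)`. Keeping values in parameters avoids injection issues and lets Neo4j reuse the query plan.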

### 5. **GraphQL API**
- Query cancer data flexibly
- Filter by gene, mutation, patient cohort
- Aggregate statistics

### 6. **Interactive Dashboard**
- Real-time data visualization
- Network graphs for gene interactions
- Mutation frequency charts
- Patient cohort analysis

## 📊 Architecture

```
Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator
```

## 🗂️ Project Structure

```
CancerAtHome2/
├── backend/
│   ├── api/              # FastAPI routes
│   ├── boinc/            # BOINC integration
│   ├── gdc/              # GDC data fetcher
│   ├── neo4j/            # Neo4j database layer
│   ├── pipeline/         # Bioinformatics pipeline
│   └── graphql/          # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/   # React components
│       ├── views/        # Page views
│       └── api/          # API client
├── data/                 # Downloaded datasets
├── docker-compose.yml    # Neo4j container
├── requirements.txt      # Python dependencies
└── run.py                # Main entry point
```

## 🧬 Data Flow

1. **Data Ingestion**: Download cancer genomics data from GDC Portal
2. **Processing**: Run FASTQ/BLAST analysis on distributed BOINC network
3. **Storage**: Store results in Neo4j graph database
4. **Visualization**: Query and visualize via web dashboard

## 🔧 Configuration

Edit `config.yml` to customize:
- Neo4j connection settings
- GDC API parameters
- BOINC project URL
- Analysis pipeline options
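
One common way to handle settings like these is to overlay user-supplied values on built-in defaults. The sketch below shows that pattern with plain dictionaries. Every key name here is hypothetical and does not reflect the actual `config.yml` schema; the Neo4j URI assumes the default Bolt port, and the credentials are the ones from the Quick Start.

```python
import copy
from typing import Any, Dict

# Hypothetical defaults; key names are illustrative, not the project's schema.
DEFAULTS: Dict[str, Any] = {
    "neo4j": {"uri": "bolt://localhost:7687", "user": "neo4j", "password": "cancer123"},
    "gdc": {"api_url": "https://api.gdc.cancer.gov", "page_size": 100},
    "boinc": {"project_url": ""},
    "pipeline": {"min_mean_quality": 20},
}


def merge_settings(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of defaults with overrides applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


settings = merge_settings(DEFAULTS, {"neo4j": {"password": "change-me"}})
print(settings["neo4j"])
```

In practice the overrides dictionary would come from parsing `config.yml` with a YAML library; that step is omitted here to keep the sketch dependency-free.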

## 📖 Usage Examples

### Query Mutations by Gene
```graphql
query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
```
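
A query like the one above can be sent from Python as a JSON POST. The following is a minimal standard-library sketch: the `/graphql` path and the `String!` variable type are assumptions (the README only states that the application runs on port 5000), and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint path; only the host and port come from the Quick Start.
GRAPHQL_URL = "http://localhost:5000/graphql"

QUERY = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
  }
}
"""


def build_graphql_request(gene: str, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the query and its variables."""
    body = json.dumps({"query": QUERY, "variables": {"gene": gene}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def run_query(gene: str) -> dict:
    """Send the request and decode the JSON response; needs the app to be running."""
    with urllib.request.urlopen(build_graphql_request(gene), timeout=10) as resp:
        return json.load(resp)
```

Calling `run_query("TP53")` with the application running should return the decoded response; passing the gene as a GraphQL variable avoids string-building the query by hand.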

### Submit Analysis Task
```python
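from backend.boinc import BOINCClient

# Create a client for the BOINC integration layer
client = BOINCClient()

# Submit a variant-calling work unit for one FASTQ file and keep the task ID
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq"
)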
```

## 🀝 Inspired By

- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - Distributed cancer research
- [Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) - Graph-based cancer data modeling

## 📄 License

MIT License

## 🛟 Support

For issues or questions, please open an issue on Hugging Face or GitHub.