---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
metrics:
- accuracy
- bleu
- bleurt
---
# Cancer@Home v2
A distributed computing platform for cancer genomics research, combining BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.
## Quick Start (5 minutes)
### Prerequisites
- Python 3.8+
- Docker Desktop
- 8GB RAM minimum
### Installation
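1. **Clone and set up**
```bash
# Run from the directory that contains your clone of the repository
cd CancerAtHome2
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # macOS/Linux
pip install -r requirements.txt
```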
2. **Start Neo4j Database**
```bash
docker-compose up -d
```
3. **Run the application**
```bash
python run.py
```
4. **Open your browser**
- Application: http://localhost:5000
- Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)
## Features
### 1. **Distributed Computing (BOINC Integration)**
- Submit cancer research computational tasks
- Monitor distributed workload processing
- Real-time task status tracking
### 2. **GDC Data Integration**
- Download cancer genomics data from GDC Portal
- Support for various cancer types (TCGA, TARGET projects)
- Automatic data parsing and normalization
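
As a rough illustration of the download step, the sketch below builds a request URL for the public GDC API `files` endpoint, filtered by project ID and data format. The helper name `build_gdc_files_query`, its defaults, and the selected fields are assumptions made for this example; they are not taken from this project's GDC client.

```python
import json
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_files_query(project_ids, data_format="MAF", size=10):
    """Return a GDC /files URL listing files for the given projects.

    Hypothetical helper: the name and default arguments are illustrative.
    """
    # GDC filters are nested JSON objects combining operators and fields.
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": list(project_ids),
                },
            },
            {
                "op": "in",
                "content": {
                    "field": "files.data_format",
                    "value": [data_format],
                },
            },
        ],
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_format",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


# Build (but do not send) a query for breast cancer files from TCGA.
url = build_gdc_files_query(["TCGA-BRCA"])
print(url)
```

The resulting URL can be fetched with any HTTP client; the JSON response lists matching file IDs, which can then be passed to the GDC `data` endpoint for download.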
### 3. **Sequence Analysis Pipeline**
- FASTQ file processing
- BLAST sequence alignment
- Variant calling and annotation
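
To make the FASTQ processing step concrete, here is a minimal parser sketch for the standard four-line FASTQ record layout (header, sequence, separator, quality). The names `FastqRecord`, `parse_fastq`, and `mean_phred` are hypothetical and do not necessarily match the project's own `pipeline` module; the quality calculation assumes Phred+33 encoding.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Tiny in-memory example; a real run would open a .fastq file instead.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n")
records = list(parse_fastq(example))
for rec in records:
    print(rec.read_id, rec.sequence, mean_phred(rec.quality))
```

Reading one record at a time keeps memory use flat, which matters when FASTQ files run to several gigabytes.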
### 4. **Neo4j Graph Database**
- Graph-based cancer data modeling
- Relationships: Gene → Mutation → Patient → Cancer Type
- Interactive graph visualization
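
One way the Gene → Mutation → Patient → Cancer Type chain could be written to Neo4j is with idempotent `MERGE` statements, sketched below. The node labels, property names, and relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`) are assumptions chosen for illustration; the project's actual schema may differ.

```python
# Cypher template; labels, properties, and relationship types are assumed.
MERGE_CHAIN_QUERY = """
MERGE (g:Gene {symbol: $gene})
MERGE (m:Mutation {id: $mutation_id})
MERGE (p:Patient {id: $patient_id})
MERGE (c:CancerType {name: $cancer_type})
MERGE (g)-[:HAS_MUTATION]->(m)
MERGE (m)-[:FOUND_IN]->(p)
MERGE (p)-[:DIAGNOSED_WITH]->(c)
"""


def build_merge_chain(gene, mutation_id, patient_id, cancer_type):
    """Return (query, params) for one Gene-Mutation-Patient-CancerType chain.

    Values are passed as query parameters rather than interpolated into the
    Cypher text, which avoids injection and lets Neo4j cache the query plan.
    """
    params = {
        "gene": gene,
        "mutation_id": mutation_id,
        "patient_id": patient_id,
        "cancer_type": cancer_type,
    }
    return MERGE_CHAIN_QUERY, params


# Illustrative values only; the identifiers below are made up.
query, params = build_merge_chain("TP53", "mut-001", "patient-001", "Breast")
print(query.strip())
print(params)

# With the official `neo4j` Python driver installed and the container from
# the Quick Start running, the pair could be executed roughly like this
# (the Bolt port 7687 is Neo4j's default and is assumed to be exposed):
#
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687",
#                                 auth=("neo4j", "cancer123"))
#   with driver.session() as session:
#       session.run(query, params)
#   driver.close()
```

Because every statement is a `MERGE`, running the same chain twice does not create duplicate nodes or relationships.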
### 5. **GraphQL API**
- Query cancer data flexibly
- Filter by gene, mutation, patient cohort
- Aggregate statistics
### 6. **Interactive Dashboard**
- Real-time data visualization
- Network graphs for gene interactions
- Mutation frequency charts
- Patient cohort analysis
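
A mutation frequency chart needs a per-gene fraction of affected patients. The sketch below shows one simple way to compute it from `(patient_id, gene)` pairs; the function name and input shape are assumptions, the patient IDs are made-up toy data, and the denominator counts only patients that appear in the input.

```python
def mutation_frequency(observations):
    """Map each gene to the fraction of patients carrying a mutation in it.

    `observations` is an iterable of (patient_id, gene) pairs. A patient with
    several mutations in the same gene is counted once for that gene.
    """
    all_patients = set()
    patients_by_gene = {}
    for patient_id, gene in observations:
        all_patients.add(patient_id)
        patients_by_gene.setdefault(gene, set()).add(patient_id)
    total = len(all_patients)
    return {gene: len(ids) / total for gene, ids in patients_by_gene.items()}


# Toy input with invented patient IDs, for illustration only.
toy_observations = [
    ("P1", "TP53"),
    ("P2", "TP53"),
    ("P2", "KRAS"),
    ("P3", "KRAS"),
    ("P4", "TP53"),
    ("P4", "TP53"),  # duplicate hit in the same gene, counted once
]
frequencies = mutation_frequency(toy_observations)
for gene, freq in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {freq:.2f}")
```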
## Architecture
```
Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator
```
## Project Structure
```
CancerAtHome2/
├── backend/
│   ├── api/              # FastAPI routes
│   ├── boinc/            # BOINC integration
│   ├── gdc/              # GDC data fetcher
│   ├── neo4j/            # Neo4j database layer
│   ├── pipeline/         # Bioinformatics pipeline
│   └── graphql/          # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/   # React components
│       ├── views/        # Page views
│       └── api/          # API client
├── data/                 # Downloaded datasets
├── docker-compose.yml    # Neo4j container
├── requirements.txt      # Python dependencies
└── run.py                # Main entry point
```
## Data Flow
1. **Data Ingestion**: Download cancer genomics data from the GDC Portal
2. **Processing**: Run FASTQ/BLAST analysis on the distributed BOINC network
3. **Storage**: Store the results in the Neo4j graph database
4. **Visualization**: Query and visualize the data via the web dashboard
## Configuration
Edit `config.yml` to customize:
- Neo4j connection settings
- GDC API parameters
- BOINC project URL
- Analysis pipeline options
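
The sketch below shows one way settings like these could be layered: built-in defaults merged with overrides read from `config.yml`. All key names are hypothetical (the real file may be organized differently); the Neo4j credentials mirror the Quick Start, and the Bolt URI assumes Neo4j's default port.

```python
import copy

# Hypothetical defaults; check the project's config.yml for the real keys.
DEFAULT_CONFIG = {
    "neo4j": {
        "uri": "bolt://localhost:7687",  # assumed default Bolt port
        "user": "neo4j",
        "password": "cancer123",
    },
    "gdc": {"api_url": "https://api.gdc.cancer.gov", "page_size": 100},
    "boinc": {"project_url": ""},
    "pipeline": {"min_mean_quality": 20},
}


def merge_config(defaults, overrides):
    """Recursively overlay `overrides` on a deep copy of `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# With PyYAML installed, the overrides would typically be loaded like this:
#   import yaml
#   with open("config.yml") as fh:
#       overrides = yaml.safe_load(fh)
overrides = {"neo4j": {"password": "change-me"}, "gdc": {"page_size": 50}}
config = merge_config(DEFAULT_CONFIG, overrides)
print(config["neo4j"])
```

Merging section by section means a `config.yml` only has to list the values that differ from the defaults.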
## Usage Examples
### Query Mutations by Gene
```graphql
query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
```
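
A query like this can also be sent from a script. The sketch below posts it as JSON using only the Python standard library. The `/graphql` path and the `String!` type of the `gene` argument are assumptions; only the host and port come from the Quick Start. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint path; adjust to wherever the backend mounts GraphQL.
GRAPHQL_URL = "http://localhost:5000/graphql"

MUTATIONS_QUERY = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
"""


def build_graphql_request(gene, url=GRAPHQL_URL):
    """Build a POST request carrying the query and its variables as JSON."""
    payload = json.dumps(
        {"query": MUTATIONS_QUERY, "variables": {"gene": gene}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_mutations(gene):
    """Send the request and return the `mutations` list from the response."""
    with urllib.request.urlopen(build_graphql_request(gene), timeout=10) as resp:
        body = json.load(resp)
    if "errors" in body:
        raise RuntimeError(body["errors"])
    return body["data"]["mutations"]


# Usage, with the application running locally:
#   for mutation in fetch_mutations("TP53"):
#       print(mutation["id"], mutation["consequence"])
```

Passing the gene as a GraphQL variable rather than formatting it into the query string keeps the query text constant and avoids quoting problems.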
### Submit Analysis Task
```python
from backend.boinc import BOINCClient

# Submit a variant-calling work unit for a local FASTQ file
client = BOINCClient()
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq",
)
```
## Inspired By
- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - Distributed cancer research
- [Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) - Graph-based cancer data modeling
## License
MIT License
## Support
For issues or questions, please open a Hugging Face or GitHub issue.