Commit · 42f7194
Parent(s): b9629f4
starting dec 29
Files changed:
- README.md +64 -9
- app_database_prep.py +5 -0
- retrieval_evaluation_results.json +44 -44
- species-organized/PestID Species - Organized.xlsx +3 -0
- species-organized/species_analysis.png +3 -0
- species-organized/species_analysis.py +513 -0
- species-organized/species_statistics.txt +121 -0
- species-organized/species_table.tex +151 -0
- vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/data_level0.bin +0 -0
- vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/header.bin +0 -0
- vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/length.bin +1 -1
- vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/link_lists.bin +0 -0
- vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/chroma.sqlite3 +2 -2
README.md
CHANGED
|
@@ -9,7 +9,70 @@ app_file: app.py
|
|
| 9 |
pinned: false
|
| 10 |
license: apache-2.0
|
| 11 |
---
|
| 12 |
-
|
| 13 |
|
| 14 |
## Git LFS Troubleshooting Notes
|
| 15 |
|
|
@@ -44,11 +107,3 @@ This repository encountered several Git LFS issues during setup. Here's a summar
|
|
| 44 |
* Pushing branches with problematic LFS history to a fresh remote can fail. Starting the remote with a clean, history-free branch is a workaround.
|
| 45 |
* When adding LFS tracking for existing binary files via `.gitattributes`, ensure the commit correctly converts files to LFS pointers. `git add --renormalize .` after updating `.gitattributes` and *before* committing is often necessary.
|
| 46 |
* Double-check `.gitignore` if expected files or directories are missing after a `git add .`.
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
while running in claude code :
|
| 50 |
-
source ~/miniconda3/etc/profile.d/conda.sh && conda install -c conda-forge numpy
|
| 51 |
-
ate agthinker
|
| 52 |
-
|
| 53 |
-
run command like example: source ~/miniconda3/etc/profile.d/conda.sh && conda activate agllm-env1-updates-1 && β
|
| 54 |
-
β python whatebverscriptis.py
|
|
|
|
| 9 |
pinned: false
|
| 10 |
license: apache-2.0
|
| 11 |
---
|
| 12 |
+
|
| 13 |
+
## PestIDBot - Quick Reference
|
| 14 |
+
|
| 15 |
+
### Environment
|
| 16 |
+
```bash
|
| 17 |
+
source ~/miniconda3/etc/profile.d/conda.sh && conda activate agllm-env1-updates-1
|
| 18 |
+
```
|
| 19 |
+
|
| 20 |
+
### Key Commands
|
| 21 |
+
| Task | Command |
|
| 22 |
+
|------|---------|
|
| 23 |
+
| Build DB | `python app_database_prep.py` |
|
| 24 |
+
| Run Eval | `python retrieval_evaluation.py` |
|
| 25 |
+
| Run App | `python app.py` |
|
| 26 |
+
| Deploy Dev | `git push space3 fresh-start:main` |
|
| 27 |
+
| Deploy Prod | `git push space2 fresh-start:main` |
|
| 28 |
+
|
| 29 |
+
### Git Remotes
|
| 30 |
+
- `space2` → `git@hf.co:spaces/arbabarshad/agllm2` (production)
|
| 31 |
+
- `space3` → `git@hf.co:spaces/arbabarshad/agllm2-dev` (dev)
|
| 32 |
+
|
| 33 |
+
### Project Structure
|
| 34 |
+
```
|
| 35 |
+
├── app.py  # Main Gradio app (deployed)
|
| 36 |
+
├── app_database_prep.py  # Builds ChromaDB from PDFs + Excel
|
| 37 |
+
├── retrieval_evaluation.py  # Runs 4-filter evaluation
|
| 38 |
+
├── retrieval_evaluation_results.json  # Eval metrics output
|
| 39 |
+
│
|
| 40 |
+
├── agllm-data/
|
| 41 |
+
│   ├── agllm-data-isu-field-insects-all-species/
|
| 42 |
+
│   │   ├── *.pdf  # Insect IPM documents
|
| 43 |
+
│   │   └── matched_species_results_v2.csv  # Species metadata
|
| 44 |
+
│   ├── agllm-data-isu-field-weeds-all-species/
|
| 45 |
+
│   │   ├── *.pdf  # Weed IPM documents
|
| 46 |
+
│   │   └── matched_species_results_v2.csv  # Species metadata
|
| 47 |
+
│   └── PestID Species.xlsx  # India & Africa data (sheets)
|
| 48 |
+
│
|
| 49 |
+
├── vector-databases-deployed/
|
| 50 |
+
│   └── db5-agllm-data-isu-field-insects-all-species/  # ChromaDB output
|
| 51 |
+
│
|
| 52 |
+
├── species-organized/  # Analysis scripts & outputs
|
| 53 |
+
│   ├── species_analysis.py  # Generates paper Figure 3
|
| 54 |
+
│   └── species_table.tex  # LaTeX species table
|
| 55 |
+
│
|
| 56 |
+
└── writing/  # Paper drafts
|
| 57 |
+
```
|
| 58 |
+
|
| 59 |
+
### Database Build Flow
|
| 60 |
+
1. PDFs loaded from `agllm-data/` (insects + weeds)
|
| 61 |
+
2. Metadata read from `matched_species_results_v2.csv` files
|
| 62 |
+
3. Excel sheets (India, Africa) processed from `PestID Species.xlsx`
|
| 63 |
+
4. Documents chunked (512 tokens, 10 overlap)
|
| 64 |
+
5. Tagged with `matched_specie_X` + `region` metadata
|
| 65 |
+
6. Stored in ChromaDB at `vector-databases-deployed/db5-*/`
|
| 66 |
+
|
| 67 |
+
### Evaluation Filters (retrieval_evaluation.py)
|
| 68 |
+
| Filter | P@5 | nDCG@5 |
|
| 69 |
+
|--------|-----|--------|
|
| 70 |
+
| No Filter | 0.82 | 0.72 |
|
| 71 |
+
| Species Only | 0.99 | 0.89 |
|
| 72 |
+
| Region Only | 0.83 | 0.73 |
|
| 73 |
+
| Species + Region | **1.00** | **0.90** |
|
| 74 |
+
|
| 75 |
+
---
|
| 76 |
|
| 77 |
## Git LFS Troubleshooting Notes
|
| 78 |
|
|
|
|
| 107 |
* Pushing branches with problematic LFS history to a fresh remote can fail. Starting the remote with a clean, history-free branch is a workaround.
|
| 108 |
* When adding LFS tracking for existing binary files via `.gitattributes`, ensure the commit correctly converts files to LFS pointers. `git add --renormalize .` after updating `.gitattributes` and *before* committing is often necessary.
|
| 109 |
* Double-check `.gitignore` if expected files or directories are missing after a `git add .`.
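The four rows of the README's evaluation table (no filter, species only, region only, species + region) can be read as four variants of a metadata filter applied at query time. The following is a minimal sketch, assuming a Chroma-style `where` clause with `$and` over equality conditions. The helper `build_where` and the concrete key `matched_specie_0` are hypothetical; the README only names the keys as `matched_specie_X` and `region`.

```python
from typing import Optional


def build_where(species: Optional[str] = None,
                region: Optional[str] = None,
                species_key: str = "matched_specie_0") -> Optional[dict]:
    """Build a Chroma-style metadata filter; None means 'no filter'.

    species_key is a hypothetical metadata key name used for illustration.
    """
    clauses = []
    if species is not None:
        clauses.append({species_key: species})
    if region is not None:
        clauses.append({"region": region})
    if not clauses:
        return None            # "No Filter" row
    if len(clauses) == 1:
        return clauses[0]      # "Species Only" or "Region Only" row
    return {"$and": clauses}   # "Species + Region" row


if __name__ == "__main__":
    print(build_where())
    print(build_where(region="India"))
    print(build_where(species="Spodoptera frugiperda", region="Africa"))
```

The returned dictionary would typically be passed as the `where` argument of a collection query; how `retrieval_evaluation.py` actually builds its filters is not shown in this commit.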
|
app_database_prep.py
CHANGED
|
@@ -228,6 +228,11 @@ africa_splitted_documents = process_excel_sheet(
|
|
| 228 |
splitted_documents = pdf_splitted_documents + india_splitted_documents + africa_splitted_documents
|
| 229 |
|
| 230 |
|
| 231 |
# print(splitted_documents[0]) # Original print statement - commented out as we print chunks above
|
| 232 |
print("=== Combined Processing Done ===") # Adjusted print statement
|
| 233 |
print(f"Total documents after combining PDF, India, and Africa sources: {len(splitted_documents)}")
|
|
|
|
| 228 |
splitted_documents = pdf_splitted_documents + india_splitted_documents + africa_splitted_documents
|
| 229 |
|
| 230 |
|
| 231 |
+
print("pdf_splitted_documents", len(pdf_splitted_documents))
|
| 232 |
+
print("india_splitted_documents", len(india_splitted_documents))
|
| 233 |
+
print("africa_splitted_documents", len(africa_splitted_documents))
|
| 234 |
+
|
| 235 |
+
|
| 236 |
# print(splitted_documents[0]) # Original print statement - commented out as we print chunks above
|
| 237 |
print("=== Combined Processing Done ===") # Adjusted print statement
|
| 238 |
print(f"Total documents after combining PDF, India, and Africa sources: {len(splitted_documents)}")
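The three lists counted in this hunk hold chunks, which the README in the same commit describes as "512 tokens, 10 overlap". As a rough illustration of what that setting means, here is a sliding-window sketch. It assumes tokens are already available as a list of strings; the function name `chunk_tokens` is hypothetical, and the real script may well rely on a library text splitter with a different notion of token.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 10) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


if __name__ == "__main__":
    demo = [str(i) for i in range(1024)]
    parts = chunk_tokens(demo)
    print([len(p) for p in parts])
```

With the defaults, each window starts 502 tokens after the previous one, so consecutive chunks share their last and first 10 tokens.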
|
retrieval_evaluation_results.json
CHANGED
|
@@ -1,45 +1,45 @@
|
|
| 1 |
{
|
| 2 |
"no_filter": {
|
| 3 |
"precision@1": {
|
| 4 |
-
"mean": 0.
|
| 5 |
-
"std": 0.
|
| 6 |
"count": 100
|
| 7 |
},
|
| 8 |
"precision@3": {
|
| 9 |
-
"mean": 0.
|
| 10 |
-
"std": 0.
|
| 11 |
"count": 100
|
| 12 |
},
|
| 13 |
"precision@5": {
|
| 14 |
-
"mean": 0.
|
| 15 |
-
"std": 0.
|
| 16 |
"count": 100
|
| 17 |
},
|
| 18 |
"ndcg@1": {
|
| 19 |
-
"mean": 0.
|
| 20 |
-
"std": 0.
|
| 21 |
"count": 100
|
| 22 |
},
|
| 23 |
"ndcg@3": {
|
| 24 |
-
"mean": 0.
|
| 25 |
-
"std": 0.
|
| 26 |
"count": 100
|
| 27 |
},
|
| 28 |
"ndcg@5": {
|
| 29 |
-
"mean": 0.
|
| 30 |
-
"std": 0.
|
| 31 |
"count": 100
|
| 32 |
}
|
| 33 |
},
|
| 34 |
"species_only": {
|
| 35 |
"precision@1": {
|
| 36 |
-
"mean": 0.
|
| 37 |
-
"std": 0.
|
| 38 |
"count": 100
|
| 39 |
},
|
| 40 |
"precision@3": {
|
| 41 |
-
"mean": 0.
|
| 42 |
-
"std": 0.
|
| 43 |
"count": 100
|
| 44 |
},
|
| 45 |
"precision@5": {
|
|
@@ -48,57 +48,57 @@
|
|
| 48 |
"count": 100
|
| 49 |
},
|
| 50 |
"ndcg@1": {
|
| 51 |
-
"mean": 0.
|
| 52 |
-
"std": 0.
|
| 53 |
"count": 100
|
| 54 |
},
|
| 55 |
"ndcg@3": {
|
| 56 |
-
"mean": 0.
|
| 57 |
-
"std": 0.
|
| 58 |
"count": 100
|
| 59 |
},
|
| 60 |
"ndcg@5": {
|
| 61 |
-
"mean": 0.
|
| 62 |
-
"std": 0.
|
| 63 |
"count": 100
|
| 64 |
}
|
| 65 |
},
|
| 66 |
"region_only": {
|
| 67 |
"precision@1": {
|
| 68 |
-
"mean": 0.
|
| 69 |
-
"std": 0.
|
| 70 |
"count": 100
|
| 71 |
},
|
| 72 |
"precision@3": {
|
| 73 |
-
"mean": 0.
|
| 74 |
-
"std": 0.
|
| 75 |
"count": 100
|
| 76 |
},
|
| 77 |
"precision@5": {
|
| 78 |
-
"mean": 0.
|
| 79 |
-
"std": 0.
|
| 80 |
"count": 100
|
| 81 |
},
|
| 82 |
"ndcg@1": {
|
| 83 |
-
"mean": 0.
|
| 84 |
-
"std": 0.
|
| 85 |
"count": 100
|
| 86 |
},
|
| 87 |
"ndcg@3": {
|
| 88 |
-
"mean": 0.
|
| 89 |
-
"std": 0.
|
| 90 |
"count": 100
|
| 91 |
},
|
| 92 |
"ndcg@5": {
|
| 93 |
-
"mean": 0.
|
| 94 |
-
"std": 0.
|
| 95 |
"count": 100
|
| 96 |
}
|
| 97 |
},
|
| 98 |
"species_and_region": {
|
| 99 |
"precision@1": {
|
| 100 |
-
"mean": 0.
|
| 101 |
-
"std": 0.
|
| 102 |
"count": 100
|
| 103 |
},
|
| 104 |
"precision@3": {
|
|
@@ -107,23 +107,23 @@
|
|
| 107 |
"count": 100
|
| 108 |
},
|
| 109 |
"precision@5": {
|
| 110 |
-
"mean": 0
|
| 111 |
-
"std": 0.
|
| 112 |
"count": 100
|
| 113 |
},
|
| 114 |
"ndcg@1": {
|
| 115 |
-
"mean": 0.
|
| 116 |
-
"std": 0.
|
| 117 |
"count": 100
|
| 118 |
},
|
| 119 |
"ndcg@3": {
|
| 120 |
-
"mean": 0.
|
| 121 |
-
"std": 0.
|
| 122 |
"count": 100
|
| 123 |
},
|
| 124 |
"ndcg@5": {
|
| 125 |
-
"mean": 0.
|
| 126 |
-
"std": 0.
|
| 127 |
"count": 100
|
| 128 |
}
|
| 129 |
}
|
|
|
|
| 1 |
{
|
| 2 |
"no_filter": {
|
| 3 |
"precision@1": {
|
| 4 |
+
"mean": 0.59,
|
| 5 |
+
"std": 0.49183330509431744,
|
| 6 |
"count": 100
|
| 7 |
},
|
| 8 |
"precision@3": {
|
| 9 |
+
"mean": 0.76,
|
| 10 |
+
"std": 0.42708313008125254,
|
| 11 |
"count": 100
|
| 12 |
},
|
| 13 |
"precision@5": {
|
| 14 |
+
"mean": 0.82,
|
| 15 |
+
"std": 0.38418745424597095,
|
| 16 |
"count": 100
|
| 17 |
},
|
| 18 |
"ndcg@1": {
|
| 19 |
+
"mean": 0.59,
|
| 20 |
+
"std": 0.49183330509431744,
|
| 21 |
"count": 100
|
| 22 |
},
|
| 23 |
"ndcg@3": {
|
| 24 |
+
"mean": 0.6946394630357184,
|
| 25 |
+
"std": 0.41495405707705707,
|
| 26 |
"count": 100
|
| 27 |
},
|
| 28 |
"ndcg@5": {
|
| 29 |
+
"mean": 0.7182888689781796,
|
| 30 |
+
"std": 0.3848500116841757,
|
| 31 |
"count": 100
|
| 32 |
}
|
| 33 |
},
|
| 34 |
"species_only": {
|
| 35 |
"precision@1": {
|
| 36 |
+
"mean": 0.73,
|
| 37 |
+
"std": 0.4439594576084623,
|
| 38 |
"count": 100
|
| 39 |
},
|
| 40 |
"precision@3": {
|
| 41 |
+
"mean": 0.98,
|
| 42 |
+
"std": 0.13999999999999999,
|
| 43 |
"count": 100
|
| 44 |
},
|
| 45 |
"precision@5": {
|
|
|
|
| 48 |
"count": 100
|
| 49 |
},
|
| 50 |
"ndcg@1": {
|
| 51 |
+
"mean": 0.73,
|
| 52 |
+
"std": 0.4439594576084623,
|
| 53 |
"count": 100
|
| 54 |
},
|
| 55 |
"ndcg@3": {
|
| 56 |
+
"mean": 0.8811859507142915,
|
| 57 |
+
"std": 0.2136019453378135,
|
| 58 |
"count": 100
|
| 59 |
},
|
| 60 |
"ndcg@5": {
|
| 61 |
+
"mean": 0.8850544787866369,
|
| 62 |
+
"std": 0.2007226726424171,
|
| 63 |
"count": 100
|
| 64 |
}
|
| 65 |
},
|
| 66 |
"region_only": {
|
| 67 |
"precision@1": {
|
| 68 |
+
"mean": 0.61,
|
| 69 |
+
"std": 0.4877499359302879,
|
| 70 |
"count": 100
|
| 71 |
},
|
| 72 |
"precision@3": {
|
| 73 |
+
"mean": 0.77,
|
| 74 |
+
"std": 0.4208325082500163,
|
| 75 |
"count": 100
|
| 76 |
},
|
| 77 |
"precision@5": {
|
| 78 |
+
"mean": 0.83,
|
| 79 |
+
"std": 0.375632799419859,
|
| 80 |
"count": 100
|
| 81 |
},
|
| 82 |
"ndcg@1": {
|
| 83 |
+
"mean": 0.61,
|
| 84 |
+
"std": 0.4877499359302879,
|
| 85 |
"count": 100
|
| 86 |
},
|
| 87 |
"ndcg@3": {
|
| 88 |
+
"mean": 0.7083301655000039,
|
| 89 |
+
"std": 0.4110942789611411,
|
| 90 |
"count": 100
|
| 91 |
},
|
| 92 |
"ndcg@5": {
|
| 93 |
+
"mean": 0.7319795714424648,
|
| 94 |
+
"std": 0.37983366654728035,
|
| 95 |
"count": 100
|
| 96 |
}
|
| 97 |
},
|
| 98 |
"species_and_region": {
|
| 99 |
"precision@1": {
|
| 100 |
+
"mean": 0.75,
|
| 101 |
+
"std": 0.4330127018922193,
|
| 102 |
"count": 100
|
| 103 |
},
|
| 104 |
"precision@3": {
|
|
|
|
| 107 |
"count": 100
|
| 108 |
},
|
| 109 |
"precision@5": {
|
| 110 |
+
"mean": 1.0,
|
| 111 |
+
"std": 0.0,
|
| 112 |
"count": 100
|
| 113 |
},
|
| 114 |
"ndcg@1": {
|
| 115 |
+
"mean": 0.75,
|
| 116 |
+
"std": 0.4330127018922193,
|
| 117 |
"count": 100
|
| 118 |
},
|
| 119 |
"ndcg@3": {
|
| 120 |
+
"mean": 0.8898766531785769,
|
| 121 |
+
"std": 0.2091728695998248,
|
| 122 |
"count": 100
|
| 123 |
},
|
| 124 |
"ndcg@5": {
|
| 125 |
+
"mean": 0.8980519468316562,
|
| 126 |
+
"std": 0.18024378480878206,
|
| 127 |
"count": 100
|
| 128 |
}
|
| 129 |
}
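A quick consistency check on the new values: every `precision@k` entry shown has `std` equal to sqrt(mean × (1 − mean)), which is what a per-query score of 0 or 1 gives under a population standard deviation. The `ndcg@1` means also equal the `precision@1` means. The sketch below checks the first relation and shows an nDCG form that would match the second. Reading the metrics this way is an inference from the numbers, not something the diff states, and both function names are hypothetical.

```python
import math
from typing import Optional


def binary_std(mean: float) -> float:
    """Population std of a 0/1 score whose mean is `mean`."""
    return math.sqrt(mean * (1.0 - mean))


def ndcg_single_relevant(rank: Optional[int], k: int) -> float:
    """nDCG@k assuming exactly one relevant item (assumption, see lead-in).

    rank is the 1-based position of that item, or None if it was not retrieved.
    """
    if rank is None or rank > k:
        return 0.0
    return 1.0 / math.log2(rank + 1)


if __name__ == "__main__":
    # (mean, std) pairs copied from the added lines of the JSON diff.
    reported = [
        (0.59, 0.49183330509431744),
        (0.76, 0.42708313008125254),
        (0.82, 0.38418745424597095),
        (0.98, 0.13999999999999999),
        (0.75, 0.4330127018922193),
    ]
    for mean, std in reported:
        print(mean, std, math.isclose(binary_std(mean), std, rel_tol=1e-9))
```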
|
species-organized/PestID Species - Organized.xlsx
ADDED
|
@@ -0,0 +1,3 @@
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:cb0ceab28dfca471472fcc9e4631100826dc5ea2d3fdf2b78c215b859938eb61
|
| 3 |
+
size 27466
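The three added lines are a Git LFS pointer: the repository stores this small text file, while the spreadsheet itself lives in LFS storage. A minimal sketch of reading such a pointer, assuming the simple `key value` line layout seen above; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # split on the first space only
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:cb0ceab28dfca471472fcc9e4631100826dc5ea2d3fdf2b78c215b859938eb61\n"
        "size 27466\n"
    )
    print(parse_lfs_pointer(pointer))
```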
|
species-organized/species_analysis.png
ADDED
|
Git LFS Details
|
species-organized/species_analysis.py
ADDED
|
@@ -0,0 +1,513 @@
|
|
| 1 |
+
"""
|
| 2 |
+
Species Analysis Script
|
| 3 |
+
=======================
|
| 4 |
+
Analyzes pest species data from PestID Species - Organized.xlsx
|
| 5 |
+
Generates: LaTeX table, multi-panel visualization, and statistics summary
|
| 6 |
+
|
| 7 |
+
Author: AgLLM Project
|
| 8 |
+
Date: 2025-10-26
|
| 9 |
+
"""
|
| 10 |
+
|
| 11 |
+
import pandas as pd
|
| 12 |
+
import numpy as np
|
| 13 |
+
import matplotlib.pyplot as plt
|
| 14 |
+
from matplotlib.gridspec import GridSpec
|
| 15 |
+
from matplotlib.patches import Circle
|
| 16 |
+
from matplotlib.patches import Patch
|
| 17 |
+
from pathlib import Path
|
| 18 |
+
|
| 19 |
+
# Set font parameters (matching reference style)
|
| 20 |
+
plt.rcParams.update({
|
| 21 |
+
'font.family': 'Arial',
|
| 22 |
+
'font.size': 14,
|
| 23 |
+
'axes.labelsize': 14,
|
| 24 |
+
'axes.titlesize': 15,
|
| 25 |
+
'xtick.labelsize': 12,
|
| 26 |
+
'ytick.labelsize': 12,
|
| 27 |
+
'legend.fontsize': 13,
|
| 28 |
+
})
|
| 29 |
+
|
| 30 |
+
# Define color palette
|
| 31 |
+
COLORS = ['#2E86AB', '#A23B72', '#F18F01', '#C73E1D', '#6A994E', '#BC4B51', '#5B8E7D', '#F4A259']
|
| 32 |
+
REGION_COLORS = {
|
| 33 |
+
'US': '#2E86AB',
|
| 34 |
+
'Africa': '#F18F01',
|
| 35 |
+
'India': '#6A994E'
|
| 36 |
+
}
|
| 37 |
+
|
| 38 |
+
# File paths
|
| 39 |
+
SCRIPT_DIR = Path(__file__).parent
|
| 40 |
+
DATA_FILE = SCRIPT_DIR / 'PestID Species - Organized.xlsx'
|
| 41 |
+
OUTPUT_TABLE = SCRIPT_DIR / 'species_table.tex'
|
| 42 |
+
OUTPUT_PLOT_PDF = SCRIPT_DIR / 'species_analysis.pdf'
|
| 43 |
+
OUTPUT_PLOT_PNG = SCRIPT_DIR / 'species_analysis.png'
|
| 44 |
+
OUTPUT_STATS = SCRIPT_DIR / 'species_statistics.txt'
|
| 45 |
+
|
| 46 |
+
|
| 47 |
+
def load_and_prepare_data():
|
| 48 |
+
"""Load data from all sheets and create consolidated DataFrame"""
|
| 49 |
+
print("Loading data from Excel file...")
|
| 50 |
+
|
| 51 |
+
# Read all sheets
|
| 52 |
+
us_df = pd.read_excel(DATA_FILE, sheet_name='US')
|
| 53 |
+
africa_df = pd.read_excel(DATA_FILE, sheet_name='Africa')
|
| 54 |
+
india_df = pd.read_excel(DATA_FILE, sheet_name='India')
|
| 55 |
+
|
| 56 |
+
# Add region column
|
| 57 |
+
us_df['Region'] = 'US'
|
| 58 |
+
africa_df['Region'] = 'Africa'
|
| 59 |
+
india_df['Region'] = 'India'
|
| 60 |
+
|
| 61 |
+
# Standardize column names across sheets
|
| 62 |
+
# US: Species, Common Name, Tag, Accuracy
|
| 63 |
+
# Africa: Common Name, Species, Tag, Accuracy, IPM Info, Excluded Link
|
| 64 |
+
# India: Common Name, Species, Tag, Accuracy, IPM Info
|
| 65 |
+
|
| 66 |
+
# Reorder columns for US to match others
|
| 67 |
+
us_df = us_df[['Region', 'Species', 'Common Name', 'Tag', 'Accuracy']]
|
| 68 |
+
us_df['IPM Info'] = None
|
| 69 |
+
us_df['Excluded Link'] = None
|
| 70 |
+
|
| 71 |
+
# Reorder columns for Africa
|
| 72 |
+
africa_df = africa_df[['Region', 'Species', 'Common Name', 'Tag', 'Accuracy', 'IPM Info', 'Excluded Link']]
|
| 73 |
+
|
| 74 |
+
# Reorder columns for India (no Excluded Link)
|
| 75 |
+
india_df = india_df[['Region', 'Species', 'Common Name', 'Tag', 'Accuracy', 'IPM Info']]
|
| 76 |
+
india_df['Excluded Link'] = None
|
| 77 |
+
|
| 78 |
+
# FIX: India uses decimal accuracy (0.59-0.95) instead of percentage
|
| 79 |
+
# Multiply India accuracy by 100
|
| 80 |
+
india_df['Accuracy'] = india_df['Accuracy'] * 100
|
| 81 |
+
|
| 82 |
+
# Concatenate all DataFrames
|
| 83 |
+
df_consolidated = pd.concat([us_df, africa_df, india_df], ignore_index=True)
|
| 84 |
+
|
| 85 |
+
# Add helper column for IPM availability
|
| 86 |
+
df_consolidated['Has_IPM'] = df_consolidated['IPM Info'].notna()
|
| 87 |
+
|
| 88 |
+
print(f"✓ Loaded {len(df_consolidated)} species from 3 regions")
|
| 89 |
+
print(f" - US: {len(us_df)} species")
|
| 90 |
+
print(f" - Africa: {len(africa_df)} species")
|
| 91 |
+
print(f" - India: {len(india_df)} species")
|
| 92 |
+
|
| 93 |
+
return df_consolidated
|
| 94 |
+
|
| 95 |
+
|
| 96 |
+
def generate_latex_table(df):
|
| 97 |
+
"""Generate LaTeX table with all species data"""
|
| 98 |
+
print("\nGenerating LaTeX table...")
|
| 99 |
+
|
| 100 |
+
# Create simplified table for LaTeX (without full IPM text)
|
| 101 |
+
df_table = df[['Region', 'Species', 'Common Name', 'Tag', 'Accuracy', 'Has_IPM']].copy()
|
| 102 |
+
|
| 103 |
+
# Format Has_IPM as Yes/No
|
| 104 |
+
df_table['Has_IPM'] = df_table['Has_IPM'].apply(lambda x: "Yes" if x else "No")
|
| 105 |
+
|
| 106 |
+
# Replace NaN in Tag
|
| 107 |
+
df_table['Tag'] = df_table['Tag'].fillna("--")
|
| 108 |
+
|
| 109 |
+
# Sort by Region, then Tag, then Accuracy (while Accuracy is still numeric)
|
| 110 |
+
df_table = df_table.sort_values(['Region', 'Tag', 'Accuracy'], ascending=[True, True, False])
|
| 111 |
+
|
| 112 |
+
# Format accuracy (after sorting, so values are not compared as strings)
|
| 113 |
+
df_table['Accuracy'] = df_table['Accuracy'].apply(lambda x: f"{x:.1f}" if pd.notna(x) else "--")
|
| 114 |
+
|
| 115 |
+
# Start building LaTeX code
|
| 116 |
+
latex_code = []
|
| 117 |
+
latex_code.append("% LaTeX Table: Species Analysis")
|
| 118 |
+
latex_code.append("% Requires: \\usepackage{booktabs, longtable}")
|
| 119 |
+
latex_code.append("")
|
| 120 |
+
latex_code.append("\\begin{longtable}{llllrr}")
|
| 121 |
+
latex_code.append("\\toprule")
|
| 122 |
+
latex_code.append("\\textbf{Region} & \\textbf{Species} & \\textbf{Common Name} & \\textbf{Tag} & \\textbf{Accuracy (\\%)} & \\textbf{IPM Info} \\\\")
|
| 123 |
+
latex_code.append("\\midrule")
|
| 124 |
+
latex_code.append("\\endfirsthead")
|
| 125 |
+
latex_code.append("")
|
| 126 |
+
latex_code.append("\\multicolumn{6}{c}")
|
| 127 |
+
latex_code.append("{\\tablename\\ \\thetable\\ -- \\textit{Continued from previous page}} \\\\")
|
| 128 |
+
latex_code.append("\\toprule")
|
| 129 |
+
latex_code.append("\\textbf{Region} & \\textbf{Species} & \\textbf{Common Name} & \\textbf{Tag} & \\textbf{Accuracy (\\%)} & \\textbf{IPM Info} \\\\")
|
| 130 |
+
latex_code.append("\\midrule")
|
| 131 |
+
latex_code.append("\\endhead")
|
| 132 |
+
latex_code.append("")
|
| 133 |
+
latex_code.append("\\midrule")
|
| 134 |
+
latex_code.append("\\multicolumn{6}{r}{\\textit{Continued on next page}} \\\\")
|
| 135 |
+
latex_code.append("\\endfoot")
|
| 136 |
+
latex_code.append("")
|
| 137 |
+
latex_code.append("\\bottomrule")
|
| 138 |
+
latex_code.append("\\endlastfoot")
|
| 139 |
+
latex_code.append("")
|
| 140 |
+
|
| 141 |
+
# Add data rows
|
| 142 |
+
for idx, row in df_table.iterrows():
|
| 143 |
+
# Escape special LaTeX characters
|
| 144 |
+
species = str(row['Species']).replace('_', '\\_').replace('&', '\\&')
|
| 145 |
+
common_name = str(row['Common Name']).replace('_', '\\_').replace('&', '\\&')
|
| 146 |
+
|
| 147 |
+
latex_code.append(f"{row['Region']} & \\textit{{{species}}} & {common_name} & {row['Tag']} & {row['Accuracy']} & {row['Has_IPM']} \\\\")
|
| 148 |
+
|
| 149 |
+
latex_code.append("")
|
| 150 |
+
latex_code.append("\\end{longtable}")
|
| 151 |
+
|
| 152 |
+
# Write to file
|
| 153 |
+
with open(OUTPUT_TABLE, 'w') as f:
|
| 154 |
+
f.write('\n'.join(latex_code))
|
| 155 |
+
|
| 156 |
+
print(f"✓ LaTeX table saved to: {OUTPUT_TABLE}")
|
| 157 |
+
print(f" Contains {len(df_table)} species")
|
| 158 |
+
|
| 159 |
+
|
| 160 |
+
def create_visualization(df):
|
| 161 |
+
"""Create comprehensive multi-panel visualization"""
|
| 162 |
+
print("\nCreating visualization...")
|
| 163 |
+
|
| 164 |
+
# Create figure with GridSpec layout (2 rows × 2 columns)
|
| 165 |
+
fig = plt.figure(figsize=(14, 10))
|
| 166 |
+
gs = GridSpec(2, 2, figure=fig, hspace=0.35, wspace=0.35)
|
| 167 |
+
|
| 168 |
+
# 1. Species Count by Region (Top Left) - with Insect/Weed breakdown
|
| 169 |
+
ax1 = fig.add_subplot(gs[0, 0])
|
| 170 |
+
|
| 171 |
+
# Get insect and weed counts by region
|
| 172 |
+
tag_by_region = pd.crosstab(df['Region'], df['Tag'])
|
| 173 |
+
tag_by_region = tag_by_region.reindex(['US', 'Africa', 'India'])
|
| 174 |
+
|
| 175 |
+
insects = tag_by_region['insect'].values if 'insect' in tag_by_region.columns else [0, 0, 0]
|
| 176 |
+
weeds = tag_by_region['weed'].values if 'weed' in tag_by_region.columns else [0, 0, 0]
|
| 177 |
+
|
| 178 |
+
x = range(len(tag_by_region))
|
| 179 |
+
|
| 180 |
+
# Create stacked bars
|
| 181 |
+
bars1_insects = ax1.bar(x, insects, label='Insect', color=COLORS[0], alpha=0.8, edgecolor='black', linewidth=0.5)
|
| 182 |
+
bars1_weeds = ax1.bar(x, weeds, bottom=insects, label='Weed', color=COLORS[2], alpha=0.8, edgecolor='black', linewidth=0.5)
|
| 183 |
+
|
| 184 |
+
ax1.set_xticks(x)
|
| 185 |
+
ax1.set_xticklabels(tag_by_region.index)
|
| 186 |
+
ax1.set_ylabel('Number of Species')
|
| 187 |
+
ax1.set_title('Species Count by Region')
|
| 188 |
+
ax1.legend(loc='upper right', fontsize=11)
|
| 189 |
+
|
| 190 |
+
# Add total count labels on top
|
| 191 |
+
for i, region in enumerate(tag_by_region.index):
|
| 192 |
+
total = insects[i] + weeds[i]
|
| 193 |
+
ax1.text(i, total + 2, str(int(total)), ha='center', va='bottom',
|
| 194 |
+
fontsize=12, fontweight='bold')
|
| 195 |
+
|
| 196 |
+
# 2. Accuracy Distribution by Region - Box Plot (Top Right)
|
| 197 |
+
ax2 = fig.add_subplot(gs[0, 1])
|
| 198 |
+
|
| 199 |
+
# Prepare data for box plot
|
| 200 |
+
accuracy_data = []
|
| 201 |
+
labels = []
|
| 202 |
+
colors_box = []
|
| 203 |
+
for region in ['US', 'Africa', 'India']:
|
| 204 |
+
region_acc = df[df['Region'] == region]['Accuracy'].dropna()
|
| 205 |
+
if len(region_acc) > 0:
|
| 206 |
+
accuracy_data.append(region_acc)
|
| 207 |
+
labels.append(region)
|
| 208 |
+
colors_box.append(REGION_COLORS[region])
|
| 209 |
+
|
| 210 |
+
bp = ax2.boxplot(accuracy_data, tick_labels=labels, patch_artist=True,
|
| 211 |
+
medianprops=dict(color='red', linewidth=2),
|
| 212 |
+
whiskerprops=dict(linewidth=1.5),
|
| 213 |
+
boxprops=dict(linewidth=1.5),
|
| 214 |
+
showfliers=True)
|
| 215 |
+
|
| 216 |
+
# Color the boxes
|
| 217 |
+
for patch, color in zip(bp['boxes'], colors_box):
|
| 218 |
+
patch.set_facecolor(color)
|
| 219 |
+
patch.set_alpha(0.7)
|
| 220 |
+
|
| 221 |
+
ax2.set_ylabel('Accuracy (%)')
|
| 222 |
+
ax2.set_title('Accuracy Distribution by Region')
|
| 223 |
+
ax2.grid(True, alpha=0.3, axis='y')
|
| 224 |
+
ax2.set_ylim(35, 105)
|
| 225 |
+
|
| 226 |
+
# 3. Species Overlap - Venn Diagram (Bottom Left)
|
| 227 |
+
ax3 = fig.add_subplot(gs[1, 0])
|
| 228 |
+
|
| 229 |
+
# Calculate species overlap
|
| 230 |
+
us_species = set(df[df['Region'] == 'US']['Species'].str.lower().str.strip())
|
| 231 |
+
africa_species = set(df[df['Region'] == 'Africa']['Species'].str.lower().str.strip())
|
| 232 |
+
india_species = set(df[df['Region'] == 'India']['Species'].str.lower().str.strip())
|
| 233 |
+
|
| 234 |
+
us_only = len(us_species - africa_species - india_species)
|
| 235 |
+
africa_only = len(africa_species - us_species - india_species)
|
| 236 |
+
india_only = len(india_species - us_species - africa_species)
|
| 237 |
+
us_africa = len((us_species & africa_species) - india_species)
|
| 238 |
+
us_india = len((us_species & india_species) - africa_species)
|
| 239 |
+
africa_india = len((africa_species & india_species) - us_species)
|
| 240 |
+
all_three = len(us_species & africa_species & india_species)
|
| 241 |
+
|
| 242 |
+
# Create professional 3-circle Venn diagram
|
| 243 |
+
ax3.set_xlim(0, 4)
|
| 244 |
+
ax3.set_ylim(-0.5, 3.5) # Extended lower bound to prevent cutoff
|
| 245 |
+
ax3.set_aspect('equal')
|
| 246 |
+
ax3.axis('off')
|
| 247 |
+
|
| 248 |
+
# Circle parameters for proper overlap
|
| 249 |
+
radius = 1.0
|
| 250 |
+
# Positions chosen to create good overlaps
|
| 251 |
+
circle_us = Circle((1.2, 1.8), radius, color=REGION_COLORS['US'], alpha=0.4,
|
| 252 |
+
linewidth=2, edgecolor=REGION_COLORS['US'], fill=True)
|
| 253 |
+
circle_africa = Circle((2.8, 1.8), radius, color=REGION_COLORS['Africa'], alpha=0.4,
|
| 254 |
+
linewidth=2, edgecolor=REGION_COLORS['Africa'], fill=True)
|
| 255 |
+
circle_india = Circle((2.0, 0.7), radius, color=REGION_COLORS['India'], alpha=0.4,
|
| 256 |
+
linewidth=2, edgecolor=REGION_COLORS['India'], fill=True)
|
| 257 |
+
|
| 258 |
+
ax3.add_patch(circle_us)
|
| 259 |
+
ax3.add_patch(circle_africa)
|
| 260 |
+
ax3.add_patch(circle_india)
|
| 261 |
+
|
| 262 |
+
# Add labels for regions (outside circles, avoiding overlap)
|
| 263 |
+
ax3.text(0.4, 2.9, 'US', fontsize=13, fontweight='bold', color=REGION_COLORS['US'], ha='center')
|
| 264 |
+
ax3.text(3.6, 2.9, 'Africa', fontsize=13, fontweight='bold', color=REGION_COLORS['Africa'], ha='center')
|
| 265 |
+
ax3.text(2.0, -0.45, 'India', fontsize=13, fontweight='bold', color=REGION_COLORS['India'], ha='center')
|
| 266 |
+
|
| 267 |
+
# Add counts in appropriate regions
|
| 268 |
+
# US only (left)
|
| 269 |
+
ax3.text(0.7, 1.8, str(us_only), fontsize=14, fontweight='bold', ha='center', va='center')
|
| 270 |
+
|
| 271 |
+
# Africa only (right)
|
| 272 |
+
ax3.text(3.3, 1.8, str(africa_only), fontsize=14, fontweight='bold', ha='center', va='center')
|
| 273 |
+
|
| 274 |
+
# India only (bottom)
|
| 275 |
+
ax3.text(2.0, 0.3, str(india_only), fontsize=14, fontweight='bold', ha='center', va='center')
|
| 276 |
+
|
| 277 |
+
# US & Africa (top middle)
|
| 278 |
+
ax3.text(2.0, 2.1, str(us_africa), fontsize=13, fontweight='bold', ha='center', va='center',
|
| 279 |
+
bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7))
|
| 280 |
+
|
| 281 |
+
# US & India (left-bottom)
|
| 282 |
+
ax3.text(1.4, 1.0, str(us_india), fontsize=13, fontweight='bold', ha='center', va='center',
|
| 283 |
+
bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7))
|
| 284 |
+
|
| 285 |
+
# Africa & India (right-bottom)
|
| 286 |
+
ax3.text(2.6, 1.0, str(africa_india), fontsize=13, fontweight='bold', ha='center', va='center',
|
| 287 |
+
bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7))
|
| 288 |
+
|
| 289 |
+
# All three (center)
|
| 290 |
+
ax3.text(2.0, 1.4, str(all_three), fontsize=13, fontweight='bold', ha='center', va='center',
|
| 291 |
+
bbox=dict(boxstyle='round,pad=0.3', facecolor='yellow', alpha=0.6))
|
| 292 |
+
|
| 293 |
+
ax3.set_title('Species Overlap Across Regions', fontsize=15, pad=10)
|
| 294 |
+
|
| 295 |
+
# 4. Accuracy Range Distribution (Bottom Right)
|
| 296 |
+
ax4 = fig.add_subplot(gs[1, 1])
|
| 297 |
+
|
| 298 |
+
# Create accuracy bins
|
| 299 |
+
df_acc = df.dropna(subset=['Accuracy']).copy()
|
| 300 |
+
bins = [0, 50, 70, 85, 100]
|
| 301 |
+
labels_bins = ['0-50%', '50-70%', '70-85%', '85-100%']
|
| 302 |
+
df_acc['Accuracy_Range'] = pd.cut(df_acc['Accuracy'], bins=bins, labels=labels_bins, include_lowest=True)
|
| 303 |
+
|
| 304 |
+
range_counts = df_acc['Accuracy_Range'].value_counts().reindex(labels_bins)
|
| 305 |
+
|
| 306 |
+
bars4 = ax4.bar(range(len(range_counts)), range_counts.values,
|
| 307 |
+
color=COLORS[3], alpha=0.8, edgecolor='black', linewidth=1)
|
| 308 |
+
ax4.set_xticks(range(len(range_counts)))
|
| 309 |
+
ax4.set_xticklabels(range_counts.index, rotation=0)
|
| 310 |
+
ax4.set_xlabel('Accuracy Range')
|
| 311 |
+
ax4.set_ylabel('Number of Species')
|
| 312 |
+
ax4.set_title('Species by Accuracy Range')
|
| 313 |
+
|
| 314 |
+
# Add value labels
|
| 315 |
+
for i, val in enumerate(range_counts.values):
|
| 316 |
+
if pd.notna(val) and val > 0:
|
| 317 |
+
ax4.text(i, val + 1.5, str(int(val)),
|
| 318 |
+
ha='center', va='bottom', fontsize=11, fontweight='bold')
|
| 319 |
+
|
| 320 |
+
# Add overall title
|
| 321 |
+
plt.suptitle('PestID Bot Knowledgebank Features', fontsize=18, fontweight='bold', y=0.98)
|
| 322 |
+
|
| 323 |
+
# Save figures
|
| 324 |
+
plt.savefig(OUTPUT_PLOT_PDF, dpi=300, bbox_inches='tight')
|
| 325 |
+
plt.savefig(OUTPUT_PLOT_PNG, dpi=300, bbox_inches='tight')
|
| 326 |
+
|
| 327 |
+
print(f"✓ Visualization saved:")
|
| 328 |
+
print(f" - {OUTPUT_PLOT_PDF}")
|
| 329 |
+
print(f" - {OUTPUT_PLOT_PNG}")
|
| 330 |
+
|
| 331 |
+
|
| 332 |
+
def generate_statistics(df):
|
| 333 |
+
"""Generate comprehensive statistics summary"""
|
| 334 |
+
print("\nGenerating statistics summary...")
|
| 335 |
+
|
| 336 |
+
stats = []
|
| 337 |
+
stats.append("=" * 80)
|
| 338 |
+
stats.append("PEST SPECIES ANALYSIS - STATISTICS SUMMARY")
|
| 339 |
+
stats.append("=" * 80)
|
| 340 |
+
stats.append("")
|
| 341 |
+
|
| 342 |
+
# 1. Overall counts
|
| 343 |
+
stats.append("1. OVERALL SPECIES COUNTS")
|
| 344 |
+
stats.append("-" * 40)
|
| 345 |
+
stats.append(f"Total species: {len(df)}")
|
| 346 |
+
stats.append("")
|
| 347 |
+
stats.append("By Region:")
|
| 348 |
+
for region in ['US', 'Africa', 'India']:
|
| 349 |
+
count = len(df[df['Region'] == region])
|
| 350 |
+
percentage = (count / len(df)) * 100
|
| 351 |
+
stats.append(f" {region:10s}: {count:3d} species ({percentage:5.1f}%)")
|
| 352 |
+
stats.append("")
|
| 353 |
+
|
| 354 |
+
# 2. Tag distribution
|
| 355 |
+
stats.append("2. INSECT VS WEED DISTRIBUTION")
|
| 356 |
+
stats.append("-" * 40)
|
| 357 |
+
|
| 358 |
+
total_insects = len(df[df['Tag'] == 'insect'])
|
| 359 |
+
total_weeds = len(df[df['Tag'] == 'weed'])
|
| 360 |
+
stats.append(f"Overall:")
|
| 361 |
+
stats.append(f" Insects: {total_insects} ({total_insects/len(df)*100:.1f}%)")
|
| 362 |
+
stats.append(f" Weeds: {total_weeds} ({total_weeds/len(df)*100:.1f}%)")
|
| 363 |
+
stats.append("")
|
| 364 |
+
|
| 365 |
+
stats.append("By Region:")
|
| 366 |
+
for region in ['US', 'Africa', 'India']:
|
| 367 |
+
region_df = df[df['Region'] == region]
|
| 368 |
+
insects = len(region_df[region_df['Tag'] == 'insect'])
|
| 369 |
+
weeds = len(region_df[region_df['Tag'] == 'weed'])
|
| 370 |
+
stats.append(f" {region}:")
|
| 371 |
+
stats.append(f" Insects: {insects}")
|
| 372 |
+
stats.append(f" Weeds: {weeds}")
|
| 373 |
+
stats.append("")
|
| 374 |
+
|
| 375 |
+
# 3. Accuracy statistics
|
| 376 |
+
stats.append("3. ACCURACY STATISTICS")
|
| 377 |
+
stats.append("-" * 40)
|
| 378 |
+
|
| 379 |
+
for region in ['US', 'Africa', 'India']:
|
| 380 |
+
region_df = df[df['Region'] == region]
|
| 381 |
+
acc = region_df['Accuracy'].dropna()
|
| 382 |
+
|
| 383 |
+
stats.append(f"{region}:")
|
| 384 |
+
if len(acc) > 0:
|
| 385 |
+
stats.append(f" Mean: {acc.mean():6.2f}%")
|
| 386 |
+
stats.append(f" Median: {acc.median():6.2f}%")
|
| 387 |
+
stats.append(f" Std Dev: {acc.std():6.2f}%")
|
| 388 |
+
stats.append(f" Min: {acc.min():6.2f}%")
|
| 389 |
+
stats.append(f" Max: {acc.max():6.2f}%")
|
| 390 |
+
missing = region_df['Accuracy'].isna().sum()
|
| 391 |
+
stats.append(f" Missing: {missing} ({missing/len(region_df)*100:.1f}%)")
|
| 392 |
+
else:
|
| 393 |
+
stats.append(f" No accuracy data")
|
| 394 |
+
stats.append("")
|
| 395 |
+
|
| 396 |
+
# Overall accuracy
|
| 397 |
+
all_acc = df['Accuracy'].dropna()
|
| 398 |
+
stats.append("Overall (all regions):")
|
| 399 |
+
stats.append(f" Mean: {all_acc.mean():6.2f}%")
|
| 400 |
+
stats.append(f" Median: {all_acc.median():6.2f}%")
|
| 401 |
+
stats.append(f" Std Dev: {all_acc.std():6.2f}%")
|
| 402 |
+
stats.append(f" Range: {all_acc.min():.2f}% - {all_acc.max():.2f}%")
|
| 403 |
+
stats.append("")
|
| 404 |
+
|
| 405 |
+
# 4. IPM Info coverage
|
| 406 |
+
stats.append("4. IPM INFORMATION COVERAGE")
|
| 407 |
+
stats.append("-" * 40)
|
| 408 |
+
|
| 409 |
+
for region in ['US', 'Africa', 'India']:
|
| 410 |
+
region_df = df[df['Region'] == region]
|
| 411 |
+
with_ipm = region_df['Has_IPM'].sum()
|
| 412 |
+
total = len(region_df)
|
| 413 |
+
percentage = (with_ipm / total) * 100 if total else 0.0  # guard against a region with no rows
|
| 414 |
+
stats.append(f"{region:10s}: {with_ipm:2d}/{total:2d} species ({percentage:5.1f}%)")
|
| 415 |
+
|
| 416 |
+
total_ipm = df['Has_IPM'].sum()
|
| 417 |
+
stats.append(f"{'Overall':10s}: {total_ipm:2d}/{len(df):2d} species ({total_ipm/len(df)*100:5.1f}%)")
|
| 418 |
+
stats.append("")
|
| 419 |
+
|
| 420 |
+
# 5. Top species by accuracy
|
| 421 |
+
stats.append("5. TOP 10 SPECIES BY ACCURACY")
|
| 422 |
+
stats.append("-" * 40)
|
| 423 |
+
|
| 424 |
+
top_10 = df.dropna(subset=['Accuracy']).nlargest(10, 'Accuracy')
|
| 425 |
+
for i, (idx, row) in enumerate(top_10.iterrows(), 1):
|
| 426 |
+
stats.append(f"{i:2d}. {row['Common Name']:30s} ({row['Species']:25s}) - {row['Accuracy']:5.1f}% [{row['Region']}]")
|
| 427 |
+
stats.append("")
|
| 428 |
+
|
| 429 |
+
# 6. Species with lowest accuracy
|
| 430 |
+
stats.append("6. BOTTOM 10 SPECIES BY ACCURACY")
|
| 431 |
+
stats.append("-" * 40)
|
| 432 |
+
|
| 433 |
+
bottom_10 = df.dropna(subset=['Accuracy']).nsmallest(10, 'Accuracy')
|
| 434 |
+
for i, (idx, row) in enumerate(bottom_10.iterrows(), 1):
|
| 435 |
+
stats.append(f"{i:2d}. {row['Common Name']:30s} ({row['Species']:25s}) - {row['Accuracy']:5.1f}% [{row['Region']}]")
|
| 436 |
+
stats.append("")
|
| 437 |
+
|
| 438 |
+
# 7. Species overlap analysis
|
| 439 |
+
stats.append("7. SPECIES OVERLAP ACROSS REGIONS")
|
| 440 |
+
stats.append("-" * 40)
|
| 441 |
+
|
| 442 |
+
us_species = set(df[df['Region'] == 'US']['Species'].str.lower().str.strip())
|
| 443 |
+
africa_species = set(df[df['Region'] == 'Africa']['Species'].str.lower().str.strip())
|
| 444 |
+
india_species = set(df[df['Region'] == 'India']['Species'].str.lower().str.strip())
|
| 445 |
+
|
| 446 |
+
overlap_us_africa = us_species & africa_species
|
| 447 |
+
overlap_us_india = us_species & india_species
|
| 448 |
+
overlap_africa_india = africa_species & india_species
|
| 449 |
+
all_three = us_species & africa_species & india_species
|
| 450 |
+
|
| 451 |
+
stats.append(f"US & Africa: {len(overlap_us_africa)} species")
|
| 452 |
+
if len(overlap_us_africa) > 0:
|
| 453 |
+
for species in sorted(overlap_us_africa):
|
| 454 |
+
stats.append(f" - {species}")
|
| 455 |
+
|
| 456 |
+
stats.append(f"\nUS & India: {len(overlap_us_india)} species")
|
| 457 |
+
if len(overlap_us_india) > 0:
|
| 458 |
+
for species in sorted(overlap_us_india):
|
| 459 |
+
stats.append(f" - {species}")
|
| 460 |
+
|
| 461 |
+
stats.append(f"\nAfrica & India: {len(overlap_africa_india)} species")
|
| 462 |
+
if len(overlap_africa_india) > 0:
|
| 463 |
+
for species in sorted(overlap_africa_india):
|
| 464 |
+
stats.append(f" - {species}")
|
| 465 |
+
|
| 466 |
+
stats.append(f"\nAll three regions: {len(all_three)} species")
|
| 467 |
+
if len(all_three) > 0:
|
| 468 |
+
for species in sorted(all_three):
|
| 469 |
+
stats.append(f" - {species}")
|
| 470 |
+
|
| 471 |
+
stats.append("")
|
| 472 |
+
stats.append("=" * 80)
|
| 473 |
+
stats.append(f"Analysis completed on: 2025-10-26")
|
| 474 |
+
stats.append("=" * 80)
|
| 475 |
+
|
| 476 |
+
# Write to file
|
| 477 |
+
with open(OUTPUT_STATS, 'w', encoding='utf-8') as f:
|
| 478 |
+
f.write('\n'.join(stats))
|
| 479 |
+
|
| 480 |
+
print(f"✓ Statistics saved to: {OUTPUT_STATS}")
|
| 481 |
+
|
| 482 |
+
# Also print to console
|
| 483 |
+
print("\n" + '\n'.join(stats[:50])) # Print first 50 lines to console
|
| 484 |
+
|
| 485 |
+
|
| 486 |
+
def main():
|
| 487 |
+
"""Main execution function"""
|
| 488 |
+
print("=" * 80)
|
| 489 |
+
print("PEST SPECIES ANALYSIS")
|
| 490 |
+
print("=" * 80)
|
| 491 |
+
print()
|
| 492 |
+
|
| 493 |
+
# Load and prepare data
|
| 494 |
+
df = load_and_prepare_data()
|
| 495 |
+
|
| 496 |
+
# Generate outputs
|
| 497 |
+
generate_latex_table(df)
|
| 498 |
+
create_visualization(df)
|
| 499 |
+
generate_statistics(df)
|
| 500 |
+
|
| 501 |
+
print("\n" + "=" * 80)
|
| 502 |
+
print("ANALYSIS COMPLETE!")
|
| 503 |
+
print("=" * 80)
|
| 504 |
+
print("\nGenerated files:")
|
| 505 |
+
print(f" 1. {OUTPUT_TABLE.name} - LaTeX table")
|
| 506 |
+
print(f" 2. {OUTPUT_PLOT_PDF.name} - Visualization (PDF)")
|
| 507 |
+
print(f" 3. {OUTPUT_PLOT_PNG.name} - Visualization (PNG)")
|
| 508 |
+
print(f" 4. {OUTPUT_STATS.name} - Statistics summary")
|
| 509 |
+
print()
|
| 510 |
+
|
| 511 |
+
|
| 512 |
+
if __name__ == '__main__':
|
| 513 |
+
main()
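
Review note: the accuracy histogram in `create_visualization` relies on `pd.cut` with `include_lowest=True`. The sketch below uses made-up accuracy values (an assumption, not the spreadsheet data) to show how boundary values are binned. Intervals are closed on the right, so 50 falls in `0-50%` and 85 in `70-85%`.

```python
import pandas as pd

# Hypothetical accuracy values; the real ones come from the PestID spreadsheet.
accuracy = pd.Series([0, 40, 50, 70, 85, 100, None], name="Accuracy")

bins = [0, 50, 70, 85, 100]
labels = ["0-50%", "50-70%", "70-85%", "85-100%"]

# Drop missing values first, as the script does, then assign each value to a bin.
binned = pd.cut(accuracy.dropna(), bins=bins, labels=labels, include_lowest=True)

# reindex keeps the bins in display order regardless of their counts.
counts = binned.value_counts().reindex(labels)
print(counts)
```

Because the bins are right-closed, a species at exactly 85% is counted in the `70-85%` bar rather than the top bar, which is worth keeping in mind when reading the plot.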
|
species-organized/species_statistics.txt
ADDED
|
@@ -0,0 +1,121 @@
|
| 1 |
+
================================================================================
|
| 2 |
+
PEST SPECIES ANALYSIS - STATISTICS SUMMARY
|
| 3 |
+
================================================================================
|
| 4 |
+
|
| 5 |
+
1. OVERALL SPECIES COUNTS
|
| 6 |
+
----------------------------------------
|
| 7 |
+
Total species: 126
|
| 8 |
+
|
| 9 |
+
By Region:
|
| 10 |
+
US : 80 species ( 63.5%)
|
| 11 |
+
Africa : 35 species ( 27.8%)
|
| 12 |
+
India : 11 species ( 8.7%)
|
| 13 |
+
|
| 14 |
+
2. INSECT VS WEED DISTRIBUTION
|
| 15 |
+
----------------------------------------
|
| 16 |
+
Overall:
|
| 17 |
+
Insects: 65 (51.6%)
|
| 18 |
+
Weeds: 59 (46.8%)
|
| 19 |
+
|
| 20 |
+
By Region:
|
| 21 |
+
US:
|
| 22 |
+
Insects: 44
|
| 23 |
+
Weeds: 36
|
| 24 |
+
Africa:
|
| 25 |
+
Insects: 10
|
| 26 |
+
Weeds: 23
|
| 27 |
+
India:
|
| 28 |
+
Insects: 11
|
| 29 |
+
Weeds: 0
|
| 30 |
+
|
| 31 |
+
3. ACCURACY STATISTICS
|
| 32 |
+
----------------------------------------
|
| 33 |
+
US:
|
| 34 |
+
Mean: 89.69%
|
| 35 |
+
Median: 91.00%
|
| 36 |
+
Std Dev: 11.74%
|
| 37 |
+
Min: 40.00%
|
| 38 |
+
Max: 100.00%
|
| 39 |
+
Missing: 19 (23.8%)
|
| 40 |
+
|
| 41 |
+
Africa:
|
| 42 |
+
Mean: 89.81%
|
| 43 |
+
Median: 95.00%
|
| 44 |
+
Std Dev: 10.83%
|
| 45 |
+
Min: 59.00%
|
| 46 |
+
Max: 100.00%
|
| 47 |
+
Missing: 8 (22.9%)
|
| 48 |
+
|
| 49 |
+
India:
|
| 50 |
+
Mean: 80.00%
|
| 51 |
+
Median: 83.00%
|
| 52 |
+
Std Dev: 10.69%
|
| 53 |
+
Min: 59.00%
|
| 54 |
+
Max: 95.00%
|
| 55 |
+
Missing: 1 (9.1%)
|
| 56 |
+
|
| 57 |
+
Overall (all regions):
|
| 58 |
+
Mean: 88.73%
|
| 59 |
+
Median: 90.00%
|
| 60 |
+
Std Dev: 11.67%
|
| 61 |
+
Range: 40.00% - 100.00%
|
| 62 |
+
|
| 63 |
+
4. IPM INFORMATION COVERAGE
|
| 64 |
+
----------------------------------------
|
| 65 |
+
US : 0/80 species ( 0.0%)
|
| 66 |
+
Africa : 35/35 species (100.0%)
|
| 67 |
+
India : 10/11 species ( 90.9%)
|
| 68 |
+
Overall : 45/126 species ( 35.7%)
|
| 69 |
+
|
| 70 |
+
5. TOP 10 SPECIES BY ACCURACY
|
| 71 |
+
----------------------------------------
|
| 72 |
+
1. Seedcorn beetle (stenolophus lecontei ) - 100.0% [US]
|
| 73 |
+
2. Seedcorn maggot (delia platura ) - 100.0% [US]
|
| 74 |
+
3. Hop Vine Borer (hydraecia immanis ) - 100.0% [US]
|
| 75 |
+
4. Barnyardgrass (echinochloa crus-galli ) - 100.0% [US]
|
| 76 |
+
5. common Cocklebur (xanthium strumarium ) - 100.0% [US]
|
| 77 |
+
6. common Lambsquarters (chenopodium album ) - 100.0% [US]
|
| 78 |
+
7. CommonWaterhemp (amaranthus tuberculatus ) - 100.0% [US]
|
| 79 |
+
8. Giant ragweed (ambrosia trifida ) - 100.0% [US]
|
| 80 |
+
9. Henbit (deadnettle) (lamium amplexicaule ) - 100.0% [US]
|
| 81 |
+
10. Jimsonweed (datura stramonium ) - 100.0% [US]
|
| 82 |
+
|
| 83 |
+
6. BOTTOM 10 SPECIES BY ACCURACY
|
| 84 |
+
----------------------------------------
|
| 85 |
+
1. Annual ryegrass (lolium multiflorum ) - 40.0% [US]
|
| 86 |
+
2. Spotted fireworm (choristoneura parallela ) - 44.0% [US]
|
| 87 |
+
3. Cowpea aphid (Aphis craccivora ) - 59.0% [Africa]
|
| 88 |
+
4. Cowpea aphid (Aphis craccivora ) - 59.0% [India]
|
| 89 |
+
5. Spiraea Aphid (Aphis spiraecola ) - 67.0% [Africa]
|
| 90 |
+
6. Spiraea Aphid (Aphis spiraecola ) - 67.0% [India]
|
| 91 |
+
7. alfalfa weevil (hypera postica ) - 73.0% [US]
|
| 92 |
+
8. twospotted spider mite (tetranychus urticae ) - 73.0% [US]
|
| 93 |
+
9. Corn ear borer (Helicoverpa armigera ) - 74.0% [Africa]
|
| 94 |
+
10. Corn ear borer (Helicoverpa armigera ) - 74.0% [India]
|
| 95 |
+
|
| 96 |
+
7. SPECIES OVERLAP ACROSS REGIONS
|
| 97 |
+
----------------------------------------
|
| 98 |
+
US & Africa: 2 species
|
| 99 |
+
- amaranthus tuberculatus
|
| 100 |
+
- cyperus esculentus
|
| 101 |
+
|
| 102 |
+
US & India: 1 species
|
| 103 |
+
- spodoptera frugiperda
|
| 104 |
+
|
| 105 |
+
Africa & India: 10 species
|
| 106 |
+
- aphis craccivora
|
| 107 |
+
- aphis spiraecola
|
| 108 |
+
- atherigona reversura
|
| 109 |
+
- drosophila suzukii
|
| 110 |
+
- euborellia annulipes
|
| 111 |
+
- halyomorpha halys
|
| 112 |
+
- helicoverpa armigera
|
| 113 |
+
- icerya purchasi
|
| 114 |
+
- nezara viridula
|
| 115 |
+
- spodoptera litura
|
| 116 |
+
|
| 117 |
+
All three regions: 0 species
|
| 118 |
+
|
| 119 |
+
================================================================================
|
| 120 |
+
Analysis completed on: 2025-10-26
|
| 121 |
+
================================================================================
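
Review note: section 7 of the summary is built from set intersections over normalised species names. A minimal sketch of that idea follows; the `normalise` helper and the three short species lists are illustrative assumptions, not the contents of the spreadsheet.

```python
def normalise(names):
    """Lower-case and strip names so 'Cyperus esculentus ' matches 'cyperus esculentus'."""
    return {name.strip().lower() for name in names}


# Hypothetical per-region species lists (a small subset for illustration).
us = normalise(["Spodoptera frugiperda", "cyperus esculentus "])
africa = normalise(["Cyperus esculentus", "Aphis craccivora"])
india = normalise(["Aphis craccivora", "spodoptera frugiperda"])

# Pairwise overlaps are plain set intersections.
pairwise = {
    "US & Africa": us & africa,
    "US & India": us & india,
    "Africa & India": africa & india,
}
all_three = us & africa & india

for pair, shared in pairwise.items():
    print(f"{pair}: {sorted(shared)}")
print(f"All three regions: {sorted(all_three)}")
```

A pairwise intersection also contains any species shared by all three regions. For a Venn diagram, the count shown in a two-region-only area should therefore be the pairwise count minus `len(all_three)`; with zero species in all three regions, as reported above, the two numbers coincide.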
|
species-organized/species_table.tex
ADDED
|
@@ -0,0 +1,151 @@
|
| 1 |
+
% LaTeX Table: Species Analysis
|
| 2 |
+
% Requires: \usepackage{booktabs, longtable}
|
| 3 |
+
|
| 4 |
+
\begin{longtable}{llllrr}
|
| 5 |
+
\toprule
|
| 6 |
+
\textbf{Region} & \textbf{Species} & \textbf{Common Name} & \textbf{Tag} & \textbf{Accuracy (\%)} & \textbf{IPM Info} \\
|
| 7 |
+
\midrule
|
| 8 |
+
\endfirsthead
|
| 9 |
+
|
| 10 |
+
\multicolumn{6}{c}
|
| 11 |
+
{\tablename\ \thetable\ -- \textit{Continued from previous page}} \\
|
| 12 |
+
\toprule
|
| 13 |
+
\textbf{Region} & \textbf{Species} & \textbf{Common Name} & \textbf{Tag} & \textbf{Accuracy (\%)} & \textbf{IPM Info} \\
|
| 14 |
+
\midrule
|
| 15 |
+
\endhead
|
| 16 |
+
|
| 17 |
+
\midrule
|
| 18 |
+
\multicolumn{6}{r}{\textit{Continued on next page}} \\
|
| 19 |
+
\endfoot
|
| 20 |
+
|
| 21 |
+
\bottomrule
|
| 22 |
+
\endlastfoot
|
| 23 |
+
|
| 24 |
+
Africa & \textit{Halyomorpha halys} & Brown Marmorated Stink Bug & insect & 95.0 & Yes \\
|
| 25 |
+
Africa & \textit{Nezara viridula} & Green stink bug & insect & 88.0 & Yes \\
|
| 26 |
+
Africa & \textit{Drosophila suzukii} & Spotted-winged Drosophila & insect & 86.0 & Yes \\
|
| 27 |
+
Africa & \textit{Spodoptera litura} & Tobacco caterpillar & insect & 86.0 & Yes \\
|
| 28 |
+
Africa & \textit{Atherigona reversura} & Shoot fly & insect & 84.0 & Yes \\
|
| 29 |
+
Africa & \textit{Icerya purchasi} & Cottony cushion scale & insect & 82.0 & Yes \\
|
| 30 |
+
Africa & \textit{Euborellia annulipes} & Ring-legged Earwig & insect & 79.0 & Yes \\
|
| 31 |
+
Africa & \textit{Helicoverpa armigera} & Corn ear borer & insect & 74.0 & Yes \\
|
| 32 |
+
Africa & \textit{Aphis spiraecola} & Spiraea Aphid & insect & 67.0 & Yes \\
|
| 33 |
+
Africa & \textit{Aphis craccivora} & Cowpea aphid & insect & 59.0 & Yes \\
|
| 34 |
+
Africa & \textit{Trianthema triquetrum} & Red Spinach & weed & -- & Yes \\
|
| 35 |
+
Africa & \textit{Trianthema portulacastrum} & Desert Horse Purslane & weed & -- & Yes \\
|
| 36 |
+
Africa & \textit{Cleome rutidosperma} & Purple spider flower & weed & -- & Yes \\
|
| 37 |
+
Africa & \textit{Cleome gynandra} & Spiderwisp & weed & -- & Yes \\
|
| 38 |
+
Africa & \textit{Cleome viscosa} & Asian Spider flower & weed & -- & Yes \\
|
| 39 |
+
Africa & \textit{Cleome spinosa} & Spiny Spider-Flower & weed & -- & Yes \\
|
| 40 |
+
Africa & \textit{Cleome aculeata} & Prickly Spiderflower & weed & -- & Yes \\
|
| 41 |
+
Africa & \textit{Cleome monophylla} & Singleleaf Spindlepod & weed & -- & Yes \\
|
| 42 |
+
Africa & \textit{Amaranthus viridis} & Green Amaranth & weed & 95.0 & Yes \\
|
| 43 |
+
Africa & \textit{Cyperus entrerianus} & Deeproot Sedge & weed & 95.0 & Yes \\
|
| 44 |
+
Africa & \textit{Cyperus esculentus} & Yellow nutsedge & weed & 95.0 & Yes \\
|
| 45 |
+
Africa & \textit{Cyperus haspan} & Haspan flatsedge & weed & 95.0 & Yes \\
|
| 46 |
+
Africa & \textit{Cyperus iria L.} & Rice flatsedge & weed & 90.0 & Yes \\
|
| 47 |
+
Africa & \textit{Cyperus rotundus} & Purple Nutsedge & weed & 90.0 & Yes \\
|
| 48 |
+
Africa & \textit{Medicago minima} & Little Bur-clover & weed & 90.0 & Yes \\
|
| 49 |
+
Africa & \textit{Cyperus prolifer} & Dwarf papyrus & weed & 80.0 & Yes \\
|
| 50 |
+
Africa & \textit{Amaranthus tuberculatus} & Tall waterhemp & weed & 100.0 & Yes \\
|
| 51 |
+
Africa & \textit{Cyperus brevifolius} & Shortleaf flatsedge & weed & 100.0 & Yes \\
|
| 52 |
+
Africa & \textit{Cyperus difformis} & Smallflower umbrella sedge & weed & 100.0 & Yes \\
|
| 53 |
+
Africa & \textit{Cyperus mindorensis} & nan & weed & 100.0 & Yes \\
|
| 54 |
+
Africa & \textit{Cleome houtteana} & Spider flower & weed & 100.0 & Yes \\
|
| 55 |
+
Africa & \textit{Medicago falcata} & Yellow alfalfa & weed & 100.0 & Yes \\
|
| 56 |
+
Africa & \textit{Medicago lupulina} & Black Medick & weed & 100.0 & Yes \\
|
| 57 |
+
Africa & \textit{Medicago polymorpha} & Burr Medic & -- & 95.0 & Yes \\
|
| 58 |
+
Africa & \textit{Striga asiatica} & Witch weed & -- & 100.0 & Yes \\
|
| 59 |
+
India & \textit{Spodoptera frugiperda} & Fall Armyworm & insect & -- & No \\
|
| 60 |
+
India & \textit{Halyomorpha halys} & Brown Marmorated Stink Bug & insect & 95.0 & Yes \\
|
| 61 |
+
India & \textit{Nezara viridula} & Green stink bug & insect & 88.0 & Yes \\
|
| 62 |
+
India & \textit{Drosophila suzukii} & Spotted-winged Drosophila & insect & 86.0 & Yes \\
|
| 63 |
+
India & \textit{Spodoptera litura} & Tobacco caterpillar & insect & 86.0 & Yes \\
|
| 64 |
+
India & \textit{Atherigona reversura} & Shoot fly & insect & 84.0 & Yes \\
|
| 65 |
+
India & \textit{Icerya purchasi} & Cottony cushion scale & insect & 82.0 & Yes \\
|
| 66 |
+
India & \textit{Euborellia annulipes} & Ring-legged Earwig & insect & 79.0 & Yes \\
|
| 67 |
+
India & \textit{Helicoverpa armigera} & Corn ear borer & insect & 74.0 & Yes \\
|
| 68 |
+
India & \textit{Aphis spiraecola} & Spiraea Aphid & insect & 67.0 & Yes \\
|
| 69 |
+
India & \textit{Aphis craccivora} & Cowpea aphid & insect & 59.0 & Yes \\
|
| 70 |
+
US & \textit{chaetocnema pulicaria} & corn flea beetle & insect & -- & No \\
|
| 71 |
+
US & \textit{hypera zoilus} & clover leaf weevil & insect & -- & No \\
|
| 72 |
+
US & \textit{agromyza frontella} & alfalfa blotch leafminer & insect & -- & No \\
|
| 73 |
+
US & \textit{resseliella maxima} & soybean gall midge & insect & -- & No \\
|
| 74 |
+
US & \textit{aphis glycines} & soybean aphid & insect & -- & No \\
|
| 75 |
+
US & \textit{Damsel bugs} & Damsel bugs & insect & -- & No \\
|
| 76 |
+
US & \textit{Flower fly larvae} & Flower fly larvae & insect & -- & No \\
|
| 77 |
+
US & \textit{Ground beetles} & Ground beetles & insect & -- & No \\
|
| 78 |
+
US & \textit{Lacewings} & Lacewings & insect & -- & No \\
|
| 79 |
+
US & \textit{Lady beetles} & Lady beetles & insect & -- & No \\
|
| 80 |
+
US & \textit{Parasitoid wasps} & Parasitoid wasps & insect & -- & No \\
|
| 81 |
+
US & \textit{Pirate bugs} & Pirate bugs & insect & -- & No \\
|
| 82 |
+
US & \textit{Soldier beetles} & Soldier beetles & insect & -- & No \\
|
| 83 |
+
US & \textit{Podisus maculiventris} & Spined soldier bug & insect & -- & No \\
|
| 84 |
+
US & \textit{Tachinid flies} & Tachinid flies & insect & -- & No \\
|
| 85 |
+
US & \textit{empoasca fabae} & potato leafhopper & insect & 97.0 & No \\
|
| 86 |
+
US & \textit{striacosta albicosta} & western bean cutworm & insect & 97.0 & No \\
|
| 87 |
+
US & \textit{hypena scabra} & green cloverworm & insect & 96.0 & No \\
|
| 88 |
+
US & \textit{agrotis ipsilon} & black cutworm & insect & 95.0 & No \\
|
| 89 |
+
US & \textit{vanessa cardui} & painted lady & insect & 95.0 & No \\
|
| 90 |
+
US & \textit{popillia japonica} & Japanese beetle & insect & 94.0 & No \\
|
| 91 |
+
US & \textit{mythimna unipuncta} & armyworm & insect & 94.0 & No \\
|
| 92 |
+
US & \textit{lygus lineolaris} & tarnished plant bug & insect & 92.0 & No \\
|
| 93 |
+
US & \textit{colias eurytheme} & alfalfa caterpillar & insect & 91.0 & No \\
|
| 94 |
+
US & \textit{microtechnites bractatus} & garden fleahopper & insect & 90.0 & No \\
|
| 95 |
+
US & \textit{papaipema nebris} & stalk borer & insect & 90.0 & No \\
|
| 96 |
+
US & \textit{sitona hispidulus} & clover root curculio & insect & 89.0 & No \\
|
| 97 |
+
US & \textit{philaenus spumarius} & meadow spittlebug & insect & 89.0 & No \\
|
| 98 |
+
US & \textit{dectes texanus} & dectes stem borer & insect & 88.0 & No \\
|
| 99 |
+
US & \textit{ostrinia nubilalis} & European corn borer & insect & 88.0 & No \\
|
| 100 |
+
US & \textit{cerotoma trifurcata} & bean leaf beetle & insect & 87.0 & No \\
|
| 101 |
+
US & \textit{helicoverpa zea} & Tomato fruitworm & insect & 87.0 & No \\
|
| 102 |
+
US & \textit{spodoptera ornithogalli} & yellowstriped armyworm & insect & 86.0 & No \\
|
| 103 |
+
US & \textit{chrysodeixis includens} & soybean looper & insect & 83.0 & No \\
|
| 104 |
+
US & \textit{spodoptera frugiperda} & fall armyworm & insect & 80.0 & No \\
|
| 105 |
+
US & \textit{calomycterus setarius} & imported longhorned weevil & insect & 79.0 & No \\
|
| 106 |
+
US & \textit{loxostege cereralis} & alfalfa webworm & insect & 79.0 & No \\
|
| 107 |
+
US & \textit{odontota horni} & Soybean leaf miner & insect & 75.0 & No \\
|
| 108 |
+
US & \textit{hypera postica} & alfalfa weevil & insect & 73.0 & No \\
|
| 109 |
+
US & \textit{tetranychus urticae} & twospotted spider mite & insect & 73.0 & No \\
|
| 110 |
+
US & \textit{choristoneura parallela} & Spotted fireworm & insect & 44.0 & No \\
|
| 111 |
+
US & \textit{stenolophus lecontei} & Seedcorn beetle & insect & 100.0 & No \\
|
| 112 |
+
US & \textit{delia platura} & Seedcorn maggot & insect & 100.0 & No \\
|
| 113 |
+
US & \textit{hydraecia immanis} & Hop Vine Borer & insect & 100.0 & No \\
|
| 114 |
+
US & \textit{solanum ptycanthum} & Eastern black nightshade & weed & -- & No \\
|
| 115 |
+
US & \textit{conyza canadensis} & Horseweed & weed & -- & No \\
|
| 116 |
+
US & \textit{kochia scoparia} & Kochia & weed & -- & No \\
|
| 117 |
+
US & \textit{sinapis arvensis} & Wild mustard & weed & -- & No \\
|
| 118 |
+
US & \textit{ambrosia artemisiifolia} & common Ragweed & weed & 95.0 & No \\
|
| 119 |
+
US & \textit{stellaria media} & commonChickweed & weed & 95.0 & No \\
|
| 120 |
+
US & \textit{equisetum arvense} & Field Horsetail & weed & 95.0 & No \\
|
| 121 |
+
US & \textit{digitaria sanguinalis} & Large crabgrass & weed & 95.0 & No \\
|
| 122 |
+
US & \textit{sida spinosa} & Prickly sida & weed & 95.0 & No \\
|
| 123 |
+
US & \textit{cyperus esculentus} & yellow Nutsedge & weed & 95.0 & No \\
|
| 124 |
+
US & \textit{helianthus annuus} & Common Sunflower & weed & 90.0 & No \\
|
| 125 |
+
US & \textit{bromus tectorum} & Downy brome & weed & 90.0 & No \\
|
| 126 |
+
US & \textit{setaria viridis} & Green foxtail & weed & 90.0 & No \\
|
| 127 |
+
US & \textit{euphorbia dentata} & Toothed spurge & weed & 90.0 & No \\
|
| 128 |
+
US & \textit{mirabilis nyctaginea} & wild Four-o'clock & weed & 90.0 & No \\
|
| 129 |
+
US & \textit{setaria faberi} & Giant foxtail & weed & 85.0 & No \\
|
| 130 |
+
US & \textit{eleusine indica} & Goosegrass & weed & 85.0 & No \\
|
| 131 |
+
US & \textit{salsola tragus} & Russian thistle & weed & 85.0 & No \\
|
| 132 |
+
US & \textit{sorghum bicolor} & Shattercane & weed & 85.0 & No \\
|
| 133 |
+
US & \textit{setaria pumila} & Yellow foxtail & weed & 85.0 & No \\
|
| 134 |
+
US & \textit{persicaria pensylvanica} & Pennsylvania smartweed & weed & 80.0 & No \\
|
| 135 |
+
US & \textit{amaranthus palmeri} & Palmer amaranth & weed & 75.0 & No \\
|
| 136 |
+
US & \textit{lolium multiflorum} & Annual ryegrass & weed & 40.0 & No \\
|
| 137 |
+
US & \textit{echinochloa crus-galli} & Barnyardgrass & weed & 100.0 & No \\
|
| 138 |
+
US & \textit{xanthium strumarium} & common Cocklebur & weed & 100.0 & No \\
|
| 139 |
+
US & \textit{chenopodium album} & common Lambsquarters & weed & 100.0 & No \\
|
| 140 |
+
US & \textit{amaranthus tuberculatus} & CommonWaterhemp & weed & 100.0 & No \\
|
| 141 |
+
US & \textit{ambrosia trifida} & Giant ragweed & weed & 100.0 & No \\
|
| 142 |
+
US & \textit{lamium amplexicaule} & Henbit (deadnettle) & weed & 100.0 & No \\
|
| 143 |
+
US & \textit{datura stramonium} & Jimsonweed & weed & 100.0 & No \\
|
| 144 |
+
US & \textit{lactuca serriola} & Prickly lettuce & weed & 100.0 & No \\
|
| 145 |
+
US & \textit{amaranthus retroflexus} & Redroot pigweed & weed & 100.0 & No \\
|
| 146 |
+
US & \textit{equisetum hyemale} & Scouringrush & weed & 100.0 & No \\
|
| 147 |
+
US & \textit{capsella bursa-pastoris} & Shepherd's purse & weed & 100.0 & No \\
|
| 148 |
+
US & \textit{abutilon theophrasti} & Velvetleaf & weed & 100.0 & No \\
|
| 149 |
+
US & \textit{daucus carota} & Wild Carrot & weed & 100.0 & No \\
|
| 150 |
+
|
| 151 |
+
\end{longtable}
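
Review note: the `Cyperus mindorensis` row shows a literal `nan` in the Common Name column, which suggests missing cells are stringified directly when rows are written. The sketch below shows one way to format a row so that any missing cell gets a placeholder. `format_row`, `latex_escape`, `is_missing`, and the `--` placeholder are hypothetical names and choices for illustration, not the functions used in `species_analysis.py`.

```python
import math


def latex_escape(text):
    """Escape the LaTeX special characters most likely to appear in names."""
    replacements = {"&": r"\&", "%": r"\%", "_": r"\_", "#": r"\#"}
    return "".join(replacements.get(ch, ch) for ch in text)


def is_missing(value):
    """Treat None and float NaN (what pandas uses for empty cells) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def format_row(region, species, common_name, tag, accuracy, has_ipm):
    """Build one longtable row, writing '--' for any missing cell."""
    cells = [
        latex_escape(region),
        r"\textit{" + latex_escape(species) + "}",
        "--" if is_missing(common_name) else latex_escape(common_name),
        "--" if is_missing(tag) else latex_escape(tag),
        "--" if is_missing(accuracy) else f"{accuracy:.1f}",
        "Yes" if has_ipm else "No",
    ]
    return " & ".join(cells) + r" \\"


row = format_row("Africa", "Cyperus mindorensis", float("nan"), "weed", 100.0, True)
print(row)
```

Checking for missing values before any string conversion is the key point; once a NaN has been passed through `str()`, it is indistinguishable from a species actually named "nan".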
|
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/data_level0.bin
RENAMED
|
File without changes
|
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/header.bin
RENAMED
|
File without changes
|
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/length.bin
RENAMED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 40000
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:11491cf0eac47e805aa1b059bb8d72b895d20b41d24581b6a4383eff57db12f5
|
| 3 |
size 40000
|
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/link_lists.bin
RENAMED
|
File without changes
|
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/chroma.sqlite3
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:12653e79b55a19108699f56736a4d97a4ad00f3627d6504348862d911eaa1688
|
| 3 |
+
size 5410816
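
Review note: the `.bin` and `.sqlite3` entries above are Git LFS pointer files (a `version` line, an `oid sha256:` line, and a `size` line), not the binary payloads themselves. A minimal sketch for checking a downloaded file against its pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and only the Python standard library is assumed.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].split(":", 1)[1],  # drop the 'sha256:' prefix
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["oid"]
    )


# Pointer text copied from the new chroma.sqlite3 entry in this commit.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:12653e79b55a19108699f56736a4d97a4ad00f3627d6504348862d911eaa1688
size 5410816
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["size"], pointer["oid"][:8])
```

In practice the file would be read with `open(path, "rb").read()` and passed to `matches_pointer`; a mismatch usually means the working copy still holds the pointer text rather than the fetched object.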
|