arbabarshad committed on
Commit
42f7194
·
1 Parent(s): b9629f4

starting dec 29

README.md CHANGED
@@ -9,7 +9,70 @@ app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
  ---
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
13
 
14
  ## Git LFS Troubleshooting Notes
15
 
@@ -44,11 +107,3 @@ This repository encountered several Git LFS issues during setup. Here's a summar
44
  * Pushing branches with problematic LFS history to a fresh remote can fail. Starting the remote with a clean, history-free branch is a workaround.
45
  * When adding LFS tracking for existing binary files via `.gitattributes`, ensure the commit correctly converts files to LFS pointers. `git add --renormalize .` after updating `.gitattributes` and *before* committing is often necessary.
46
  * Double-check `.gitignore` if expected files or directories are missing after a `git add .`.
47
-
48
-
49
- while running in claude code :
50
- source ~/miniconda3/etc/profile.d/conda.sh && conda install -c conda-forge numpy
51
- ate agthinker
52
-
53
- run command like example: source ~/miniconda3/etc/profile.d/conda.sh && conda activate agllm-env1-updates-1 && │
54
- │ python whatebverscriptis.py
 
9
  pinned: false
10
  license: apache-2.0
11
  ---
12
+
13
+ ## PestIDBot - Quick Reference
14
+
15
+ ### Environment
16
+ ```bash
17
+ source ~/miniconda3/etc/profile.d/conda.sh && conda activate agllm-env1-updates-1
18
+ ```
19
+
20
+ ### Key Commands
21
+ | Task | Command |
22
+ |------|---------|
23
+ | Build DB | `python app_database_prep.py` |
24
+ | Run Eval | `python retrieval_evaluation.py` |
25
+ | Run App | `python app.py` |
26
+ | Deploy Dev | `git push space3 fresh-start:main` |
27
+ | Deploy Prod | `git push space2 fresh-start:main` |
28
+
29
+ ### Git Remotes
30
+ - `space2` → `git@hf.co:spaces/arbabarshad/agllm2` (production)
31
+ - `space3` → `git@hf.co:spaces/arbabarshad/agllm2-dev` (dev)
32
+
33
+ ### Project Structure
34
+ ```
35
+ ├── app.py # Main Gradio app (deployed)
36
+ ├── app_database_prep.py # Builds ChromaDB from PDFs + Excel
37
+ ├── retrieval_evaluation.py # Runs 4-filter evaluation
38
+ ├── retrieval_evaluation_results.json # Eval metrics output
39
+ │
40
+ ├── agllm-data/
41
+ │   ├── agllm-data-isu-field-insects-all-species/
42
+ │   │   ├── *.pdf # Insect IPM documents
43
+ │   │   └── matched_species_results_v2.csv # Species metadata
44
+ │   ├── agllm-data-isu-field-weeds-all-species/
45
+ │   │   ├── *.pdf # Weed IPM documents
46
+ │   │   └── matched_species_results_v2.csv # Species metadata
47
+ │   └── PestID Species.xlsx # India & Africa data (sheets)
48
+ │
49
+ ├── vector-databases-deployed/
50
+ │   └── db5-agllm-data-isu-field-insects-all-species/ # ChromaDB output
51
+ │
52
+ ├── species-organized/ # Analysis scripts & outputs
53
+ │   ├── species_analysis.py # Generates paper Figure 3
54
+ │   └── species_table.tex # LaTeX species table
55
+ │
56
+ └── writing/ # Paper drafts
57
+ ```
58
+
59
+ ### Database Build Flow
60
+ 1. PDFs loaded from `agllm-data/` (insects + weeds)
61
+ 2. Metadata read from `matched_species_results_v2.csv` files
62
+ 3. Excel sheets (India, Africa) processed from `PestID Species.xlsx`
63
+ 4. Documents chunked (512 tokens, 10 overlap)
64
+ 5. Tagged with `matched_specie_X` + `region` metadata
65
+ 6. Stored in ChromaDB at `vector-databases-deployed/db5-*/`
66
+
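The chunking and tagging steps in the build flow above can be sketched as follows. This is a minimal illustration, not the code in `app_database_prep.py`: the helper names `chunk_tokens` and `tag_chunk` are hypothetical, a "token" is assumed to be any pre-split list item, and the metadata key `matched_specie_0` is only a guess at how the `matched_specie_X` pattern is filled in.

```python
from typing import Dict, List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 10) -> List[List[str]]:
    """Split tokens into windows of chunk_size; neighbours share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


def tag_chunk(chunk: List[str], species: str, region: str) -> Dict[str, object]:
    """Attach species and region metadata to one chunk (key names are assumptions)."""
    return {
        "text": " ".join(chunk),
        "metadata": {"matched_specie_0": species, "region": region},
    }


if __name__ == "__main__":
    demo_tokens = [str(i) for i in range(1200)]
    demo_chunks = chunk_tokens(demo_tokens)
    print(len(demo_chunks), [len(c) for c in demo_chunks])
```

With the default settings each window starts 502 tokens after the previous one, so the last 10 tokens of one chunk are repeated at the start of the next.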
67
+ ### Evaluation Filters (retrieval_evaluation.py)
68
+ | Filter | P@5 | nDCG@5 |
69
+ |--------|-----|--------|
70
+ | No Filter | 0.82 | 0.72 |
71
+ | Species Only | 0.99 | 0.89 |
72
+ | Region Only | 0.83 | 0.73 |
73
+ | Species + Region | **1.00** | **0.90** |
74
+
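For readers unfamiliar with the two metrics in the table above, here is a small sketch of how they can be computed per query. It assumes binary relevance labels and treats P@k as a hit indicator (1 if any of the top k results is relevant); `retrieval_evaluation.py` is not shown in this commit, so the function names and that interpretation are assumptions rather than the script's actual definitions.

```python
import math
from typing import Sequence


def hit_at_k(relevance: Sequence[int], k: int) -> float:
    """Return 1.0 if any of the top-k results is relevant, else 0.0."""
    return 1.0 if any(relevance[:k]) else 0.0


def ndcg_at_k(relevance: Sequence[int], k: int) -> float:
    """nDCG@k for binary labels, normalised by the best ordering of the same list."""

    def dcg(rels: Sequence[int]) -> float:
        # The result at 0-based position `rank` is discounted by log2(rank + 2).
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    ranked = [0, 1, 0, 0, 0]  # the only relevant document sits at rank 2
    print(hit_at_k(ranked, 1), hit_at_k(ranked, 5), ndcg_at_k(ranked, 5))
```

Averaging these per-query scores over the evaluation queries gives numbers on the same scale as the table.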
75
+ ---
76
 
77
  ## Git LFS Troubleshooting Notes
78
 
 
107
  * Pushing branches with problematic LFS history to a fresh remote can fail. Starting the remote with a clean, history-free branch is a workaround.
108
  * When adding LFS tracking for existing binary files via `.gitattributes`, ensure the commit correctly converts files to LFS pointers. `git add --renormalize .` after updating `.gitattributes` and *before* committing is often necessary.
109
  * Double-check `.gitignore` if expected files or directories are missing after a `git add .`.
app_database_prep.py CHANGED
@@ -228,6 +228,11 @@ africa_splitted_documents = process_excel_sheet(
228
  splitted_documents = pdf_splitted_documents + india_splitted_documents + africa_splitted_documents
229
 
230
 
231
  # print(splitted_documents[0]) # Original print statement - commented out as we print chunks above
232
  print("=== Combined Processing Done ===") # Adjusted print statement
233
  print(f"Total documents after combining PDF, India, and Africa sources: {len(splitted_documents)}")
 
228
  splitted_documents = pdf_splitted_documents + india_splitted_documents + africa_splitted_documents
229
 
230
 
231
+ print("pdf_splitted_documents", len(pdf_splitted_documents))
232
+ print("india_splitted_documents", len(india_splitted_documents))
233
+ print("africa_splitted_documents", len(africa_splitted_documents))
234
+
235
+
236
  # print(splitted_documents[0]) # Original print statement - commented out as we print chunks above
237
  print("=== Combined Processing Done ===") # Adjusted print statement
238
  print(f"Total documents after combining PDF, India, and Africa sources: {len(splitted_documents)}")
retrieval_evaluation_results.json CHANGED
@@ -1,45 +1,45 @@
1
  {
2
  "no_filter": {
3
  "precision@1": {
4
- "mean": 0.61,
5
- "std": 0.4877499359302879,
6
  "count": 100
7
  },
8
  "precision@3": {
9
- "mean": 0.82,
10
- "std": 0.38418745424597095,
11
  "count": 100
12
  },
13
  "precision@5": {
14
- "mean": 0.84,
15
- "std": 0.36660605559646725,
16
  "count": 100
17
  },
18
  "ndcg@1": {
19
- "mean": 0.61,
20
- "std": 0.4877499359302879,
21
  "count": 100
22
  },
23
  "ndcg@3": {
24
- "mean": 0.7359487605714332,
25
- "std": 0.38022493138147806,
26
  "count": 100
27
  },
28
  "ndcg@5": {
29
- "mean": 0.7441240542245126,
30
- "std": 0.3685408287782305,
31
  "count": 100
32
  }
33
  },
34
  "species_only": {
35
  "precision@1": {
36
- "mean": 0.71,
37
- "std": 0.4537620521815371,
38
  "count": 100
39
  },
40
  "precision@3": {
41
- "mean": 0.97,
42
- "std": 0.17058722109231983,
43
  "count": 100
44
  },
45
  "precision@5": {
@@ -48,57 +48,57 @@
48
  "count": 100
49
  },
50
  "ndcg@1": {
51
- "mean": 0.71,
52
- "std": 0.4537620521815371,
53
  "count": 100
54
  },
55
  "ndcg@3": {
56
- "mean": 0.8661859507142915,
57
- "std": 0.23310162928115066,
58
  "count": 100
59
  },
60
  "ndcg@5": {
61
- "mean": 0.8739230068589822,
62
- "std": 0.2094424760171824,
63
  "count": 100
64
  }
65
  },
66
  "region_only": {
67
  "precision@1": {
68
- "mean": 0.62,
69
- "std": 0.48538644398046393,
70
  "count": 100
71
  },
72
  "precision@3": {
73
- "mean": 0.83,
74
- "std": 0.375632799419859,
75
  "count": 100
76
  },
77
  "precision@5": {
78
- "mean": 0.86,
79
- "std": 0.34698703145794946,
80
  "count": 100
81
  },
82
  "ndcg@1": {
83
- "mean": 0.62,
84
- "std": 0.48538644398046393,
85
  "count": 100
86
  },
87
  "ndcg@3": {
88
- "mean": 0.7459487605714332,
89
- "std": 0.373834218916114,
90
  "count": 100
91
  },
92
  "ndcg@5": {
93
- "mean": 0.7584308198052464,
94
- "std": 0.3552188974398061,
95
  "count": 100
96
  }
97
  },
98
  "species_and_region": {
99
  "precision@1": {
100
- "mean": 0.72,
101
- "std": 0.4489988864128729,
102
  "count": 100
103
  },
104
  "precision@3": {
@@ -107,23 +107,23 @@
107
  "count": 100
108
  },
109
  "precision@5": {
110
- "mean": 0.99,
111
- "std": 0.09949874371066199,
112
  "count": 100
113
  },
114
  "ndcg@1": {
115
- "mean": 0.72,
116
- "std": 0.4489988864128729,
117
  "count": 100
118
  },
119
  "ndcg@3": {
120
- "mean": 0.877495248250006,
121
- "std": 0.21470277973614038,
122
  "count": 100
123
  },
124
  "ndcg@5": {
125
- "mean": 0.8813637763223514,
126
- "std": 0.20196444998865976,
127
  "count": 100
128
  }
129
  }
 
1
  {
2
  "no_filter": {
3
  "precision@1": {
4
+ "mean": 0.59,
5
+ "std": 0.49183330509431744,
6
  "count": 100
7
  },
8
  "precision@3": {
9
+ "mean": 0.76,
10
+ "std": 0.42708313008125254,
11
  "count": 100
12
  },
13
  "precision@5": {
14
+ "mean": 0.82,
15
+ "std": 0.38418745424597095,
16
  "count": 100
17
  },
18
  "ndcg@1": {
19
+ "mean": 0.59,
20
+ "std": 0.49183330509431744,
21
  "count": 100
22
  },
23
  "ndcg@3": {
24
+ "mean": 0.6946394630357184,
25
+ "std": 0.41495405707705707,
26
  "count": 100
27
  },
28
  "ndcg@5": {
29
+ "mean": 0.7182888689781796,
30
+ "std": 0.3848500116841757,
31
  "count": 100
32
  }
33
  },
34
  "species_only": {
35
  "precision@1": {
36
+ "mean": 0.73,
37
+ "std": 0.4439594576084623,
38
  "count": 100
39
  },
40
  "precision@3": {
41
+ "mean": 0.98,
42
+ "std": 0.13999999999999999,
43
  "count": 100
44
  },
45
  "precision@5": {
 
48
  "count": 100
49
  },
50
  "ndcg@1": {
51
+ "mean": 0.73,
52
+ "std": 0.4439594576084623,
53
  "count": 100
54
  },
55
  "ndcg@3": {
56
+ "mean": 0.8811859507142915,
57
+ "std": 0.2136019453378135,
58
  "count": 100
59
  },
60
  "ndcg@5": {
61
+ "mean": 0.8850544787866369,
62
+ "std": 0.2007226726424171,
63
  "count": 100
64
  }
65
  },
66
  "region_only": {
67
  "precision@1": {
68
+ "mean": 0.61,
69
+ "std": 0.4877499359302879,
70
  "count": 100
71
  },
72
  "precision@3": {
73
+ "mean": 0.77,
74
+ "std": 0.4208325082500163,
75
  "count": 100
76
  },
77
  "precision@5": {
78
+ "mean": 0.83,
79
+ "std": 0.375632799419859,
80
  "count": 100
81
  },
82
  "ndcg@1": {
83
+ "mean": 0.61,
84
+ "std": 0.4877499359302879,
85
  "count": 100
86
  },
87
  "ndcg@3": {
88
+ "mean": 0.7083301655000039,
89
+ "std": 0.4110942789611411,
90
  "count": 100
91
  },
92
  "ndcg@5": {
93
+ "mean": 0.7319795714424648,
94
+ "std": 0.37983366654728035,
95
  "count": 100
96
  }
97
  },
98
  "species_and_region": {
99
  "precision@1": {
100
+ "mean": 0.75,
101
+ "std": 0.4330127018922193,
102
  "count": 100
103
  },
104
  "precision@3": {
 
107
  "count": 100
108
  },
109
  "precision@5": {
110
+ "mean": 1.0,
111
+ "std": 0.0,
112
  "count": 100
113
  },
114
  "ndcg@1": {
115
+ "mean": 0.75,
116
+ "std": 0.4330127018922193,
117
  "count": 100
118
  },
119
  "ndcg@3": {
120
+ "mean": 0.8898766531785769,
121
+ "std": 0.2091728695998248,
122
  "count": 100
123
  },
124
  "ndcg@5": {
125
+ "mean": 0.8980519468316562,
126
+ "std": 0.18024378480878206,
127
  "count": 100
128
  }
129
  }
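A quick sanity check on the new values: for a per-query score that is either 0 or 1, the population standard deviation is determined by the mean p as sqrt(p(1 - p)). The sketch below compares that formula against a few mean/std pairs copied from the JSON above. That the script uses 0/1 scores and a population (not sample) standard deviation is an inference from these numbers, not something the diff states.

```python
import math


def binary_metric_std(mean: float) -> float:
    """Population standard deviation of a 0/1 score whose mean is `mean`."""
    return math.sqrt(mean * (1.0 - mean))


# (mean, std) pairs copied from the updated retrieval_evaluation_results.json
reported = {
    "no_filter precision@1": (0.59, 0.49183330509431744),
    "no_filter precision@5": (0.82, 0.38418745424597095),
    "species_only precision@3": (0.98, 0.13999999999999999),
    "species_and_region precision@5": (1.0, 0.0),
}

if __name__ == "__main__":
    for name, (mean, std) in reported.items():
        print(f"{name}: reported={std:.6f} formula={binary_metric_std(mean):.6f}")
```

The nDCG entries do not follow this formula for k > 1, since those scores take fractional values.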
species-organized/PestID Species - Organized.xlsx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cb0ceab28dfca471472fcc9e4631100826dc5ea2d3fdf2b78c215b859938eb61
3
+ size 27466
species-organized/species_analysis.png ADDED

Git LFS Details

  • SHA256: 031382b328cd928ae91de47e7c205cffc761f220cc2c273731e6be1631c893d5
  • Pointer size: 131 Bytes
  • Size of remote file: 427 kB
species-organized/species_analysis.py ADDED
@@ -0,0 +1,513 @@
1
+ """
2
+ Species Analysis Script
3
+ =======================
4
+ Analyzes pest species data from PestID Species - Organized.xlsx
5
+ Generates: LaTeX table, multi-panel visualization, and statistics summary
6
+
7
+ Author: AgLLM Project
8
+ Date: 2025-10-26
9
+ """
10
+
11
+ import pandas as pd
12
+ import numpy as np
13
+ import matplotlib.pyplot as plt
14
+ from matplotlib.gridspec import GridSpec
15
+ from matplotlib.patches import Circle
16
+ from matplotlib.patches import Patch
17
+ from pathlib import Path
18
+
19
+ # Set font parameters (matching reference style)
20
+ plt.rcParams.update({
21
+ 'font.family': 'Arial',
22
+ 'font.size': 14,
23
+ 'axes.labelsize': 14,
24
+ 'axes.titlesize': 15,
25
+ 'xtick.labelsize': 12,
26
+ 'ytick.labelsize': 12,
27
+ 'legend.fontsize': 13,
28
+ })
29
+
30
+ # Define color palette
31
+ COLORS = ['#2E86AB', '#A23B72', '#F18F01', '#C73E1D', '#6A994E', '#BC4B51', '#5B8E7D', '#F4A259']
32
+ REGION_COLORS = {
33
+ 'US': '#2E86AB',
34
+ 'Africa': '#F18F01',
35
+ 'India': '#6A994E'
36
+ }
37
+
38
+ # File paths
39
+ SCRIPT_DIR = Path(__file__).parent
40
+ DATA_FILE = SCRIPT_DIR / 'PestID Species - Organized.xlsx'
41
+ OUTPUT_TABLE = SCRIPT_DIR / 'species_table.tex'
42
+ OUTPUT_PLOT_PDF = SCRIPT_DIR / 'species_analysis.pdf'
43
+ OUTPUT_PLOT_PNG = SCRIPT_DIR / 'species_analysis.png'
44
+ OUTPUT_STATS = SCRIPT_DIR / 'species_statistics.txt'
45
+
46
+
47
+ def load_and_prepare_data():
48
+ """Load data from all sheets and create consolidated DataFrame"""
49
+ print("Loading data from Excel file...")
50
+
51
+ # Read all sheets
52
+ us_df = pd.read_excel(DATA_FILE, sheet_name='US')
53
+ africa_df = pd.read_excel(DATA_FILE, sheet_name='Africa')
54
+ india_df = pd.read_excel(DATA_FILE, sheet_name='India')
55
+
56
+ # Add region column
57
+ us_df['Region'] = 'US'
58
+ africa_df['Region'] = 'Africa'
59
+ india_df['Region'] = 'India'
60
+
61
+ # Standardize column names across sheets
62
+ # US: Species, Common Name, Tag, Accuracy
63
+ # Africa: Common Name, Species, Tag, Accuracy, IPM Info, Excluded Link
64
+ # India: Common Name, Species, Tag, Accuracy, IPM Info
65
+
66
+ # Reorder columns for US to match others
67
+ us_df = us_df[['Region', 'Species', 'Common Name', 'Tag', 'Accuracy']].copy()
68
+ us_df['IPM Info'] = None
69
+ us_df['Excluded Link'] = None
70
+
71
+ # Reorder columns for Africa
72
+ africa_df = africa_df[['Region', 'Species', 'Common Name', 'Tag', 'Accuracy', 'IPM Info', 'Excluded Link']]
73
+
74
+ # Reorder columns for India (no Excluded Link)
75
+ india_df = india_df[['Region', 'Species', 'Common Name', 'Tag', 'Accuracy', 'IPM Info']].copy()
76
+ india_df['Excluded Link'] = None
77
+
78
+ # FIX: India uses decimal accuracy (0.59-0.95) instead of percentage
79
+ # Multiply India accuracy by 100
80
+ india_df['Accuracy'] = india_df['Accuracy'] * 100
81
+
82
+ # Concatenate all DataFrames
83
+ df_consolidated = pd.concat([us_df, africa_df, india_df], ignore_index=True)
84
+
85
+ # Add helper column for IPM availability
86
+ df_consolidated['Has_IPM'] = df_consolidated['IPM Info'].notna()
87
+
88
+ print(f"✓ Loaded {len(df_consolidated)} species from 3 regions")
89
+ print(f" - US: {len(us_df)} species")
90
+ print(f" - Africa: {len(africa_df)} species")
91
+ print(f" - India: {len(india_df)} species")
92
+
93
+ return df_consolidated
94
+
95
+
96
+ def generate_latex_table(df):
97
+ """Generate LaTeX table with all species data"""
98
+ print("\nGenerating LaTeX table...")
99
+
100
+ # Create simplified table for LaTeX (without full IPM text)
101
+ df_table = df[['Region', 'Species', 'Common Name', 'Tag', 'Accuracy', 'Has_IPM']].copy()
102
+
103
+ # Sort by Region, then Tag, then Accuracy (while Accuracy is still numeric)
104
+ df_table = df_table.sort_values(['Region', 'Tag', 'Accuracy'], ascending=[True, True, False])
105
+
106
+ # Format accuracy after sorting, so the sort is numeric rather than lexicographic
107
+ df_table['Accuracy'] = df_table['Accuracy'].apply(lambda x: f"{x:.1f}" if pd.notna(x) else "—")
108
+
109
+ # Format Has_IPM as Yes/No
110
+ df_table['Has_IPM'] = df_table['Has_IPM'].apply(lambda x: "Yes" if x else "No")
111
+
112
+ # Replace NaN in Tag
113
+ df_table['Tag'] = df_table['Tag'].fillna("—")
114
+
115
+ # Start building LaTeX code
116
+ latex_code = []
117
+ latex_code.append("% LaTeX Table: Species Analysis")
118
+ latex_code.append("% Requires: \\usepackage{booktabs, longtable}")
119
+ latex_code.append("")
120
+ latex_code.append("\\begin{longtable}{llllrr}")
121
+ latex_code.append("\\toprule")
122
+ latex_code.append("\\textbf{Region} & \\textbf{Species} & \\textbf{Common Name} & \\textbf{Tag} & \\textbf{Accuracy (\\%)} & \\textbf{IPM Info} \\\\")
123
+ latex_code.append("\\midrule")
124
+ latex_code.append("\\endfirsthead")
125
+ latex_code.append("")
126
+ latex_code.append("\\multicolumn{6}{c}")
127
+ latex_code.append("{\\tablename\\ \\thetable\\ -- \\textit{Continued from previous page}} \\\\")
128
+ latex_code.append("\\toprule")
129
+ latex_code.append("\\textbf{Region} & \\textbf{Species} & \\textbf{Common Name} & \\textbf{Tag} & \\textbf{Accuracy (\\%)} & \\textbf{IPM Info} \\\\")
130
+ latex_code.append("\\midrule")
131
+ latex_code.append("\\endhead")
132
+ latex_code.append("")
133
+ latex_code.append("\\midrule")
134
+ latex_code.append("\\multicolumn{6}{r}{\\textit{Continued on next page}} \\\\")
135
+ latex_code.append("\\endfoot")
136
+ latex_code.append("")
137
+ latex_code.append("\\bottomrule")
138
+ latex_code.append("\\endlastfoot")
139
+ latex_code.append("")
140
+
141
+ # Add data rows
142
+ for idx, row in df_table.iterrows():
143
+ # Escape special LaTeX characters
144
+ species = str(row['Species']).replace('_', '\\_').replace('&', '\\&')
145
+ common_name = str(row['Common Name']).replace('_', '\\_').replace('&', '\\&')
146
+
147
+ latex_code.append(f"{row['Region']} & \\textit{{{species}}} & {common_name} & {row['Tag']} & {row['Accuracy']} & {row['Has_IPM']} \\\\")
148
+
149
+ latex_code.append("")
150
+ latex_code.append("\\end{longtable}")
151
+
152
+ # Write to file
153
+ with open(OUTPUT_TABLE, 'w') as f:
154
+ f.write('\n'.join(latex_code))
155
+
156
+ print(f"✓ LaTeX table saved to: {OUTPUT_TABLE}")
157
+ print(f" Contains {len(df_table)} species")
158
+
159
+
160
+ def create_visualization(df):
161
+ """Create comprehensive multi-panel visualization"""
162
+ print("\nCreating visualization...")
163
+
164
+ # Create figure with GridSpec layout (2 rows × 2 columns)
165
+ fig = plt.figure(figsize=(14, 10))
166
+ gs = GridSpec(2, 2, figure=fig, hspace=0.35, wspace=0.35)
167
+
168
+ # 1. Species Count by Region (Top Left) - with Insect/Weed breakdown
169
+ ax1 = fig.add_subplot(gs[0, 0])
170
+
171
+ # Get insect and weed counts by region
172
+ tag_by_region = pd.crosstab(df['Region'], df['Tag'])
173
+ tag_by_region = tag_by_region.reindex(['US', 'Africa', 'India'])
174
+
175
+ insects = tag_by_region['insect'].values if 'insect' in tag_by_region.columns else [0, 0, 0]
176
+ weeds = tag_by_region['weed'].values if 'weed' in tag_by_region.columns else [0, 0, 0]
177
+
178
+ x = range(len(tag_by_region))
179
+
180
+ # Create stacked bars
181
+ bars1_insects = ax1.bar(x, insects, label='Insect', color=COLORS[0], alpha=0.8, edgecolor='black', linewidth=0.5)
182
+ bars1_weeds = ax1.bar(x, weeds, bottom=insects, label='Weed', color=COLORS[2], alpha=0.8, edgecolor='black', linewidth=0.5)
183
+
184
+ ax1.set_xticks(x)
185
+ ax1.set_xticklabels(tag_by_region.index)
186
+ ax1.set_ylabel('Number of Species')
187
+ ax1.set_title('Species Count by Region')
188
+ ax1.legend(loc='upper right', fontsize=11)
189
+
190
+ # Add total count labels on top
191
+ for i, region in enumerate(tag_by_region.index):
192
+ total = insects[i] + weeds[i]
193
+ ax1.text(i, total + 2, str(int(total)), ha='center', va='bottom',
194
+ fontsize=12, fontweight='bold')
195
+
196
+ # 2. Accuracy Distribution by Region - Box Plot (Top Right)
197
+ ax2 = fig.add_subplot(gs[0, 1])
198
+
199
+ # Prepare data for box plot
200
+ accuracy_data = []
201
+ labels = []
202
+ colors_box = []
203
+ for region in ['US', 'Africa', 'India']:
204
+ region_acc = df[df['Region'] == region]['Accuracy'].dropna()
205
+ if len(region_acc) > 0:
206
+ accuracy_data.append(region_acc)
207
+ labels.append(region)
208
+ colors_box.append(REGION_COLORS[region])
209
+
210
+ bp = ax2.boxplot(accuracy_data, tick_labels=labels, patch_artist=True,
211
+ medianprops=dict(color='red', linewidth=2),
212
+ whiskerprops=dict(linewidth=1.5),
213
+ boxprops=dict(linewidth=1.5),
214
+ showfliers=True)
215
+
216
+ # Color the boxes
217
+ for patch, color in zip(bp['boxes'], colors_box):
218
+ patch.set_facecolor(color)
219
+ patch.set_alpha(0.7)
220
+
221
+ ax2.set_ylabel('Accuracy (%)')
222
+ ax2.set_title('Accuracy Distribution by Region')
223
+ ax2.grid(True, alpha=0.3, axis='y')
224
+ ax2.set_ylim(35, 105)
225
+
226
+ # 3. Species Overlap - Venn Diagram (Bottom Left)
227
+ ax3 = fig.add_subplot(gs[1, 0])
228
+
229
+ # Calculate species overlap
230
+ us_species = set(df[df['Region'] == 'US']['Species'].str.lower().str.strip())
231
+ africa_species = set(df[df['Region'] == 'Africa']['Species'].str.lower().str.strip())
232
+ india_species = set(df[df['Region'] == 'India']['Species'].str.lower().str.strip())
233
+
234
+ us_only = len(us_species - africa_species - india_species)
235
+ africa_only = len(africa_species - us_species - india_species)
236
+ india_only = len(india_species - us_species - africa_species)
237
+ us_africa = len((us_species & africa_species) - india_species)
238
+ us_india = len((us_species & india_species) - africa_species)
239
+ africa_india = len((africa_species & india_species) - us_species)
240
+ all_three = len(us_species & africa_species & india_species)
241
+
242
+ # Create professional 3-circle Venn diagram
243
+ ax3.set_xlim(0, 4)
244
+ ax3.set_ylim(-0.5, 3.5) # Extended lower bound to prevent cutoff
245
+ ax3.set_aspect('equal')
246
+ ax3.axis('off')
247
+
248
+ # Circle parameters for proper overlap
249
+ radius = 1.0
250
+ # Positions chosen to create good overlaps
251
+ circle_us = Circle((1.2, 1.8), radius, color=REGION_COLORS['US'], alpha=0.4,
252
+ linewidth=2, edgecolor=REGION_COLORS['US'], fill=True)
253
+ circle_africa = Circle((2.8, 1.8), radius, color=REGION_COLORS['Africa'], alpha=0.4,
254
+ linewidth=2, edgecolor=REGION_COLORS['Africa'], fill=True)
255
+ circle_india = Circle((2.0, 0.7), radius, color=REGION_COLORS['India'], alpha=0.4,
256
+ linewidth=2, edgecolor=REGION_COLORS['India'], fill=True)
257
+
258
+ ax3.add_patch(circle_us)
259
+ ax3.add_patch(circle_africa)
260
+ ax3.add_patch(circle_india)
261
+
262
+ # Add labels for regions (outside circles, avoiding overlap)
263
+ ax3.text(0.4, 2.9, 'US', fontsize=13, fontweight='bold', color=REGION_COLORS['US'], ha='center')
264
+ ax3.text(3.6, 2.9, 'Africa', fontsize=13, fontweight='bold', color=REGION_COLORS['Africa'], ha='center')
265
+ ax3.text(2.0, -0.45, 'India', fontsize=13, fontweight='bold', color=REGION_COLORS['India'], ha='center')
266
+
267
+ # Add counts in appropriate regions
268
+ # US only (left)
269
+ ax3.text(0.7, 1.8, str(us_only), fontsize=14, fontweight='bold', ha='center', va='center')
270
+
271
+ # Africa only (right)
272
+ ax3.text(3.3, 1.8, str(africa_only), fontsize=14, fontweight='bold', ha='center', va='center')
273
+
274
+ # India only (bottom)
275
+ ax3.text(2.0, 0.3, str(india_only), fontsize=14, fontweight='bold', ha='center', va='center')
276
+
277
+ # US & Africa (top middle)
278
+ ax3.text(2.0, 2.1, str(us_africa), fontsize=13, fontweight='bold', ha='center', va='center',
279
+ bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7))
280
+
281
+ # US & India (left-bottom)
282
+ ax3.text(1.4, 1.0, str(us_india), fontsize=13, fontweight='bold', ha='center', va='center',
283
+ bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7))
284
+
285
+ # Africa & India (right-bottom)
286
+ ax3.text(2.6, 1.0, str(africa_india), fontsize=13, fontweight='bold', ha='center', va='center',
287
+ bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7))
288
+
289
+ # All three (center)
290
+ ax3.text(2.0, 1.4, str(all_three), fontsize=13, fontweight='bold', ha='center', va='center',
291
+ bbox=dict(boxstyle='round,pad=0.3', facecolor='yellow', alpha=0.6))
292
+
293
+ ax3.set_title('Species Overlap Across Regions', fontsize=15, pad=10)
294
+
295
+ # 4. Accuracy Range Distribution (Bottom Right)
296
+ ax4 = fig.add_subplot(gs[1, 1])
297
+
298
+ # Create accuracy bins
299
+ df_acc = df.dropna(subset=['Accuracy']).copy()
300
+ bins = [0, 50, 70, 85, 100]
301
+ labels_bins = ['0-50%', '50-70%', '70-85%', '85-100%']
302
+ df_acc['Accuracy_Range'] = pd.cut(df_acc['Accuracy'], bins=bins, labels=labels_bins, include_lowest=True)
303
+
304
+ range_counts = df_acc['Accuracy_Range'].value_counts().reindex(labels_bins)
305
+
306
+ bars4 = ax4.bar(range(len(range_counts)), range_counts.values,
307
+ color=COLORS[3], alpha=0.8, edgecolor='black', linewidth=1)
308
+ ax4.set_xticks(range(len(range_counts)))
309
+ ax4.set_xticklabels(range_counts.index, rotation=0)
310
+ ax4.set_xlabel('Accuracy Range')
311
+ ax4.set_ylabel('Number of Species')
312
+ ax4.set_title('Species by Accuracy Range')
313
+
314
+ # Add value labels
315
+ for i, val in enumerate(range_counts.values):
316
+ if pd.notna(val) and val > 0:
317
+ ax4.text(i, val + 1.5, str(int(val)),
318
+ ha='center', va='bottom', fontsize=11, fontweight='bold')
319
+
320
+ # Add overall title
321
+ plt.suptitle('PestID Bot Knowledgebank Features', fontsize=18, fontweight='bold', y=0.98)
322
+
323
+ # Save figures
324
+ plt.savefig(OUTPUT_PLOT_PDF, dpi=300, bbox_inches='tight')
325
+ plt.savefig(OUTPUT_PLOT_PNG, dpi=300, bbox_inches='tight')
326
+
327
+ print(f"✓ Visualization saved:")
328
+ print(f" - {OUTPUT_PLOT_PDF}")
329
+ print(f" - {OUTPUT_PLOT_PNG}")
330
+
331
+
332
+ def generate_statistics(df):
333
+ """Generate comprehensive statistics summary"""
334
+ print("\nGenerating statistics summary...")
335
+
336
+ stats = []
337
+ stats.append("=" * 80)
338
+ stats.append("PEST SPECIES ANALYSIS - STATISTICS SUMMARY")
339
+ stats.append("=" * 80)
340
+ stats.append("")
341
+
342
+ # 1. Overall counts
343
+ stats.append("1. OVERALL SPECIES COUNTS")
344
+ stats.append("-" * 40)
345
+ stats.append(f"Total species: {len(df)}")
346
+ stats.append("")
347
+ stats.append("By Region:")
348
+ for region in ['US', 'Africa', 'India']:
349
+ count = len(df[df['Region'] == region])
350
+ percentage = (count / len(df)) * 100
351
+ stats.append(f" {region:10s}: {count:3d} species ({percentage:5.1f}%)")
352
+ stats.append("")
353
+
354
+ # 2. Tag distribution
355
+ stats.append("2. INSECT VS WEED DISTRIBUTION")
356
+ stats.append("-" * 40)
357
+
358
+ total_insects = len(df[df['Tag'] == 'insect'])
359
+ total_weeds = len(df[df['Tag'] == 'weed'])
360
+ stats.append(f"Overall:")
361
+ stats.append(f" Insects: {total_insects} ({total_insects/len(df)*100:.1f}%)")
362
+ stats.append(f" Weeds: {total_weeds} ({total_weeds/len(df)*100:.1f}%)")
363
+ stats.append("")
364
+
365
+ stats.append("By Region:")
366
+ for region in ['US', 'Africa', 'India']:
367
+ region_df = df[df['Region'] == region]
368
+ insects = len(region_df[region_df['Tag'] == 'insect'])
369
+ weeds = len(region_df[region_df['Tag'] == 'weed'])
370
+ stats.append(f" {region}:")
371
+ stats.append(f" Insects: {insects}")
372
+ stats.append(f" Weeds: {weeds}")
373
+ stats.append("")
374
+
375
+ # 3. Accuracy statistics
376
+ stats.append("3. ACCURACY STATISTICS")
377
+ stats.append("-" * 40)
378
+
379
+ for region in ['US', 'Africa', 'India']:
380
+ region_df = df[df['Region'] == region]
381
+ acc = region_df['Accuracy'].dropna()
382
+
383
+ stats.append(f"{region}:")
384
+ if len(acc) > 0:
385
+ stats.append(f" Mean: {acc.mean():6.2f}%")
386
+ stats.append(f" Median: {acc.median():6.2f}%")
387
+ stats.append(f" Std Dev: {acc.std():6.2f}%")
388
+ stats.append(f" Min: {acc.min():6.2f}%")
389
+ stats.append(f" Max: {acc.max():6.2f}%")
390
+ missing = region_df['Accuracy'].isna().sum()
391
+ stats.append(f" Missing: {missing} ({missing/len(region_df)*100:.1f}%)")
392
+ else:
393
+ stats.append(f" No accuracy data")
394
+ stats.append("")
395
+
396
+ # Overall accuracy
397
+ all_acc = df['Accuracy'].dropna()
398
+ stats.append("Overall (all regions):")
399
+ stats.append(f" Mean: {all_acc.mean():6.2f}%")
400
+ stats.append(f" Median: {all_acc.median():6.2f}%")
401
+ stats.append(f" Std Dev: {all_acc.std():6.2f}%")
402
+ stats.append(f" Range: {all_acc.min():.2f}% - {all_acc.max():.2f}%")
403
+ stats.append("")
404
+
405
+ # 4. IPM Info coverage
406
+ stats.append("4. IPM INFORMATION COVERAGE")
407
+ stats.append("-" * 40)
408
+
409
+ for region in ['US', 'Africa', 'India']:
410
+ region_df = df[df['Region'] == region]
411
+ with_ipm = region_df['Has_IPM'].sum()
412
+ total = len(region_df)
413
+ percentage = (with_ipm / total) * 100
414
+ stats.append(f"{region:10s}: {with_ipm:2d}/{total:2d} species ({percentage:5.1f}%)")
415
+
416
+ total_ipm = df['Has_IPM'].sum()
417
+ stats.append(f"{'Overall':10s}: {total_ipm:2d}/{len(df):2d} species ({total_ipm/len(df)*100:5.1f}%)")
418
+ stats.append("")
419
+
420
+ # 5. Top species by accuracy
421
+ stats.append("5. TOP 10 SPECIES BY ACCURACY")
422
+ stats.append("-" * 40)
423
+
424
+ top_10 = df.dropna(subset=['Accuracy']).nlargest(10, 'Accuracy')
425
+ for i, (idx, row) in enumerate(top_10.iterrows(), 1):
426
+ stats.append(f"{i:2d}. {row['Common Name']:30s} ({row['Species']:25s}) - {row['Accuracy']:5.1f}% [{row['Region']}]")
427
+ stats.append("")
428
+
429
+ # 6. Species with lowest accuracy
430
+ stats.append("6. BOTTOM 10 SPECIES BY ACCURACY")
431
+ stats.append("-" * 40)
432
+
433
+ bottom_10 = df.dropna(subset=['Accuracy']).nsmallest(10, 'Accuracy')
434
+ for i, (idx, row) in enumerate(bottom_10.iterrows(), 1):
435
+ stats.append(f"{i:2d}. {row['Common Name']:30s} ({row['Species']:25s}) - {row['Accuracy']:5.1f}% [{row['Region']}]")
436
+ stats.append("")
437
+
438
+ # 7. Species overlap analysis
439
+ stats.append("7. SPECIES OVERLAP ACROSS REGIONS")
440
+ stats.append("-" * 40)
441
+
442
+ us_species = set(df[df['Region'] == 'US']['Species'].str.lower().str.strip())
443
+ africa_species = set(df[df['Region'] == 'Africa']['Species'].str.lower().str.strip())
444
+ india_species = set(df[df['Region'] == 'India']['Species'].str.lower().str.strip())
445
+
446
+ overlap_us_africa = us_species & africa_species
447
+ overlap_us_india = us_species & india_species
448
+ overlap_africa_india = africa_species & india_species
449
+ all_three = us_species & africa_species & india_species
450
+
451
+ stats.append(f"US & Africa: {len(overlap_us_africa)} species")
452
+ if len(overlap_us_africa) > 0:
453
+ for species in sorted(overlap_us_africa):
454
+ stats.append(f" - {species}")
455
+
456
+ stats.append(f"\nUS & India: {len(overlap_us_india)} species")
457
+ if len(overlap_us_india) > 0:
458
+ for species in sorted(overlap_us_india):
459
+ stats.append(f" - {species}")
460
+
461
+ stats.append(f"\nAfrica & India: {len(overlap_africa_india)} species")
462
+ if len(overlap_africa_india) > 0:
463
+ for species in sorted(overlap_africa_india):
464
+ stats.append(f" - {species}")
465
+
466
+ stats.append(f"\nAll three regions: {len(all_three)} species")
467
+ if len(all_three) > 0:
468
+ for species in sorted(all_three):
469
+ stats.append(f" - {species}")
470
+
471
+ stats.append("")
472
+ stats.append("=" * 80)
473
+ stats.append(f"Analysis completed on: 2025-10-26")
474
+ stats.append("=" * 80)
475
+
476
+ # Write to file
477
+ with open(OUTPUT_STATS, 'w') as f:
478
+ f.write('\n'.join(stats))
479
+
480
+ print(f"✓ Statistics saved to: {OUTPUT_STATS}")
481
+
482
+ # Also print to console
483
+ print("\n" + '\n'.join(stats[:50])) # Print first 50 lines to console
484
+
485
+
486
+ def main():
487
+ """Main execution function"""
488
+ print("=" * 80)
489
+ print("PEST SPECIES ANALYSIS")
490
+ print("=" * 80)
491
+ print()
492
+
493
+ # Load and prepare data
494
+ df = load_and_prepare_data()
495
+
496
+ # Generate outputs
497
+ generate_latex_table(df)
498
+ create_visualization(df)
499
+ generate_statistics(df)
500
+
501
+ print("\n" + "=" * 80)
502
+ print("ANALYSIS COMPLETE!")
503
+ print("=" * 80)
504
+ print("\nGenerated files:")
505
+ print(f" 1. {OUTPUT_TABLE.name} - LaTeX table")
506
+ print(f" 2. {OUTPUT_PLOT_PDF.name} - Visualization (PDF)")
507
+ print(f" 3. {OUTPUT_PLOT_PNG.name} - Visualization (PNG)")
508
+ print(f" 4. {OUTPUT_STATS.name} - Statistics summary")
509
+ print()
510
+
511
+
512
+ if __name__ == '__main__':
513
+ main()
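The overlap section of the script repeats the same intersect-and-list pattern for each pair of regions. As a minimal sketch, and not the script's actual code, the same idea can be written once for any number of regions. The helper name `region_overlaps` and the sample rows are hypothetical; only the lowercase-and-strip normalisation is taken from the script.

```python
from collections import defaultdict
from itertools import combinations


def region_overlaps(rows):
    """Map each pair of regions to the sorted species names they share.

    `rows` is an iterable of (region, species) pairs.
    """
    by_region = defaultdict(set)
    for region, species in rows:
        # Same normalisation as the script: lowercase and strip whitespace.
        by_region[region].add(species.lower().strip())
    return {
        (a, b): sorted(by_region[a] & by_region[b])
        for a, b in combinations(sorted(by_region), 2)
    }


# Made-up rows for illustration only (not the project's data file).
demo_rows = [
    ("US", "Cyperus esculentus "),
    ("Africa", "cyperus esculentus"),
    ("Africa", "aphis craccivora"),
    ("India", "Aphis craccivora"),
]
overlaps = region_overlaps(demo_rows)
for pair, shared in overlaps.items():
    print(pair, shared)
```

With a DataFrame like the one in the script, the rows could be supplied as `zip(df['Region'], df['Species'])`, assuming no species value is missing.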
species-organized/species_statistics.txt ADDED
@@ -0,0 +1,121 @@
1
+ ================================================================================
2
+ PEST SPECIES ANALYSIS - STATISTICS SUMMARY
3
+ ================================================================================
4
+
5
+ 1. OVERALL SPECIES COUNTS
6
+ ----------------------------------------
7
+ Total species: 126
8
+
9
+ By Region:
10
+ US : 80 species ( 63.5%)
11
+ Africa : 35 species ( 27.8%)
12
+ India : 11 species ( 8.7%)
13
+
14
+ 2. INSECT VS WEED DISTRIBUTION
15
+ ----------------------------------------
16
+ Overall:
17
+ Insects: 65 (51.6%)
18
+ Weeds: 59 (46.8%)
19
+
20
+ By Region:
21
+ US:
22
+ Insects: 44
23
+ Weeds: 36
24
+ Africa:
25
+ Insects: 10
26
+ Weeds: 23
27
+ India:
28
+ Insects: 11
29
+ Weeds: 0
30
+
31
+ 3. ACCURACY STATISTICS
32
+ ----------------------------------------
33
+ US:
34
+ Mean: 89.69%
35
+ Median: 91.00%
36
+ Std Dev: 11.74%
37
+ Min: 40.00%
38
+ Max: 100.00%
39
+ Missing: 19 (23.8%)
40
+
41
+ Africa:
42
+ Mean: 89.81%
43
+ Median: 95.00%
44
+ Std Dev: 10.83%
45
+ Min: 59.00%
46
+ Max: 100.00%
47
+ Missing: 8 (22.9%)
48
+
49
+ India:
50
+ Mean: 80.00%
51
+ Median: 83.00%
52
+ Std Dev: 10.69%
53
+ Min: 59.00%
54
+ Max: 95.00%
55
+ Missing: 1 (9.1%)
56
+
57
+ Overall (all regions):
58
+ Mean: 88.73%
59
+ Median: 90.00%
60
+ Std Dev: 11.67%
61
+ Range: 40.00% - 100.00%
62
+
63
+ 4. IPM INFORMATION COVERAGE
64
+ ----------------------------------------
65
+ US : 0/80 species ( 0.0%)
66
+ Africa : 35/35 species (100.0%)
67
+ India : 10/11 species ( 90.9%)
68
+ Overall : 45/126 species ( 35.7%)
69
+
70
+ 5. TOP 10 SPECIES BY ACCURACY
71
+ ----------------------------------------
72
+ 1. Seedcorn beetle (stenolophus lecontei ) - 100.0% [US]
73
+ 2. Seedcorn maggot (delia platura ) - 100.0% [US]
74
+ 3. Hop Vine Borer (hydraecia immanis ) - 100.0% [US]
75
+ 4. Barnyardgrass (echinochloa crus-galli ) - 100.0% [US]
76
+ 5. common Cocklebur (xanthium strumarium ) - 100.0% [US]
77
+ 6. common Lambsquarters (chenopodium album ) - 100.0% [US]
78
+ 7. Common Waterhemp (amaranthus tuberculatus ) - 100.0% [US]
79
+ 8. Giant ragweed (ambrosia trifida ) - 100.0% [US]
80
+ 9. Henbit (deadnettle) (lamium amplexicaule ) - 100.0% [US]
81
+ 10. Jimsonweed (datura stramonium ) - 100.0% [US]
82
+
83
+ 6. BOTTOM 10 SPECIES BY ACCURACY
84
+ ----------------------------------------
85
+ 1. Annual ryegrass (lolium multiflorum ) - 40.0% [US]
86
+ 2. Spotted fireworm (choristoneura parallela ) - 44.0% [US]
87
+ 3. Cowpea aphid (Aphis craccivora ) - 59.0% [Africa]
88
+ 4. Cowpea aphid (Aphis craccivora ) - 59.0% [India]
89
+ 5. Spiraea Aphid (Aphis spiraecola ) - 67.0% [Africa]
90
+ 6. Spiraea Aphid (Aphis spiraecola ) - 67.0% [India]
91
+ 7. alfalfa weevil (hypera postica ) - 73.0% [US]
92
+ 8. twospotted spider mite (tetranychus urticae ) - 73.0% [US]
93
+ 9. Corn ear borer (Helicoverpa armigera ) - 74.0% [Africa]
94
+ 10. Corn ear borer (Helicoverpa armigera ) - 74.0% [India]
95
+
96
+ 7. SPECIES OVERLAP ACROSS REGIONS
97
+ ----------------------------------------
98
+ US & Africa: 2 species
99
+ - amaranthus tuberculatus
100
+ - cyperus esculentus
101
+
102
+ US & India: 1 species
103
+ - spodoptera frugiperda
104
+
105
+ Africa & India: 10 species
106
+ - aphis craccivora
107
+ - aphis spiraecola
108
+ - atherigona reversura
109
+ - drosophila suzukii
110
+ - euborellia annulipes
111
+ - halyomorpha halys
112
+ - helicoverpa armigera
113
+ - icerya purchasi
114
+ - nezara viridula
115
+ - spodoptera litura
116
+
117
+ All three regions: 0 species
118
+
119
+ ================================================================================
120
+ Analysis completed on: 2025-10-26
121
+ ================================================================================
species-organized/species_table.tex ADDED
@@ -0,0 +1,151 @@
1
+ % LaTeX Table: Species Analysis
2
+ % Requires: \usepackage{booktabs, longtable}
3
+
4
+ \begin{longtable}{llllrr}
5
+ \toprule
6
+ \textbf{Region} & \textbf{Species} & \textbf{Common Name} & \textbf{Tag} & \textbf{Accuracy (\%)} & \textbf{IPM Info} \\
7
+ \midrule
8
+ \endfirsthead
9
+
10
+ \multicolumn{6}{c}
11
+ {\tablename\ \thetable\ -- \textit{Continued from previous page}} \\
12
+ \toprule
13
+ \textbf{Region} & \textbf{Species} & \textbf{Common Name} & \textbf{Tag} & \textbf{Accuracy (\%)} & \textbf{IPM Info} \\
14
+ \midrule
15
+ \endhead
16
+
17
+ \midrule
18
+ \multicolumn{6}{r}{\textit{Continued on next page}} \\
19
+ \endfoot
20
+
21
+ \bottomrule
22
+ \endlastfoot
23
+
24
+ Africa & \textit{Halyomorpha halys} & Brown Marmorated Stink Bug & insect & 95.0 & Yes \\
25
+ Africa & \textit{Nezara viridula} & Green stink bug & insect & 88.0 & Yes \\
26
+ Africa & \textit{Drosophila suzukii} & Spotted-winged Drosophila & insect & 86.0 & Yes \\
27
+ Africa & \textit{Spodoptera litura} & Tobacco caterpillar & insect & 86.0 & Yes \\
28
+ Africa & \textit{Atherigona reversura} & Shoot fly & insect & 84.0 & Yes \\
29
+ Africa & \textit{Icerya purchasi} & Cottony cushion scale & insect & 82.0 & Yes \\
30
+ Africa & \textit{Euborellia annulipes} & Ring-legged Earwig & insect & 79.0 & Yes \\
31
+ Africa & \textit{Helicoverpa armigera} & Corn ear borer & insect & 74.0 & Yes \\
32
+ Africa & \textit{Aphis spiraecola} & Spiraea Aphid & insect & 67.0 & Yes \\
33
+ Africa & \textit{Aphis craccivora} & Cowpea aphid & insect & 59.0 & Yes \\
34
+ Africa & \textit{Trianthema triquetrum} & Red Spinach & weed & --- & Yes \\
35
+ Africa & \textit{Trianthema portulacastrum} & Desert Horse Purslane & weed & --- & Yes \\
36
+ Africa & \textit{Cleome rutidosperma} & Purple spider flower & weed & --- & Yes \\
37
+ Africa & \textit{Cleome gynandra} & Spiderwisp & weed & --- & Yes \\
38
+ Africa & \textit{Cleome viscosa} & Asian Spider flower & weed & --- & Yes \\
39
+ Africa & \textit{Cleome spinosa} & Spiny Spider-Flower & weed & --- & Yes \\
40
+ Africa & \textit{Cleome aculeata} & Prickly Spiderflower & weed & --- & Yes \\
41
+ Africa & \textit{Cleome monophylla} & Singleleaf Spindlepod & weed & --- & Yes \\
42
+ Africa & \textit{Amaranthus viridis} & Green Amaranth & weed & 95.0 & Yes \\
43
+ Africa & \textit{Cyperus entrerianus} & Deeproot Sedge & weed & 95.0 & Yes \\
44
+ Africa & \textit{Cyperus esculentus} & Yellow nutsedge & weed & 95.0 & Yes \\
45
+ Africa & \textit{Cyperus haspan} & Haspan flatsedge & weed & 95.0 & Yes \\
46
+ Africa & \textit{Cyperus iria L.} & Rice flatsedge & weed & 90.0 & Yes \\
47
+ Africa & \textit{Cyperus rotundus} & Purple Nutsedge & weed & 90.0 & Yes \\
48
+ Africa & \textit{Medicago minima} & Little Bur-clover & weed & 90.0 & Yes \\
49
+ Africa & \textit{Cyperus prolifer} & Dwarf papyrus & weed & 80.0 & Yes \\
50
+ Africa & \textit{Amaranthus tuberculatus} & Tall waterhemp & weed & 100.0 & Yes \\
51
+ Africa & \textit{Cyperus brevifolius} & Shortleaf flatsedge & weed & 100.0 & Yes \\
52
+ Africa & \textit{Cyperus difformis} & Smallflower umbrella sedge & weed & 100.0 & Yes \\
53
+ Africa & \textit{Cyperus mindorensis} & nan & weed & 100.0 & Yes \\
54
+ Africa & \textit{Cleome houtteana} & Spider flower & weed & 100.0 & Yes \\
55
+ Africa & \textit{Medicago falcata} & Yellow alfalfa & weed & 100.0 & Yes \\
56
+ Africa & \textit{Medicago lupulina} & Black Medick & weed & 100.0 & Yes \\
57
+ Africa & \textit{Medicago polymorpha} & Burr Medic & --- & 95.0 & Yes \\
58
+ Africa & \textit{Striga asiatica} & Witch weed & --- & 100.0 & Yes \\
59
+ India & \textit{Spodoptera frugiperda} & Fall Armyworm & insect & --- & No \\
60
+ India & \textit{Halyomorpha halys} & Brown Marmorated Stink Bug & insect & 95.0 & Yes \\
61
+ India & \textit{Nezara viridula} & Green stink bug & insect & 88.0 & Yes \\
62
+ India & \textit{Drosophila suzukii} & Spotted-winged Drosophila & insect & 86.0 & Yes \\
63
+ India & \textit{Spodoptera litura} & Tobacco caterpillar & insect & 86.0 & Yes \\
64
+ India & \textit{Atherigona reversura} & Shoot fly & insect & 84.0 & Yes \\
65
+ India & \textit{Icerya purchasi} & Cottony cushion scale & insect & 82.0 & Yes \\
66
+ India & \textit{Euborellia annulipes} & Ring-legged Earwig & insect & 79.0 & Yes \\
67
+ India & \textit{Helicoverpa armigera} & Corn ear borer & insect & 74.0 & Yes \\
68
+ India & \textit{Aphis spiraecola} & Spiraea Aphid & insect & 67.0 & Yes \\
69
+ India & \textit{Aphis craccivora} & Cowpea aphid & insect & 59.0 & Yes \\
70
+ US & \textit{chaetocnema pulicaria} & corn flea beetle & insect & --- & No \\
71
+ US & \textit{hypera zoilus} & clover leaf weevil & insect & --- & No \\
72
+ US & \textit{agromyza frontella} & alfalfa blotch leafminer & insect & --- & No \\
73
+ US & \textit{resseliella maxima} & soybean gall midge & insect & --- & No \\
74
+ US & \textit{aphis glycines} & soybean aphid & insect & --- & No \\
75
+ US & \textit{Damsel bugs} & Damsel bugs & insect & --- & No \\
76
+ US & \textit{Flower fly larvae} & Flower fly larvae & insect & --- & No \\
77
+ US & \textit{Ground beetles} & Ground beetles & insect & --- & No \\
78
+ US & \textit{Lacewings} & Lacewings & insect & --- & No \\
79
+ US & \textit{Lady beetles} & Lady beetles & insect & --- & No \\
80
+ US & \textit{Parasitoid wasps} & Parasitoid wasps & insect & --- & No \\
81
+ US & \textit{Pirate bugs} & Pirate bugs & insect & --- & No \\
82
+ US & \textit{Soldier beetles} & Soldier beetles & insect & --- & No \\
83
+ US & \textit{Podisus maculiventris} & Spined soldier bug & insect & --- & No \\
84
+ US & \textit{Tachinid flies} & Tachinid flies & insect & --- & No \\
85
+ US & \textit{empoasca fabae} & potato leafhopper & insect & 97.0 & No \\
86
+ US & \textit{striacosta albicosta} & western bean cutworm & insect & 97.0 & No \\
87
+ US & \textit{hypena scabra} & green cloverworm & insect & 96.0 & No \\
88
+ US & \textit{agrotis ipsilon} & black cutworm & insect & 95.0 & No \\
89
+ US & \textit{vanessa cardui} & painted lady & insect & 95.0 & No \\
90
+ US & \textit{popillia japonica} & Japanese beetle & insect & 94.0 & No \\
91
+ US & \textit{mythimna unipuncta} & armyworm & insect & 94.0 & No \\
92
+ US & \textit{lygus lineolaris} & tarnished plant bug & insect & 92.0 & No \\
93
+ US & \textit{colias eurytheme} & alfalfa caterpillar & insect & 91.0 & No \\
94
+ US & \textit{microtechnites bractatus} & garden fleahopper & insect & 90.0 & No \\
95
+ US & \textit{papaipema nebris} & stalk borer & insect & 90.0 & No \\
96
+ US & \textit{sitona hispidulus} & clover root curculio & insect & 89.0 & No \\
97
+ US & \textit{philaenus spumarius} & meadow spittlebug & insect & 89.0 & No \\
98
+ US & \textit{dectes texanus} & dectes stem borer & insect & 88.0 & No \\
99
+ US & \textit{ostrinia nubilalis} & European corn borer & insect & 88.0 & No \\
100
+ US & \textit{cerotoma trifurcata} & bean leaf beetle & insect & 87.0 & No \\
101
+ US & \textit{helicoverpa zea} & Tomato fruitworm & insect & 87.0 & No \\
102
+ US & \textit{spodoptera ornithogalli} & yellowstriped armyworm & insect & 86.0 & No \\
103
+ US & \textit{chrysodeixis includens} & soybean looper & insect & 83.0 & No \\
104
+ US & \textit{spodoptera frugiperda} & fall armyworm & insect & 80.0 & No \\
105
+ US & \textit{calomycterus setarius} & imported longhorned weevil & insect & 79.0 & No \\
106
+ US & \textit{loxostege cereralis} & alfalfa webworm & insect & 79.0 & No \\
107
+ US & \textit{odontota horni} & Soybean leaf miner & insect & 75.0 & No \\
108
+ US & \textit{hypera postica} & alfalfa weevil & insect & 73.0 & No \\
109
+ US & \textit{tetranychus urticae} & twospotted spider mite & insect & 73.0 & No \\
110
+ US & \textit{choristoneura parallela} & Spotted fireworm & insect & 44.0 & No \\
111
+ US & \textit{stenolophus lecontei} & Seedcorn beetle & insect & 100.0 & No \\
112
+ US & \textit{delia platura} & Seedcorn maggot & insect & 100.0 & No \\
113
+ US & \textit{hydraecia immanis} & Hop Vine Borer & insect & 100.0 & No \\
114
+ US & \textit{solanum ptycanthum} & Eastern black nightshade & weed & --- & No \\
115
+ US & \textit{conyza canadensis} & Horseweed & weed & --- & No \\
116
+ US & \textit{kochia scoparia} & Kochia & weed & --- & No \\
117
+ US & \textit{sinapis arvensis} & Wild mustard & weed & --- & No \\
118
+ US & \textit{ambrosia artemisiifolia} & common Ragweed & weed & 95.0 & No \\
119
+ US & \textit{stellaria media} & common Chickweed & weed & 95.0 & No \\
120
+ US & \textit{equisetum arvense} & Field Horsetail & weed & 95.0 & No \\
121
+ US & \textit{digitaria sanguinalis} & Large crabgrass & weed & 95.0 & No \\
122
+ US & \textit{sida spinosa} & Prickly sida & weed & 95.0 & No \\
123
+ US & \textit{cyperus esculentus} & yellow Nutsedge & weed & 95.0 & No \\
124
+ US & \textit{helianthus annuus} & Common Sunflower & weed & 90.0 & No \\
125
+ US & \textit{bromus tectorum} & Downy brome & weed & 90.0 & No \\
126
+ US & \textit{setaria viridis} & Green foxtail & weed & 90.0 & No \\
127
+ US & \textit{euphorbia dentata} & Toothed spurge & weed & 90.0 & No \\
128
+ US & \textit{mirabilis nyctaginea} & wild Four-o’clock & weed & 90.0 & No \\
129
+ US & \textit{setaria faberi} & Giant foxtail & weed & 85.0 & No \\
130
+ US & \textit{eleusine indica} & Goosegrass & weed & 85.0 & No \\
131
+ US & \textit{salsola tragus} & Russian thistle & weed & 85.0 & No \\
132
+ US & \textit{sorghum bicolor} & Shattercane & weed & 85.0 & No \\
133
+ US & \textit{setaria pumila} & Yellow foxtail & weed & 85.0 & No \\
134
+ US & \textit{persicaria pensylvanica} & Pennsylvania smartweed & weed & 80.0 & No \\
135
+ US & \textit{amaranthus palmeri} & Palmer amaranth & weed & 75.0 & No \\
136
+ US & \textit{lolium multiflorum} & Annual ryegrass & weed & 40.0 & No \\
137
+ US & \textit{echinochloa crus-galli} & Barnyardgrass & weed & 100.0 & No \\
138
+ US & \textit{xanthium strumarium} & common Cocklebur & weed & 100.0 & No \\
139
+ US & \textit{chenopodium album} & common Lambsquarters & weed & 100.0 & No \\
140
+ US & \textit{amaranthus tuberculatus} & Common Waterhemp & weed & 100.0 & No \\
141
+ US & \textit{ambrosia trifida} & Giant ragweed & weed & 100.0 & No \\
142
+ US & \textit{lamium amplexicaule} & Henbit (deadnettle) & weed & 100.0 & No \\
143
+ US & \textit{datura stramonium} & Jimsonweed & weed & 100.0 & No \\
144
+ US & \textit{lactuca serriola} & Prickly lettuce & weed & 100.0 & No \\
145
+ US & \textit{amaranthus retroflexus} & Redroot pigweed & weed & 100.0 & No \\
146
+ US & \textit{equisetum hyemale} & Scouringrush & weed & 100.0 & No \\
147
+ US & \textit{capsella bursa-pastoris} & Shepherd’s purse & weed & 100.0 & No \\
148
+ US & \textit{abutilon theophrasti} & Velvetleaf & weed & 100.0 & No \\
149
+ US & \textit{daucus carota} & Wild Carrot & weed & 100.0 & No \\
150
+
151
+ \end{longtable}
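The table marks missing values with a dash, but one row carries a literal `nan` as its common name, which suggests that missing cells are not all normalised before the rows are written. The following is a minimal sketch of a row formatter that treats `None`, NaN and the string `nan` the same way. The names `cell`, `latex_row` and `MISSING` are hypothetical and are not taken from the script, and escaping of LaTeX special characters is left out.

```python
import math

MISSING = "---"  # LaTeX ligature that renders as a long dash


def cell(value, fmt="{}"):
    """Render one table cell, using a dash for None, NaN, '' or 'nan'."""
    if value is None:
        return MISSING
    if isinstance(value, float) and math.isnan(value):
        return MISSING
    if isinstance(value, str) and value.strip().lower() in ("", "nan"):
        return MISSING
    return fmt.format(value)


def latex_row(region, species, common_name, tag, accuracy, has_ipm):
    """Build one longtable row in the column order used by the table."""
    cells = [
        cell(region),
        rf"\textit{{{species}}}",
        cell(common_name),
        cell(tag),
        cell(accuracy, "{:.1f}"),
        "Yes" if has_ipm else "No",
    ]
    return " & ".join(cells) + r" \\"


# Example: a row whose common name is missing in the source data.
row = latex_row("Africa", "Cyperus mindorensis", float("nan"), "weed", 100.0, True)
print(row)
```

Keeping the missing-value check in one place means the accuracy, tag and common-name columns all show the same marker.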
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/data_level0.bin RENAMED
File without changes
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/header.bin RENAMED
File without changes
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/length.bin RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:b274da292d64f026adecde33133c35635f3faf9e38eee883d259dcf632c7729b
3
  size 40000
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:11491cf0eac47e805aa1b059bb8d72b895d20b41d24581b6a4383eff57db12f5
3
  size 40000
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{8da9893a-19f6-48c6-bb16-8a169d9e166f → 0bdb47f3-00af-43ed-a2af-ae5a3eee5f98}/link_lists.bin RENAMED
File without changes
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/chroma.sqlite3 CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:717b0646137d385b2777333886c81f41d57bae3261a881b66c728a21e465c29b
3
- size 5414912
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:12653e79b55a19108699f56736a4d97a4ad00f3627d6504348862d911eaa1688
3
+ size 5410816