Multilingual Disaster-Relevance TwHIN-BERT

This is a multilingual TwHIN-BERT-base model fine-tuned to classify the relevance of social media posts for natural disaster response. It was developed on a dataset of 4,574 manually labelled posts from five use cases: the 2020 California wildfires, 2021 Ahr Valley floods, 2023 Chile wildfires, 2023 Turkey earthquake and 2023 Emilia-Romagna floods. Of these, 3,659 posts were used for model training and validation and 915 for testing (roughly an 80/20 split).

🏷️ Labels

The model assigns each short text exactly one of the following labels:

  • Related and relevant: A post that is related to the respective natural disaster and relevant for emergency responders. It contains useful information for supporting disaster management (e.g., posts about destruction, in-situ information, critical infrastructure, affected individuals, affected areas, requests for help, caution or advice).
  • Related but not relevant: A post that refers to the respective disaster but does not contain helpful or valuable information for supporting disaster management (e.g., declarations of solidarity, volunteering initiatives, appeals for donations, political or religious statements, bot-generated content, comparisons to past events, shared news articles).
  • Not related: A post that has no relation to the disaster event in question.
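As a minimal sketch of how these labels might be used downstream, the snippet below keeps only posts labelled as related and relevant. It assumes the label strings returned by the model match the names listed above (only 'Related and relevant' is confirmed by the example output in this card). The `Relevance` enum, the `keep_for_responders` helper, the `min_score` threshold and the sample scores are hypothetical and used for illustration only.

```python
from enum import Enum


class Relevance(Enum):
    # Assumption: label strings match the names in the list above. Only
    # 'Related and relevant' is confirmed by the example output in this card.
    RELATED_RELEVANT = "Related and relevant"
    RELATED_NOT_RELEVANT = "Related but not relevant"
    NOT_RELATED = "Not related"


def keep_for_responders(predictions, min_score=0.5):
    """Keep predictions labelled 'Related and relevant' with score >= min_score.

    `predictions` is a list of {'label': str, 'score': float} dicts.
    `min_score` is a hypothetical threshold, not a value from the model authors.
    """
    return [
        p for p in predictions
        if p["label"] == Relevance.RELATED_RELEVANT.value
        and p["score"] >= min_score
    ]


# Made-up predictions for illustration; these are not real model outputs.
sample = [
    {"label": "Related and relevant", "score": 0.99},
    {"label": "Related but not relevant", "score": 0.91},
    {"label": "Related and relevant", "score": 0.42},
    {"label": "Not related", "score": 0.88},
]

print(keep_for_responders(sample, min_score=0.5))
```

A higher threshold trades recall for precision; a suitable value would need to be chosen on validation data.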

⚙️ Example pipeline

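from transformers import pipeline
# Model ID on the Hugging Face Hub
MODEL_NAME = 'hannybal/multilingual-disaster-relevance-twhin-bert'
# Load the fine-tuned model and its tokenizer into a text-classification pipeline
classifier = pipeline('text-classification', model=MODEL_NAME)
# Classify a single post; the result is a list holding one {'label', 'score'} dict
classifier('I can see fire and smoke from the nearby fire!')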

Output:

[{'label': 'Related and relevant', 'score': 0.997657299041748}]
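To classify several posts at once and inspect the score of every label, something like the following sketch may help. It assumes a recent transformers release, where passing `top_k=None` to a text-classification pipeline returns one list of label/score dicts per input text. The helpers `load_classifier` and `classify_batch` are hypothetical names, not part of the model or the library.

```python
MODEL_NAME = "hannybal/multilingual-disaster-relevance-twhin-bert"


def load_classifier():
    """Build a pipeline that returns scores for all labels (downloads the model)."""
    # Imported lazily so classify_batch can be used without transformers installed.
    from transformers import pipeline

    # Assumption: top_k=None makes the pipeline return every label's score.
    return pipeline("text-classification", model=MODEL_NAME, top_k=None)


def classify_batch(classifier, texts):
    """Pair each text with its best label and the full score distribution.

    `classifier` is any callable that maps a list of texts to a list of
    [{'label': str, 'score': float}, ...] lists, one inner list per text.
    """
    texts = list(texts)
    results = []
    for text, scores in zip(texts, classifier(texts)):
        best = max(scores, key=lambda s: s["score"])
        results.append(
            {
                "text": text,
                "label": best["label"],
                "score": best["score"],
                "all_scores": {s["label"]: s["score"] for s in scores},
            }
        )
    return results
```

A call such as `classify_batch(load_classifier(), posts)` would then return one dict per post. Keeping the classifier as a parameter makes it easy to substitute a stub when testing downstream code.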

💡 Performance

On the held-out test set of 915 posts, the model achieved the following scores:

Metric     Value
Macro F1   0.779
Accuracy   0.802
ROC-AUC    0.928
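For readers unfamiliar with the metrics, the following sketch shows how accuracy and macro F1 are computed for a three-class problem. It uses made-up toy labels, not the evaluation data behind the scores above, and the function names are illustrative. ROC-AUC is left out because it requires class probabilities and this card does not state which multi-class averaging scheme was used.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(labels)


# Toy labels for illustration only.
REL, NOT_REL, UNRELATED = (
    "Related and relevant",
    "Related but not relevant",
    "Not related",
)
y_true = [REL, REL, REL, NOT_REL, NOT_REL, UNRELATED]
y_pred = [REL, REL, NOT_REL, NOT_REL, NOT_REL, UNRELATED]

print(round(accuracy(y_true, y_pred), 3))  # 0.833
print(round(macro_f1(y_true, y_pred), 3))  # 0.867
```

Because macro F1 weights every class equally regardless of its frequency, it can differ noticeably from accuracy when classes are imbalanced.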

📃 Reference

If you use this model in your research, please cite it as follows:

@article{Hanny.2025c,
  title = {A Multimodal {{GeoAI}} Approach to Combining Text with Spatiotemporal Features for Enhanced Relevance Classification of Social Media Posts in Disaster Response},
  author = {Hanny, David and Schmidt, Sebastian and Gandhi, Shaily and Granitzer, Michael and Resch, Bernd},
  year = {2025},
  journal = {Big Earth Data},
  volume = {0},
  number = {0},
  pages = {1--45},
  publisher = {Taylor \& Francis},
  issn = {2096-4471},
  doi = {10.1080/20964471.2025.2572140},
  urldate = {2025-10-24},
  abstract = {Geo-referenced social media data supports disaster management by offering real-time insights through user-generated content. To identify critical information amid high volumes of noise, classifying the relevance of posts is essential. Most existing methods primarily use textual features, neglecting spatial and temporal context despite its importance in determining relevance. This study proposes a multimodal approach that integrates text with spatiotemporal features for relevance classification of geo-referenced social media posts. We evaluate our method on 4,574 manually labelled posts from five disasters: the 2020 California wildfires, 2021 Ahr Valley floods, 2023 Chile wildfires, 2023 Turkey earthquake and 2023 Emilia-Romagna floods. Labels were assigned based on text, geographic location and time. Our spatiotemporal features include proximity to disaster impact sites, local co-occurrences with disaster-related posts, event type and geographic context. When utilised on their own, they achieved a macro F1 score of 0.713 with a random forest classifier. A fine-tuned TwHIN-BERT-base model using only text scored 0.779. For multimodal classification, we tested feature concatenation, in-context learning, stacking and partial stacking. Partial stacking produced the highest macro F1 score (0.814). Our multilingual, context-aware classification approach lays the groundwork for more integrated GeoAI applications in disaster management, the social sciences and beyond.},
  keywords = {disaster management,GeoAI,Machine learning,multimodal learning,Published,relevance classification,social media}
}

Acknowledgements

This work has received funding from the European Commission - European Union under HORIZON EUROPE (HORIZON Research and Innovation Actions) under grant agreement 101093003 (HORIZON-CL4-2022-DATA-01-01).
