---
language: ko
license: mit
tags:
- pytorch
- bert
- kobert
- text-classification
- stance-detection
- korean
- news
- political
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: stance-classifier-v2
  results:
  - task:
      type: text-classification
      name: Stance Classification
    metrics:
    - type: accuracy
      value: 73.93
      name: Test Accuracy
    - type: f1
      value: 0.7395
      name: Test F1
---

# Korean Political News Stance Classifier v2

A KoBERT-based stance (position) classification model for Korean political news.

## Model Description

- **Base Model**: monologg/kobert
- **Task**: 3-class stance classification (옹호 / 중립 / 비판: support / neutral / oppose)
- **Language**: Korean
- **Training Data**: ~12,000 labeled political news articles

## Performance

| Metric | Score |
|--------|-------|
| Test Accuracy | 73.93% |
| Test F1 (macro) | 0.7395 |

## Labels

| Label ID | Korean | English | Description |
|----------|--------|---------|-------------|
| 0 | 옹호 | support | Favorable toward the government/ruling party |
| 1 | 중립 | neutral | Objective, factual reporting |
| 2 | 비판 | oppose | Critical of the government/ruling party |
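
For programmatic use, the same mapping can be kept as a pair of dictionaries (a minimal sketch; these variable names are illustrative and not part of the released checkpoint):

```python
# Hypothetical helper mapping mirroring the table above.
id2label = {0: "옹호", 1: "중립", 2: "비판"}  # support, neutral, oppose
label2id = {v: k for k, v in id2label.items()}
```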

## Usage

```python
import torch
import torch.nn as nn
from transformers import BertModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# Model definition
class StanceClassifier(nn.Module):
    def __init__(self, bert_model, num_classes=3, dropout_rate=0.3):
        super().__init__()
        self.bert = bert_model
        self.dropout = nn.Dropout(dropout_rate)
        self.classifier = nn.Linear(768, num_classes)

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        pooled_output = outputs.pooler_output
        pooled_output = self.dropout(pooled_output)
        return self.classifier(pooled_output)

# Load the fine-tuned weights
model_path = hf_hub_download(repo_id="gaaahee/stance-classifier-v2", filename="pytorch_model.pt")
checkpoint = torch.load(model_path, map_location='cpu')

bert_model = BertModel.from_pretrained('monologg/kobert')
model = StanceClassifier(bert_model)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('monologg/kobert', trust_remote_code=True)

# Predict
# "The government's new policy is expected to contribute greatly to economic growth."
text = "정부의 새 정책이 경제 성장에 크게 기여할 것으로 기대된다"
encoding = tokenizer(text, truncation=True, max_length=512, padding='max_length', return_tensors='pt')

with torch.no_grad():
    logits = model(encoding['input_ids'], encoding['attention_mask'])
    probs = torch.softmax(logits, dim=1)
    pred = torch.argmax(probs, dim=1).item()

labels = ['옹호', '중립', '비판']  # support, neutral, oppose
print(f"Prediction: {labels[pred]} ({probs[0][pred].item()*100:.1f}%)")
```
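
To score many articles at once, the model and tokenizer loaded above can be wrapped in a small batched helper. This is a sketch continuing from the snippet above; `predict_batch` is an illustrative name, not something shipped with the repository:

```python
def predict_batch(texts, batch_size=32):
    """Classify a list of Korean news texts; returns label indices (illustrative helper)."""
    preds = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        # Dynamic padding per batch is cheaper than padding everything to 512.
        enc = tokenizer(batch, truncation=True, max_length=512,
                        padding=True, return_tensors='pt')
        with torch.no_grad():
            logits = model(enc['input_ids'], enc['attention_mask'])
        preds.extend(logits.argmax(dim=1).tolist())
    return preds

# Example: predict_batch(["기사 본문 1", "기사 본문 2"]) -> e.g. [1, 2]
```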

## Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | monologg/kobert |
| Max Length | 512 |
| Batch Size | 64 |
| Learning Rate | 2e-05 |
| Dropout | 0.3 |
| Loss Function | Focal Loss (gamma=2.0) |
| Early Stopping | patience=3 |
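
Focal loss down-weights already well-classified examples so training focuses on hard ones: FL(p_t) = -(1 - p_t)^gamma * log(p_t), reducing to cross-entropy at gamma=0. A minimal sketch of the gamma=2.0 variant named above (the class is illustrative; the training code itself is not included in this repository):

```python
import torch
import torch.nn.functional as F

class FocalLoss(torch.nn.Module):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t); easy examples contribute less."""
    def __init__(self, gamma=2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits, targets):
        # Log-probability of the true class for each example.
        log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        return ((1 - pt) ** self.gamma * -log_pt).mean()
```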

## Citation

```bibtex
@misc{korean-stance-classifier-v2,
  title={Korean Political News Stance Classifier v2},
  year={2024},
  publisher={HuggingFace}
}
```