This project implements binary text classification models to detect sarcasm in news headlines using transformer-based architectures: BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly Optimized BERT Pretraining Approach), ALBERT, DistilBERT, and Sentence-BERT.
The dataset is sourced from Kaggle and contains news headlines labeled as sarcastic (1) or not sarcastic (0).
Link to the dataset: News Headlines Dataset For Sarcasm Detection
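The Kaggle dataset is distributed as a JSON-lines file; assuming it keeps the published schema (`headline`, `is_sarcastic`, `article_link`) and is saved locally as `data/Sarcasm_Headlines_Dataset.json` (both assumptions, adjust to your download), it can be loaded and inspected with pandas like this:

```python
import pandas as pd

# Assumed local path and schema; adjust to match the downloaded Kaggle file.
df = pd.read_json("data/Sarcasm_Headlines_Dataset.json", lines=True)

print(df[["headline", "is_sarcastic"]].head())
print(df["is_sarcastic"].value_counts())  # class balance: 0 = not sarcastic, 1 = sarcastic
```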
- Fine-tuned BERT model (`bert-base-uncased`)
  - Binary classification head
  - Trained for 3 epochs with AdamW optimizer (learning rate: 2e-5)
  - Batch size of 32
- Fine-tuned RoBERTa model (`roberta-base`)
  - Binary classification head
  - Trained for 3 epochs with AdamW optimizer (learning rate: 2e-5)
  - Batch size of 32
- Fine-tuned ALBERT model (`albert-base-v2`)
  - Binary classification head
  - Trained for 3 epochs with AdamW optimizer (learning rate: 2e-5)
  - Batch size of 32
- Fine-tuned DistilBERT model (`distilbert-base-uncased`)
  - Binary classification head
  - Trained for 3 epochs with AdamW optimizer (learning rate: 2e-5)
  - Batch size of 32
- Fine-tuned Sentence-BERT model (`sentence-transformers/all-MiniLM-L6-v2`)
  - Custom classification head on top of sentence embeddings
  - Trained for 3 epochs with AdamW optimizer (learning rate: 2e-5)
  - Batch size of 32
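All five models share the training setup listed above (AdamW, learning rate 2e-5, batch size 32, 3 epochs). As a rough illustration of what that fine-tuning looks like for one of them, the sketch below uses Hugging Face Transformers with a plain PyTorch loop; the dataset wrapper, variable names, and toy examples are assumptions for illustration, not the exact code in `training_scripts/`.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, BertForSequenceClassification

class HeadlineDataset(Dataset):
    """Hypothetical wrapper: tokenizes headlines and pairs them with 0/1 labels."""
    def __init__(self, headlines, labels, tokenizer, max_length=128):
        self.enc = tokenizer(headlines, truncation=True, padding="max_length",
                             max_length=max_length, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.enc.items()}
        item["labels"] = self.labels[idx]
        return item

# Toy examples standing in for the real training split (assumption).
train_texts = ["local man wins award for doing nothing", "stocks close higher on friday"]
train_labels = [1, 0]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

train_loader = DataLoader(HeadlineDataset(train_texts, train_labels, tokenizer),
                          batch_size=32, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)   # returns cross-entropy loss when labels are provided
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The same loop applies to the other checkpoints by swapping the model class and name (e.g. `RobertaForSequenceClassification` with `roberta-base`).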
After training for 3 epochs, the BERT model achieves:
- Test Accuracy: 92.42%
- Test Precision: 90.17%
- Test Recall: 94.05%
- Test F1-Score: 92.07%
After training for 3 epochs, the RoBERTa model achieves:
- Test Accuracy: 93.68%
- Test Precision: 93.23%
- Test Recall: 93.27%
- Test F1-Score: 93.25%
After training for 3 epochs, the ALBERT model achieves:
- Test Accuracy: 91.86%
- Test Precision: 93.70%
- Test Recall: 88.55%
- Test F1-Score: 91.05%
After training for 3 epochs, the DistilBERT model achieves:
- Test Accuracy: 92.35%
- Test Precision: 90.67%
- Test Recall: 93.25%
- Test F1-Score: 91.94%
After training for 3 epochs, the Sentence-BERT model achieves:
- Test Accuracy: 91.45%
- Test Precision: 93.93%
- Test Recall: 87.38%
- Test F1-Score: 90.53%
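The reported numbers are standard binary-classification metrics computed on the held-out test set. A minimal sketch of how they could be obtained with scikit-learn (label and prediction lists are placeholders) is shown below.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# y_true: gold 0/1 labels for the test split; y_pred: model predictions (placeholders).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred):.4f}")
print(f"Recall:    {recall_score(y_true, y_pred):.4f}")
print(f"F1-Score:  {f1_score(y_true, y_pred):.4f}")
```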
A comparative analysis of all transformer models for sarcasm detection reveals interesting insights:
| Model | Accuracy | Precision | Recall | F1-Score | Training Time |
|---|---|---|---|---|---|
| RoBERTa | 93.68% | 93.23% | 93.27% | 93.25% | 456s |
| BERT | 92.42% | 90.17% | 94.05% | 92.07% | 445s |
| DistilBERT | 92.35% | 90.67% | 93.25% | 91.94% | 295s |
| ALBERT | 91.86% | 93.70% | 88.55% | 91.05% | 613s |
| SBERT | 91.45% | 93.93% | 87.38% | 90.53% | 144s |
- **Best Overall Performance**: RoBERTa outperforms all other models across most metrics, achieving the highest accuracy (93.68%) and F1-score (93.25%). Its balanced precision and recall indicate robust performance across both sarcastic and non-sarcastic classes.
- **Precision vs. Recall Trade-offs**:
  - BERT and DistilBERT favor recall over precision, meaning they are better at identifying sarcastic headlines but may produce more false positives.
  - ALBERT and SBERT favor precision over recall, meaning they are more conservative in labeling headlines as sarcastic but may miss some sarcastic examples.
- **Efficiency Considerations**:
  - Sentence-BERT is remarkably efficient, completing training in just 144 seconds (roughly 2x faster than DistilBERT and 3x faster than RoBERTa).
  - DistilBERT offers an excellent balance of performance and efficiency, achieving 92.35% accuracy while training 1.5x faster than BERT.
  - ALBERT, despite its parameter-sharing design, was the slowest model to train in this task; parameter sharing shrinks the model on disk but does not reduce the computation per forward pass.
- **Practical Implications**:
  - For production environments with limited resources, Sentence-BERT or DistilBERT provide the best performance-to-efficiency ratio.
  - For applications where accuracy is paramount, RoBERTa is the clear choice.
  - For applications where minimizing false positives is critical, ALBERT or SBERT would be preferred due to their higher precision.
The comparison demonstrates that while larger models like RoBERTa generally perform better, smaller and more efficient models like DistilBERT and SBERT can achieve competitive results with significantly reduced computational requirements.
**Accuracy (higher is better)**

```
RoBERTa    ████████████████████████████████████████████████ 93.68%
BERT       ███████████████████████████████████████████▌ 92.42%
DistilBERT ███████████████████████████████████████████▎ 92.35%
ALBERT     ██████████████████████████████████████████▋ 91.86%
SBERT      ██████████████████████████████████████████ 91.45%
```

**Training Time (lower is better)**

```
SBERT      ██████▌ 144s
DistilBERT █████████████▌ 295s
BERT       ████████████████████▎ 445s
RoBERTa    ████████████████████▋ 456s
ALBERT     ███████████████████████████▊ 613s
```
This visualization clearly shows the trade-off between model accuracy and training efficiency. While RoBERTa achieves the highest accuracy, SBERT offers dramatically faster training times with only a modest reduction in accuracy.
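To reproduce a similar accuracy-versus-training-time view with matplotlib (already in the dependency list), a small sketch like the following could be used; the numbers are copied from the comparison table above.

```python
import matplotlib.pyplot as plt

models = ["RoBERTa", "BERT", "DistilBERT", "ALBERT", "SBERT"]
accuracy = [93.68, 92.42, 92.35, 91.86, 91.45]   # test accuracy, %
train_time = [456, 445, 295, 613, 144]           # training time, seconds

fig, ax = plt.subplots()
ax.scatter(train_time, accuracy)
for name, x, y in zip(models, train_time, accuracy):
    ax.annotate(name, (x, y), textcoords="offset points", xytext=(5, 5))
ax.set_xlabel("Training time (s)")
ax.set_ylabel("Test accuracy (%)")
ax.set_title("Accuracy vs. training time")
plt.show()
```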
| Model | Parameters | Approx. Size | Relative Size (vs. RoBERTa) |
|---|---|---|---|
| RoBERTa | 125M | ~500 MB | 100% |
| BERT | 110M | ~440 MB | 88% |
| ALBERT | 12M | ~50 MB | 10% |
| DistilBERT | 66M | ~260 MB | 52% |
| SBERT | 22M | ~90 MB | 18% |
The parameter count and model size significantly impact deployment considerations. ALBERT and SBERT achieve impressive performance despite their much smaller footprints, making them excellent candidates for resource-constrained environments or mobile applications.
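Parameter counts like those above can be checked directly from the loaded checkpoints; a small sketch (using the base model names from the configurations above) follows.

```python
from transformers import AutoModel

# Count trainable parameters for each base checkpoint used in this project.
for name in ["roberta-base", "bert-base-uncased", "albert-base-v2",
             "distilbert-base-uncased", "sentence-transformers/all-MiniLM-L6-v2"]:
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```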
| Model | Accuracy | Precision | Recall | F1-Score | Parameters | Approx. Size | Training Time | Training Parameters | Model Configuration | Performance Characteristics |
|---|---|---|---|---|---|---|---|---|---|---|
| RoBERTa | 93.68% | 93.23% | 93.27% | 93.25% | 125M | ~500 MB | 456s | Optimizer: AdamW • Learning rate: 2e-5 • Epochs: 3 • Batch size: 32 • Loss function: CrossEntropyLoss | Base model: `roberta-base` • Tokenizer: RoBERTa tokenizer • Max sequence length: 128 • Binary classification head | Highest overall performance, balanced precision and recall |
| BERT | 92.42% | 90.17% | 94.05% | 92.07% | 110M | ~440 MB | 445s | Optimizer: AdamW • Learning rate: 2e-5 • Epochs: 3 • Batch size: 32 • Loss function: CrossEntropyLoss | Base model: `bert-base-uncased` • Tokenizer: BERT tokenizer • Max sequence length: 128 • Binary classification head | Strong recall (94.05%), good for identifying sarcastic content |
| DistilBERT | 92.35% | 90.67% | 93.25% | 91.94% | 66M | ~260 MB | 295s | Optimizer: AdamW • Learning rate: 2e-5 • Epochs: 3 • Batch size: 32 • Loss function: CrossEntropyLoss | Base model: `distilbert-base-uncased` • Tokenizer: DistilBERT tokenizer • Max sequence length: 128 • Binary classification head | Excellent efficiency-to-performance ratio at roughly 60% of BERT's size |
| ALBERT | 91.86% | 93.70% | 88.55% | 91.05% | 12M | ~50 MB | 613s | Optimizer: AdamW • Learning rate: 2e-5 • Epochs: 3 • Batch size: 32 • Loss function: CrossEntropyLoss | Base model: `albert-base-v2` • Tokenizer: ALBERT tokenizer • Max sequence length: 128 • Binary classification head | Second-highest precision (93.70%), smallest model size (10% of RoBERTa) |
| SBERT | 91.45% | 93.93% | 87.38% | 90.53% | 22M | ~90 MB | 144s | Optimizer: AdamW • Learning rate: 2e-5 • Epochs: 3 • Batch size: 32 • Loss function: CrossEntropyLoss | Base model: `sentence-transformers/all-MiniLM-L6-v2` • Tokenizer: SBERT tokenizer • Max sequence length: 128 • Custom pooling layer | Fastest training (144s), highest precision (93.93%), great for deployment |
The repository is organized as follows:

- `notebooks/`: Contains Jupyter notebooks for exploration and development
  - `01_eda.ipynb`: Exploratory data analysis of the dataset (common for all models)
  - `02_embeddings.ipynb`: Text embedding exploration with BERT
  - `03_training.ipynb`: BERT model training and evaluation
- `training_scripts/`: Contains production-ready training code
  - `bert_classification.py`: End-to-end script for BERT model training
  - `roberta_classification.py`: End-to-end script for RoBERTa model training
  - `albert_classification.py`: End-to-end script for ALBERT model training
  - `distilbert_classification.py`: End-to-end script for DistilBERT model training
  - `sbert_classification.py`: End-to-end script for Sentence-BERT model training
- `data/`: Contains the dataset files and training statistics
- `slurm_script.sh`: Script for running the training job on a SLURM cluster
- `results/`: Contains SLURM job outputs
  - `bert_training_job_output.txt`: Training logs for BERT model
  - `bert_training_job_error.txt`: Error logs for BERT model
  - `roberta_training_job_output.txt`: Training logs for RoBERTa model
  - `roberta_training_job_error.txt`: Error logs for RoBERTa model
  - `albert_training_job_output.txt`: Training logs for ALBERT model
  - `albert_training_job_error.txt`: Error logs for ALBERT model
  - `distilbert_training_job_output.txt`: Training logs for DistilBERT model
  - `distilbert_training_job_error.txt`: Error logs for DistilBERT model
  - `sbert_training_job_output.txt`: Training logs for Sentence-BERT model
  - `sbert_training_job_error.txt`: Error logs for Sentence-BERT model
To set up the environment and run the notebooks:

- Clone this repository
- Create a virtual environment
- Install required packages:

  ```bash
  pip install torch pandas transformers scikit-learn matplotlib seaborn bertviz
  ```

- Run the notebooks in the `notebooks/` directory

To train the models on a SLURM cluster:

- Ensure the data is properly placed in the `data/` directory
- Submit the job using:

  ```bash
  # For BERT model (default)
  sbatch slurm_script.sh

  # For RoBERTa model
  sbatch slurm_script.sh roberta

  # For ALBERT model
  sbatch slurm_script.sh albert

  # For DistilBERT model
  sbatch slurm_script.sh distilbert

  # For Sentence-BERT model
  sbatch slurm_script.sh sbert
  ```

- Monitor output in the `results/` directory
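The repository's actual `slurm_script.sh` is not reproduced here; a hypothetical minimal version illustrating the pattern (SBATCH directives plus a model-name argument dispatching to the matching training script) might look like the following. All directives and paths are assumptions to adjust for your cluster.

```bash
#!/bin/bash
#SBATCH --job-name=sarcasm-train        # hypothetical directives; adjust to your cluster
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00
#SBATCH --output=results/%x_output.txt
#SBATCH --error=results/%x_error.txt

MODEL=${1:-bert}                         # default to BERT when no argument is given
python training_scripts/${MODEL}_classification.py
```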
Once trained, the models can be loaded and used for inference:

**BERT**

```python
from transformers import BertForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model = BertForSequenceClassification.from_pretrained("path/to/saved/bert/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/saved/bert/model")

# Prepare input
headline = "Scientists discover new planet that looks exactly like Earth"
inputs = tokenizer(headline, return_tensors="pt", padding=True, truncation=True)

# Get prediction
outputs = model(**inputs)
prediction = outputs.logits.argmax().item()
print("Sarcastic" if prediction == 1 else "Not sarcastic")
```

**RoBERTa**

```python
from transformers import RobertaForSequenceClassification, AutoTokenizer
# Load model and tokenizer
model = RobertaForSequenceClassification.from_pretrained("path/to/saved/roberta/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/saved/roberta/model")
# Prepare input
headline = "Scientists discover new planet that looks exactly like Earth"
inputs = tokenizer(headline, return_tensors="pt", padding=True, truncation=True)
# Get prediction
outputs = model(**inputs)
prediction = outputs.logits.argmax().item()
print("Sarcastic" if prediction == 1 else "Not sarcastic")from transformers import AlbertForSequenceClassification, AutoTokenizer
# Load model and tokenizer
model = AlbertForSequenceClassification.from_pretrained("path/to/saved/albert/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/saved/albert/model")
# Prepare input
headline = "Scientists discover new planet that looks exactly like Earth"
inputs = tokenizer(headline, return_tensors="pt", padding=True, truncation=True)
# Get prediction
outputs = model(**inputs)
prediction = outputs.logits.argmax().item()
print("Sarcastic" if prediction == 1 else "Not sarcastic")from transformers import DistilBertForSequenceClassification, AutoTokenizer
# Load model and tokenizer
model = DistilBertForSequenceClassification.from_pretrained("path/to/saved/distilbert/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/saved/distilbert/model")
# Prepare input
headline = "Scientists discover new planet that looks exactly like Earth"
inputs = tokenizer(headline, return_tensors="pt", padding=True, truncation=True)
# Get prediction
outputs = model(**inputs)
prediction = outputs.logits.argmax().item()
print("Sarcastic" if prediction == 1 else "Not sarcastic")import torch
from transformers import AutoTokenizer, AutoModel
from torch import nn
# Define the same model architecture used during training
class SBERTClassifier(nn.Module):
def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2", num_labels=2):
super(SBERTClassifier, self).__init__()
self.sbert = AutoModel.from_pretrained(model_name)
self.dropout = nn.Dropout(0.1)
self.classifier = nn.Linear(384, num_labels)
def forward(self, input_ids, attention_mask):
outputs = self.sbert(input_ids=input_ids, attention_mask=attention_mask)
token_embeddings = outputs.last_hidden_state
input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
sentence_embeddings = sum_embeddings / sum_mask
pooled_output = self.dropout(sentence_embeddings)
logits = self.classifier(pooled_output)
return logits
# Load model and tokenizer
model = SBERTClassifier()
model.load_state_dict(torch.load("path/to/saved/sbert/model/model_state_dict.pt"))
model.eval()
tokenizer = AutoTokenizer.from_pretrained("path/to/saved/sbert/model")
# Prepare input
headline = "Scientists discover new planet that looks exactly like Earth"
inputs = tokenizer(headline, return_tensors="pt", padding=True, truncation=True)
# Get prediction
with torch.no_grad():
outputs = model(**inputs)
prediction = outputs.argmax().item()
print("Sarcastic" if prediction == 1 else "Not sarcastic")