syncIALO 🤖🗯️

What is this?

Synthetic drop-in replacements for Kialo debate datasets.

Why?

The Kialo debates are a 👑 gold mine for NLP researchers, AI engineers, computational sociologists, and Critical Thinking scholars. Yet, the mine is legally ⛔️ barred (for them): Debate data downloaded or scraped from the website may not be used for research or commercial purposes in the absence of explicit permission or license agreement.

That's why the DebateLab team has built this Python module for creating synthetic debate corpora, which may serve as drop-in replacements for the Kialo data. We're synthesizing such data from scratch, simulating multi-agent debate and collaborative argument-mapping with 🤖 LLM-based agents.

Features

permissive ODC license
reproducible and extendable
open-source codebase
works with open LLMs
one-line import as networkx graphs

Corpora

id	llm	# debates	~# claims	link	contributed by
synthetic_corpus-001	Llama-3.1-405B-Instruct	1000/50/50¹	560k/28k/28k¹	HF hub→	DebateLab²
synthetic_corpus-001-DE	Llama-3.1-SauerkrautLM-70b-Instruct³	1000/50/50¹	560k/28k/28k¹	HF hub→	DebateLab

¹ per train / eval / test split
² with ❤️ generous support from 🤗 HuggingFace
³ as translator

Simulation Design

The following steps sketch the procedure by which debates are simulated:

Determine the debate's tag cloud by randomly sampling 8 topic tags.
Given the tag cloud, let 🤖 generate a debate topic (e.g., a question).
Given the topic, let 🤖 generate a suitable motion (i.e., the central claim).
Recursively generate an argument tree, starting with the motion as target argument (code→):
1. Let 🤖 identify the implicit premises of the target argument (code→).
2. Let 🤖 generate k pros for different premises of the target argument (code→):
  - Choose the premise to target as a function of the premises' plausibility.
  - Let 🤖 assume a randomly sampled persona.
  - Generate 2k candidate arguments and select the k most salient ones.
3. Let 🤖 generate k cons against different premises of the target argument (code→):
  - Choose the premise to target as a function of the premises' implausibility.
  - Let 🤖 assume a randomly sampled persona.
  - Generate 2k candidate arguments and select the k most salient ones.
4. Check for and resolve duplicates via semantic similarity / vector store (code→).
5. Add the pros and cons to the argument tree, and use each of them as a new target argument that is argued for and against, unless the maximum depth has been reached.
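
The recursive part of the procedure can be illustrated with a minimal, self-contained sketch. It is not the syncIALO implementation: the names `Node`, `stub_premises`, `stub_argument`, `select_arguments` and `build_tree` are hypothetical, the LLM calls are replaced by deterministic stubs, salience is a random number, and the duplicate check is a plain normalized string match standing in for the semantic similarity / vector store step. It also assumes that a premise and a persona are sampled anew for each candidate argument, which the outline above leaves open.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    claim: str
    valence: str = "motion"  # "motion", "pro", or "con"
    children: list = field(default_factory=list)


def stub_premises(claim):
    """Stand-in for the LLM call that lists implicit premises with a plausibility score."""
    return [(f"premise {i} of <{claim}>", p) for i, p in enumerate((0.8, 0.5, 0.2))]


def stub_argument(premise, valence, persona, i, rng):
    """Stand-in for the LLM call that drafts one candidate; returns (text, salience)."""
    return f"[{persona}] {valence} {i}: {premise}", rng.random()


def select_arguments(target, valence, k, personas, rng):
    """Steps 1-3: draft 2k candidates for or against the target, keep the k most salient."""
    premises = stub_premises(target.claim)
    # Pros favour plausible premises, cons favour implausible ones.
    weights = [p if valence == "pro" else 1.0 - p for _, p in premises]
    candidates = []
    for i in range(2 * k):
        premise, _ = rng.choices(premises, weights=weights, k=1)[0]
        persona = rng.choice(personas)
        candidates.append(stub_argument(premise, valence, persona, i, rng))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return [Node(text, valence) for text, _ in candidates[:k]]


def build_tree(target, depth, max_depth, k, personas, seen, rng):
    """Steps 4-5: drop duplicates, attach the rest, and recurse until max_depth."""
    if depth >= max_depth:
        return target
    new_nodes = (select_arguments(target, "pro", k, personas, rng)
                 + select_arguments(target, "con", k, personas, rng))
    for node in new_nodes:
        key = node.claim.strip().lower()  # exact-match stand-in for semantic similarity
        if key in seen:
            continue
        seen.add(key)
        target.children.append(node)
        build_tree(node, depth + 1, max_depth, k, personas, seen, rng)
    return target


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children)


rng = random.Random(0)
root = build_tree(Node("Example motion"), 0, 2, 2, ["economist", "ethicist"], set(), rng)
print(count_nodes(root))
```

With k pros and k cons per target and no duplicates dropped, the tree grows by a factor of 2k per level, so max depth and k together determine the number of claims per debate.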

Usage

Configure workflows/synthetic_corpus_generation.py. Then:

hatch shell
python workflows/synthetic_corpus_generation.py
