Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
Documentation Website | Full Paper | Tables & Resources
Based on a systematic review of 135 papers and online resources, this survey establishes a holistic theoretical framework for Issue Resolution in software engineering. We examine how Large Language Models (LLMs) are transforming the automation of GitHub issue resolution. Beyond the theoretical analysis, we have curated a comprehensive collection of datasets and model training resources, which are continuously synchronized with our GitHub repository and project documentation website.
Explore This Survey:
- Data: Evaluation and training datasets, data collection and synthesis methods
- Methods: Training-free (agent/workflow) and training-based (SFT/RL) approaches
- Analysis: Insights into both data characteristics and method performance
- Tables & Resources: Comprehensive statistical tables and resources
- Full Paper: Read the complete survey paper
We comprehensively survey evaluation benchmarks for issue resolution, categorizing them by programming language, multimodal support, and reproducible execution environments.
Key Datasets:
- SWE-bench: Python-based benchmark with 2,294 real-world issues from 12 repositories
- SWE-bench Lite: Curated subset of 300 high-quality instances
- Multi-SWE-bench: Multilingual extension covering 7+ programming languages
- SWE-bench Multimodal: Incorporates visual elements (JS, TS, HTML, CSS)
- Visual SWE-bench: Focus on vision-intensive issue resolution
→ Explore all evaluation datasets
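For orientation, the snippet below shows how one of these benchmarks can be loaded and inspected programmatically. It assumes the `datasets` library and the commonly used Hub identifier `princeton-nlp/SWE-bench_Lite`; the field names reflect the published SWE-bench schema, but check them against the current dataset card before relying on them.

```python
# Sketch: load SWE-bench Lite from the Hugging Face Hub and inspect one instance.
# Assumes `pip install datasets` and that the dataset id below is still current.
from datasets import load_dataset

ds = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")  # 300 curated instances

example = ds[0]
# Each instance pairs a GitHub issue with the gold patch and the tests
# that must flip from failing to passing for the issue to count as resolved.
print(example["repo"])                      # source repository, e.g. "astropy/astropy"
print(example["instance_id"])               # unique task identifier
print(example["problem_statement"][:300])   # the issue text shown to the model
print(example["patch"][:300])               # gold (reference) patch
```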
We analyze trajectory datasets used for agent training, including both human-annotated and synthetically generated examples.
Notable Resources:
- R2E-Gym: 3,321 trajectories for reinforcement learning
- SWE-Gym: 491 expert trajectories for supervised fine-tuning
- SWE-Fixer: Large-scale dataset with 69,752 editing chains of thought
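Trajectory schemas differ across these datasets, but most record an alternating sequence of model actions and environment observations together with a final outcome. The sketch below is a hypothetical, simplified record structure for illustration only; it does not reproduce the actual format of R2E-Gym, SWE-Gym, or SWE-Fixer.

```python
# Hypothetical, simplified trajectory record; real datasets use their own schemas.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    thought: str      # the model's reasoning before acting
    action: str       # tool call issued by the model
    observation: str  # feedback returned by the environment

@dataclass
class Trajectory:
    instance_id: str                          # which issue this run addresses
    steps: List[Step] = field(default_factory=list)
    resolved: bool = False                    # did the final patch pass the tests?

# A made-up two-step trajectory.
traj = Trajectory(
    instance_id="example__repo-1234",
    steps=[
        Step("Locate the failing function.", "search('parse_date')", "src/utils.py:42"),
        Step("Fix the off-by-one error.", "edit_file('src/utils.py', ...)", "tests passed"),
    ],
    resolved=True,
)
print(len(traj.steps), traj.resolved)
```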
Agent-based approaches are autonomous agents that leverage tool use, memory, and planning to resolve issues without task-specific training; a generic sketch of such an agent loop follows the list below.
Representative Works:
- OpenHands: Multi-agent collaboration framework
- Agentless: Localization + repair pipeline without agent loops
- AutoCodeRover: Hierarchical search-based code navigation
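At their core, these systems run an iterative loop in which the model chooses a tool, observes the result, and accumulates the exchange as working memory. The sketch below is a generic illustration of that loop, not the actual OpenHands or AutoCodeRover implementation; `call_llm` and the tool functions are hypothetical stand-ins for a real model API and sandbox.

```python
# Generic training-free agent loop (illustrative only; not any specific framework).
def call_llm(messages):
    """Placeholder for a real chat-completion call; here it immediately submits."""
    return {"tool": "submit", "args": {"patch": "diff --git a/... (placeholder)"}}

# Hypothetical sandboxed tools the agent may invoke.
TOOLS = {
    "search": lambda query: f"(search results for {query!r})",
    "read_file": lambda path: f"(contents of {path})",
    "edit_file": lambda path, patch: "(edit applied)",
    "run_tests": lambda: "(test output)",
}

def resolve_issue(issue_text, max_steps=30):
    # The message history doubles as the agent's short-term memory.
    messages = [{"role": "user", "content": issue_text}]
    for _ in range(max_steps):
        action = call_llm(messages)           # e.g. {"tool": "search", "args": {...}}
        if action["tool"] == "submit":        # the agent decides it is done
            return action["args"]["patch"]
        observation = TOOLS[action["tool"]](**action.get("args", {}))
        messages.append({"role": "assistant", "content": str(action)})
        messages.append({"role": "user", "content": observation})
    return None  # step budget exhausted without a patch

print(resolve_issue("Example issue: fix the failing date parser."))
```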
Workflow-based approaches are structured pipelines that optimize specific stages of issue resolution; a minimal localize-then-repair sketch follows the list below.
Key Innovations:
- Meta-RAG: Code summarization for enhanced retrieval
- TestAider: Test-driven development integration
- PatchPilot: Automated patch validation and refinement
→ Explore training-free methods
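In contrast to the agent loop above, a workflow fixes the control flow in advance. The sketch below outlines a two-stage localize-then-repair pipeline loosely in the spirit of Agentless; `llm` is a hypothetical model call and the prompts are abbreviated, so treat this as a structural outline rather than the method itself.

```python
# Two-stage localize-then-repair workflow (structural sketch, not Agentless itself).
from pathlib import Path

def llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    return ""

def localize(repo_root: str, issue: str, top_k: int = 3) -> list:
    """Stage 1: ask the model to rank likely-relevant files from the repo layout."""
    files = [str(p) for p in Path(repo_root).rglob("*.py")]
    listing = "\n".join(files[:200])  # truncate very large repositories
    ranked = llm(f"Issue:\n{issue}\n\nFiles:\n{listing}\n\nList the {top_k} most relevant files.")
    return ranked.splitlines()[:top_k]

def repair(issue: str, file_path: str) -> str:
    """Stage 2: generate a candidate patch for one localized file."""
    source = Path(file_path).read_text() if Path(file_path).exists() else ""
    return llm(f"Issue:\n{issue}\n\nFile {file_path}:\n{source}\n\nReturn a unified diff fixing the issue.")

def run_pipeline(repo_root: str, issue: str) -> list:
    # No agent loop: fixed localization followed by patch generation.
    return [repair(issue, f) for f in localize(repo_root, issue)]
```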
Supervised fine-tuning (SFT) trains models on expert trajectories to internalize issue resolution patterns; a data-formatting sketch follows the list below.
Notable Models:
- Devstral (24B): 46.8% on SWE-bench Verified
- Co-PatcheR (14B): Multi-stage training with code editing focus
- SWE-Swiss (32B): Synthetic data augmentation for improved generalization
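A common ingredient behind such models is converting expert trajectories into chat-formatted training examples, one per agent step. The sketch below shows a generic version of that conversion; it is not the actual recipe used by any of the models listed above, and the trajectory fields mirror the simplified schema sketched earlier.

```python
# Generic sketch: turn one expert trajectory into per-step chat-format SFT examples.
import json

def trajectory_to_sft_examples(traj):
    """Yield one example per step: (issue + interaction history) -> next thought/action."""
    history = []
    for step in traj["steps"]:
        target = f"{step['thought']}\n{step['action']}"
        yield {
            "messages": [
                {"role": "system", "content": "You are a software engineering agent."},
                {"role": "user", "content": traj["issue"]},
                *history,
                {"role": "assistant", "content": target},  # what the model should learn to emit
            ]
        }
        history.append({"role": "assistant", "content": target})
        history.append({"role": "user", "content": step["observation"]})

traj = {
    "issue": "parse_date() returns the wrong month for ISO strings.",
    "steps": [
        {"thought": "Find the parser.", "action": "search('parse_date')",
         "observation": "src/utils.py:42"},
        {"thought": "Fix the index.", "action": "edit_file('src/utils.py', ...)",
         "observation": "tests passed"},
    ],
}
for example in trajectory_to_sft_examples(traj):
    print(json.dumps(example)[:120])
```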
Reinforcement learning (RL) optimizes models through environmental feedback and reward signals; a sketch of an outcome-based reward follows the list below.
State-of-the-Art:
- OpenHands Critic (32B): 66.4% on SWE-bench Verified
- Kimi-Dev (72B): 60.4% with outcome-based rewards
- DeepSWE (32B): Trained from scratch using RL on code repositories
→ Explore training-based methods
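The reward in these setups is typically derived from test execution: full credit only when the issue's fail-to-pass tests succeed and previously passing tests do not regress, mirroring the SWE-bench evaluation criterion. The sketch below illustrates that idea generically; it is not the actual reward implementation of any model listed above.

```python
# Outcome-based reward sketch: reward 1.0 only when the patch resolves the issue
# without regressions. Inputs are lists of booleans (True = test passed).
def outcome_reward(fail_to_pass, pass_to_pass):
    return 1.0 if all(fail_to_pass) and all(pass_to_pass) else 0.0

# A partially shaped variant: dense credit for progress on the target tests,
# but never when existing tests regress.
def shaped_reward(fail_to_pass, pass_to_pass):
    if not all(pass_to_pass):    # regressions are never rewarded
        return 0.0
    if not fail_to_pass:
        return 0.0
    return sum(fail_to_pass) / len(fail_to_pass)

print(outcome_reward([True, True], [True]))   # 1.0: resolved
print(shaped_reward([True, False], [True]))   # 0.5: partial progress
```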
- Quality vs. Quantity: Analysis of dataset characteristics and their impact on model performance
- Contamination Detection: Protocols for ensuring benchmark integrity (a simple overlap check is sketched after this list)
- Difficulty Spectrum: Stratification of issues by complexity
- Performance Trends: Comparative evaluation across model families and sizes
- Scaling Laws: Analysis of parameter count vs. performance gains
- Efficiency Metrics: Cost-benefit analysis of different approaches
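As a small illustration of contamination detection, the sketch below flags benchmark instances whose problem statements share long character n-grams with a training corpus. Real decontamination protocols are considerably stricter; the function names and thresholds here are illustrative assumptions.

```python
# Surface-level contamination check: flag benchmark instances whose problem
# statements overlap heavily (in long character n-grams) with training text.
def ngrams(text: str, n: int = 50) -> set:
    """Character n-grams of a whitespace-normalized, lowercased string."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 0))}

def is_contaminated(problem_statement: str, training_docs, n: int = 50,
                    threshold: float = 0.2) -> bool:
    bench = ngrams(problem_statement, n)
    if not bench:
        return False
    for doc in training_docs:
        overlap = len(bench & ngrams(doc, n)) / len(bench)
        if overlap >= threshold:
            return True
    return False

# Example: an exact copy of the issue text in the corpus is flagged.
issue = "TypeError when calling DataFrame.merge with a categorical key ... " * 3
print(is_contaminated(issue, ["unrelated text"]))   # False
print(is_contaminated(issue, ["prefix " + issue]))  # True
```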
The scalability of SWE agents is bottlenecked by the high costs of sandboxed environments and long-context inference. Optimization strategies are required to streamline these resource-intensive loops without sacrificing performance.
Benchmarks often overlook efficiency, masking the high costs of techniques like inference-time scaling. Standardized reporting of latency and token usage is crucial for guiding the development of cost-effective agents.
Reliance on text proxies for UI interpretation limits effectiveness. Future research can adopt intrinsic multi-modal solutions, such as code-centric MLLMs, to better bridge the gap between visual rendering and underlying code logic.
High autonomy carries risks of destructive actions, such as accidental code deletion. Future systems should integrate safeguards, such as Git-based version control, to ensure autonomous modifications remain secure and reversible.
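As a concrete illustration of such a safeguard, the sketch below checkpoints a repository before an autonomous edit and rolls back when validation fails. It assumes the agent starts from a committed, clean working tree; the helper names are hypothetical.

```python
# Minimal sketch of a Git-based safeguard: checkpoint before an autonomous edit
# and roll back if validation fails. Assumes a committed, clean working tree.
import subprocess

def git(*args, cwd="."):
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout.strip()

def guarded_edit(apply_edit, validate, cwd="."):
    """Run `apply_edit()`; keep its changes only if `validate()` returns True."""
    head = git("rev-parse", "HEAD", cwd=cwd)   # checkpoint commit
    apply_edit()                               # the agent modifies the working tree
    if validate():                             # e.g. the test suite still passes
        return True
    git("reset", "--hard", head, cwd=cwd)      # revert tracked changes
    git("clean", "-fd", cwd=cwd)               # remove files the agent created
    return False
```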
Reinforcement learning is hindered by sparse, binary feedback. Integrating fine-grained signals from compiler diagnostics and execution traces is necessary to guide models through complex reasoning steps.
As benchmarks approach saturation, evaluation validity is compromised by data leakage. Future frameworks must strictly enforce decontamination protocols to ensure fairness and reliability.
While current issue resolution tasks mirror development workflows, they represent only a fraction of the full Software Development Life Cycle (SDLC). Future research should broaden the scope of issue resolution tasks to develop more versatile automated software generation methods.
Visit our Tables & Resources page for comprehensive statistical tables including:
- Evaluation Datasets Overview: Detailed comparison of 30+ benchmarks
- Training Trajectory Datasets: Analysis of 5 major trajectory datasets
- Supervised Fine-Tuning Models: Performance metrics for 10+ SFT models
- Reinforcement Learning Models: Comprehensive analysis of 30+ RL-trained models
- General Foundation Models: Evaluation of 15+ general-purpose LLMs
We welcome contributions to this survey! If you'd like to add new papers or fix errors:
- Fork this repository
- Add paper entries to the corresponding YAML file under the `data/` directory (e.g., `papers_evaluation_datasets.yaml`, `papers_single_agent.yaml`, etc.)
- Follow the existing format with the fields `short_name`, `title`, `authors`, `venue`, `year`, and `links` (arxiv, github, huggingface); an illustrative entry is sketched after this list
- Run `python scripts/render_papers.py` to update the documentation
- Submit a PR with your changes
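For illustration, the snippet below emits an entry with the required fields; the values are made up, and the exact YAML layout should be copied from the existing entries under `data/`.

```python
# Illustrative only: field values are made up; match the layout of existing entries.
import yaml  # pip install pyyaml

entry = {
    "short_name": "ExampleBench",
    "title": "An Example Benchmark for Issue Resolution",
    "authors": "A. Author and B. Author",
    "venue": "arXiv",
    "year": 2025,
    "links": {
        "arxiv": "https://arxiv.org/abs/0000.00000",
        "github": "https://github.com/example/example-bench",
        "huggingface": "https://huggingface.co/datasets/example/example-bench",
    },
}
print(yaml.safe_dump([entry], sort_keys=False))
```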
If you use this project or related survey in your research or system, please cite the following BibTeX:
@misc{li2025awesome_issue_resolution,
title = {Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey},
author = {Caihua Li and Lianghong Guo and Yanlin Wang and Wei Tao and Zhenyu Shan and Mingwei Liu and Jiachi Chen and Haoyu Song and Duyu Tang and Hongyu Zhang and Zibin Zheng},
year = {2025},
howpublished = {\url{https://github.com/DeepSoftwareAnalytics/Awesome-Issue-Resolution}}
}

Once published on arXiv or at a conference, please replace the entry with the official citation information (authors, DOI/arXiv ID, conference name, etc.).
If you have any questions or suggestions, please contact us through:
- Email: noranotdor4@gmail.com
- GitHub Issues: Open an issue
This project is licensed under the MIT License - see the LICENSE file for details.
⭐ Star this repository if you find it helpful!
Made with ❤️ by the DeepSoftwareAnalytics team
Documentation | Paper | Tables | About | Cite