publications | jgcarrasco

2025

Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference

Jorge García-Carrasco, Alejandro Maté, and Juan Trujillo

AAAI Conference on Artificial Intelligence (In Press), 2025

2024

How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability

Jorge García-Carrasco, Alejandro Maté, and Juan Carlos Trujillo

In International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

Detecting and understanding vulnerabilities in language models via mechanistic interpretability

Jorge García-Carrasco, Alejandro Maté, and Juan Trujillo

In International Joint Conference on Artificial Intelligence (IJCAI), 2024