publications

2025

  1. Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
    Jorge García-Carrasco, Alejandro Maté, and Juan Trujillo
    In AAAI Conference on Artificial Intelligence (AAAI), 2025 (in press)

2024

  1. How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability
    Jorge García-Carrasco, Alejandro Maté, and Juan Carlos Trujillo
    In International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
  2. Detecting and understanding vulnerabilities in language models via mechanistic interpretability
    Jorge García-Carrasco, Alejandro Maté, and Juan Trujillo
    In International Joint Conference on Artificial Intelligence (IJCAI), 2024