publications

2024

  1. M-RewardBench: Evaluating Reward Models in Multilingual Settings
    Srishti Gureja, Lester James V. Miranda, Shayekh Bin Islam, Rishabh Maheshwary, Drishti Sharma, Gusti Winata, Nathan Lambert, Sebastian Ruder, Sara Hooker, and Marzieh Fadaee
    2024
  2. Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
    Aakanksha, Arash Ahmadian, Seraphina Goldfarb-Tarrant, Beyza Ermis, Marzieh Fadaee, and Sara Hooker
    2024
  3. Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement
    Simon Yu, Liangyu Chen, Sara Ahmadian, and Marzieh Fadaee
    2024
  4. To Code, or Not To Code? Exploring Impact of Code in Pre-training
    Viraat Aryabumi, Yixuan Su, Raymond Ma, Adrien Morisot, Ivan Zhang, Acyr Locatelli, Marzieh Fadaee, Ahmet Üstün, and Sara Hooker
    2024
  5. LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
    Luísa Shimabucoro, Sebastian Ruder, Julia Kreutzer, Marzieh Fadaee, and Sara Hooker
    2024
  6. The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
    Aakanksha, Arash Ahmadian, Beyza Ermis, Seraphina Goldfarb-Tarrant, Julia Kreutzer, Marzieh Fadaee, and Sara Hooker
    2024
  7. Aya 23: Open Weight Releases to Further Multilingual Progress
    Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, and Sara Hooker
    2024
  8. Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
    Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Ahmet Üstün, and Sara Hooker
    2024
  9. Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
    Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D’souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, and Sara Hooker
    2024
  10. Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
    Shivalika Singh, Freddie Vargus, Daniel D’souza, Börje F. Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura O’Mahony, Mike Zhang, Ramith Hettiarachchi, Joseph Wilson, Marina Machado, Luisa Souza Moura, Dominik Krzemiński, Hakimeh Fadaei, Irem Ergün, Ifeoma Okoh, Aisha Alaagib, Oshan Mudannayake, Zaid Alyafeai, Vu Minh Chien, Sebastian Ruder, Surya Guthikonda, Emad A. Alghamdi, Sebastian Gehrmann, Niklas Muennighoff, Max Bartolo, Julia Kreutzer, Ahmet Üstün, Marzieh Fadaee, and Sara Hooker
    2024

2023

  1. Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
    Meriem Boubdir, Edward Kim, Beyza Ermis, Sara Hooker, and Marzieh Fadaee
    2023
  2. Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
    Meriem Boubdir, Edward Kim, Beyza Ermis, Marzieh Fadaee, and Sara Hooker
    2023
  3. When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
    2023
  4. InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval
    Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, and Rodrigo Nogueira
    2023

2022

  1. In Defense of Cross-Encoders for Zero-Shot Retrieval
    Guilherme Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, and Rodrigo Nogueira
    2022
  2. InPars: Data Augmentation for Information Retrieval using Large Language Models
    Luiz Henrique Bonifacio, Hugo Abonizio, Marzieh Fadaee, and Rodrigo Nogueira
    In SIGIR, Feb 2022
  3. No Parameter Left Behind: How Distillation and Model Size Affect Zero-Shot Retrieval
    Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, and Rodrigo Nogueira
    In arXiv, Feb 2022

2021

  1. mMARCO: A Multilingual Version of the MS MARCO Passage Ranking Dataset
    Luiz Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, and Rodrigo Nogueira
    In arXiv, Feb 2021

2020

  1. Understanding and Enhancing the Use of Context for Machine Translation
    Marzieh Fadaee
    Oct 2020
  2. A New Neural Search and Insights Platform for Navigating and Organizing AI Research
    Marzieh Fadaee, Olga Gureenkova, Fernando Rejon Barrera, Carsten Schnober, Wouter Weerkamp, and Jakub Zavrel
    In Proceedings of the First Workshop on Scholarly Document Processing, Nov 2020
  3. The Unreasonable Volatility of Neural Machine Translation Models
    Marzieh Fadaee and Christof Monz
    In Proceedings of the Fourth Workshop on Neural Generation and Translation, Jul 2020

2018

  1. Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation
    Marzieh Fadaee and Christof Monz
    In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), Jul 2018
  2. Examining the Tip of the Iceberg: A Data Set for Idiom Translation
    Marzieh Fadaee, Arianna Bisazza, and Christof Monz
    In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018

2017

  1. Data Augmentation for Low-Resource Neural Machine Translation
    Marzieh Fadaee, Arianna Bisazza, and Christof Monz
    In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), Jul 2017
  2. Learning Topic-Sensitive Word Representations
    Marzieh Fadaee, Arianna Bisazza, and Christof Monz
    In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), Jul 2017

2013

  1. Automatic WordNet Construction Using Markov Chain Monte Carlo
    Marzieh Fadaee, Hamidreza Ghader, Heshaam Faili, and Azadeh Shakery
    Polibits, Jul 2013