Marzieh Fadaee

Senior Research Scientist @ Cohere For AI


I’m a research scientist at Cohere For AI, the non-profit research lab of Cohere, working with Sara Hooker on complex problems and fundamental research in language understanding. As a scientist, I’m broadly interested in all aspects of natural language understanding, and particularly in multilingual learning, data-conscious learning, robust and scalable models, compositionality, and evaluation.

Previously, I was the NLP/ML research lead at Zeta Alpha Vector, working on smarter ways to discover and organize knowledge. I did my PhD at the Language Technology Lab (originally part of the ILPS group), University of Amsterdam, working on developing models to understand and utilize interesting phenomena in the data. During my PhD I was advised by Christof Monz and Arianna Bisazza. I received my B.Sc. from Sharif University, majoring in Computer Engineering, and my M.Sc. from the University of Tehran, majoring in Artificial Intelligence.

news

Oct 15, 2024 Our paper Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning is available on arXiv now.
Sep 25, 2024 Our Elo Uncovered paper has been accepted to NeurIPS! :medal_military:
Sep 23, 2024 Two papers accepted to EMNLP main track: Multilingual Prism and LLM See, LLM Do! :dizzy:
Sep 20, 2024 Our paper Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement is available on arXiv now.
Aug 24, 2024 Our paper To Code, or Not To Code? Exploring Impact of Code in Pre-training is available on arXiv now.
Aug 15, 2024 :fire: Our Aya model paper received the ACL Best Paper Award! :fire:
Jul 20, 2024 Our paper LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives is available on arXiv now.
Jul 19, 2024 Our paper The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm is available on arXiv now.
Jul 7, 2024 Three papers accepted to ACL main track: Aya Model, Aya Dataset, and RLOO! :collision:
May 31, 2024 Our paper Aya 23: Open Weight Releases to Further Multilingual Progress is available on arXiv now.
Feb 22, 2024 Our paper Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs is available on arXiv now.
Feb 13, 2024 We release Aya 101 :herb: a massively multilingual dataset and language model. For more details, check out our Data and Model papers.
Nov 5, 2023 Our paper Elo Uncovered: Robustness and Best Practices in Language Model Evaluation has been accepted to the GEM Workshop at EMNLP 2023!
Nov 1, 2023 Thrilled to be nominated for Top 5 AI Researcher in the company of some outstanding women :star2:
Oct 22, 2023 Our paper Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation is available on arXiv now!
Oct 20, 2023 During September and October, I had the opportunity to give several talks. You can check them out here :tv:
Sep 11, 2023 Our paper When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale has been accepted to the Attributing Model Behavior at Scale workshop at NeurIPS!
Jan 20, 2023 I’m excited to announce that I have joined Sara Hooker’s team at Cohere For AI as a Senior Research Scientist :purple_heart:
Jan 12, 2023 InPars-v2 is SoTA on the BEIR Leaderboard in zero-shot Information Retrieval :stars:
Jan 4, 2023 Our paper InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval is available on arXiv now.
Dec 12, 2022 Our paper In Defense of Cross-Encoders for Zero-Shot Retrieval is available on arXiv now.
Feb 22, 2022 Selected as an ICLR Highlighted Reviewer.
Feb 10, 2022 Our paper InPars: Data Augmentation for Information Retrieval using Large Language Models got accepted at SIGIR.
Jan 1, 2022 Our paper mMARCO: A Multilingual Version of the MS MARCO Passage Ranking Dataset is available on arXiv now.
Nov 10, 2020 I successfully defended my PhD dissertation! Check out my book here :book:
May 20, 2020 Our paper The Unreasonable Volatility of Neural Machine Translation Models got accepted at WNGT. It will be presented at ACL.
Oct 15, 2019 I joined Zeta Alpha Vector as an NLP/ML research engineer.
Aug 13, 2018 Our paper Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation got accepted at EMNLP.
May 28, 2018 I did an internship with the eBay CoreAI Machine Translation team during the summer.
Apr 20, 2018 I got invited to attend the Deep Learning and Reinforcement Learning Summer School in Toronto, Canada.
Jan 30, 2018 Our paper Examining the Tip of the Iceberg: A Data Set for Idiom Translation got accepted at LREC.
May 30, 2017 Our paper Data Augmentation for Low-Resource Neural Machine Translation got accepted at ACL.
May 30, 2017 Our paper Learning Topic-Sensitive Word Representations got accepted at ACL.
Oct 22, 2015 I participated in Google’s NLP PhD Summit in Zürich, Switzerland.
Oct 15, 2014 I participated in the MT Marathon in Trento, Italy.