Ruan Chaves ruanchaves

Senior AI Engineer with 5+ years of experience delivering real-world solutions using Generative AI, LLMs, and NLP.

63 followers · 31 following

ruanchaves/README.md

👋 Hi, I'm Ruan

Senior AI Engineer | Generative AI | RAG | LLMs | NLP | Python

I build AI systems for retrieval, evaluation, automation, and applied NLP.

🌐 Personal Website · 📧 Email

📚 Research & Academic Projects

2023 — Napolab

The Natural Portuguese Language Benchmark.

Portuguese language model evaluation benchmark and dataset collection. A key contribution of this work is FaQuAD-NLI, which has been widely reused by the Portuguese NLP community, including in Portuguese LLM evaluation tooling and leaderboards.

2021 — hashformers

State-of-the-art research code for multilingual hashtag and word segmentation.

Hashformers uses language models and beam search to segment hashtags and whitespace-free text. The project was recognized as state-of-the-art for hashtag segmentation at LREC 2022 and has been cited and reused in multiple research papers.

Paper: Zero-shot hashtag segmentation for multilingual sentiment analysis
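The idea of combining a scoring model with beam search can be illustrated with a minimal sketch. This is not the hashformers API: the toy unigram counts below are made up and stand in for a real language model, and `TOY_COUNTS`, `toy_score`, and `segment` are hypothetical names chosen for illustration only.

```python
import math

# Made-up unigram counts (an assumption for this sketch); a real system
# would score candidate segmentations with a language model instead.
TOY_COUNTS = {
    "we": 50, "need": 30, "a": 80, "an": 20,
    "national": 10, "nation": 8, "park": 12,
}


def toy_score(words):
    """Log-probability of a word sequence under the toy unigram model."""
    total = sum(TOY_COUNTS.values())
    logp = 0.0
    for word in words:
        if word in TOY_COUNTS:
            logp += math.log(TOY_COUNTS[word] / total)
        else:
            # Unknown words pay a penalty proportional to their length.
            logp += len(word) * math.log(1e-6)
    return logp


def segment(text, score=toy_score, beam_width=3, max_word_len=10):
    """Segment whitespace-free text with a position-synchronous beam search."""
    # beams[i] holds up to beam_width of the best segmentations of text[:i].
    beams = [[] for _ in range(len(text) + 1)]
    beams[0] = [[]]  # one empty segmentation covers the empty prefix
    for end in range(1, len(text) + 1):
        candidates = []
        # Try every word text[start:end] that could close the prefix text[:end].
        for start in range(max(0, end - max_word_len), end):
            for prefix in beams[start]:
                candidates.append(prefix + [text[start:end]])
        # Keep only the highest-scoring hypotheses for this position.
        candidates.sort(key=score, reverse=True)
        beams[end] = candidates[:beam_width]
    return " ".join(beams[-1][0])
```

With these toy counts, `segment("weneedanationalpark")` should recover the five intended words. Passing the scorer as a parameter keeps the search loop independent of the model, so a stronger scorer could be swapped in without changing the search itself.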

2022 — neuralmind-ai/coliee

Code for legal NLP research.

2020 — ruanchaves/BERT-WS

Code for:

Domain adaptation of transformers for English word segmentation

2020 — ruanchaves/assin

Code for:

Multilingual Transformer Ensembles for Portuguese Natural Language Tasks

2020 — ruanchaves/elmo

Code for:

Portuguese language models and word embeddings: evaluating on semantic similarity tasks

🌟 Open Source Contributions

Selected contributions to machine learning and NLP libraries.

2022 — argilla-io/argilla

Fixed bugs and shipped features related to semi-supervised learning during my internship at Argilla.

2021 — huggingface/transformers

Modified the Trainer class to support simultaneous Ray Tune and Weights & Biases execution.

2021 — awslabs/mlm-scoring

Improved installation instructions for the mlm-scoring library.

2020 — facebookresearch/BLINK

Fixed a parameter bug in a BLINK benchmark script.

2019 — nathanshartmann/portuguese_word_embeddings

Fixed a severe bug in the evaluation procedure.

The fix was documented in the paper “Portuguese language models and word embeddings: evaluating on semantic similarity tasks”.

📫 Contact

Website: ruanchaves.github.io
Email: ruanchaves93@gmail.com

Pinned

napolab (Public)

The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest advancements in Portuguese language models and their performance across carefully curated Portuguese language ta…

Python · 72 stars · 3 forks

hashformers (Public)

Accurate word segmentation for hashtags and text, powered by Transformers and Beam Search. A scalable alternative to heuristic splitters and massive LLMs.

Python · 77 stars · 6 forks

assin (Public)

Forked from erickrf/assin

Supporting code for the paper "Multilingual Transformer Ensembles for Portuguese Natural Language Tasks".

Jupyter Notebook · 5 stars · 3 forks

elmo (Public)

Supporting code for the paper "Portuguese Language Models and Word Embeddings: Evaluating on Semantic Similarity Tasks".

Jupyter Notebook · 11 stars · 2 forks

BERT-WS (Public)

Forked from jiangpinglei/BERT_ChineseWordSegment

Supporting code for the paper "Domain Adaptation of Transformers for English Word Segmentation".

Python

song2vec (Public)

Telegram bot that recommends songs as YouTube playlists through gensim's word2vec

Python · 6 stars