ruanchaves/README.md

👋 Hi, I'm Ruan

Senior AI Engineer | Generative AI | RAG | LLMs | NLP | Python

I build AI systems for retrieval, evaluation, automation, and applied NLP.


📚 Research & Academic Projects

2023 - Napolab

The Natural Portuguese Language Benchmark.

Portuguese language model evaluation benchmark and dataset collection. A key contribution of this work is FaQuAD-NLI, which has been widely reused by the Portuguese NLP community, including in Portuguese LLM evaluation tooling and leaderboards.

[Images: Napolab Leaderboard Interface; Model Performance Analysis]

2021 - hashformers

State-of-the-art research code for multilingual hashtag and word segmentation.

Hashformers uses language models and beam search to segment hashtags and whitespace-free text. The project was recognized as state-of-the-art for hashtag segmentation at LREC 2022 and has been cited and reused in multiple research papers.
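The idea of segmenting whitespace-free text with a scorer and beam search can be illustrated with a minimal sketch. This is not the hashformers API or its implementation: the toy word table `TOY_LOGPROBS`, the penalty value, and the function names below are all hypothetical stand-ins for a real language model scorer.

```python
import math

# Hypothetical toy unigram probabilities; a real system would score
# candidates with a neural language model instead of a lookup table.
TOY_LOGPROBS = {
    word: math.log(prob)
    for word, prob in {
        "we": 0.05, "need": 0.03, "a": 0.06,
        "national": 0.01, "nation": 0.008, "park": 0.01,
    }.items()
}
UNKNOWN_CHAR_PENALTY = -8.0  # assumed per-character cost for unknown strings


def word_score(word):
    """Score a finished word; unknown words are penalized by length."""
    return TOY_LOGPROBS.get(word, UNKNOWN_CHAR_PENALTY * len(word))


def open_score(prefix):
    """Score the unfinished last word: free if it can still become a known word."""
    if any(known.startswith(prefix) for known in TOY_LOGPROBS):
        return 0.0
    return UNKNOWN_CHAR_PENALTY * len(prefix)


def segment(text, beam_width=4):
    """Beam search over 'extend the last word' vs 'start a new word' decisions."""
    if not text:
        return ""
    # Each hypothesis: (score of closed words, closed words, open word).
    beams = [(0.0, (), text[0])]
    for ch in text[1:]:
        candidates = []
        for closed_score, closed, open_word in beams:
            # Option 1: append the character to the open word.
            candidates.append((closed_score, closed, open_word + ch))
            # Option 2: close the open word and start a new one.
            candidates.append(
                (closed_score + word_score(open_word), closed + (open_word,), ch)
            )
        # Keep only the best `beam_width` hypotheses.
        candidates.sort(key=lambda h: h[0] + open_score(h[2]), reverse=True)
        beams = candidates[:beam_width]
    # At the end, the open word is closed and scored as a full word.
    best = max(beams, key=lambda h: h[0] + word_score(h[2]))
    return " ".join(best[1] + (best[2],))


print(segment("weneedanationalpark"))
```

With this toy table, the call above prints `we need a national park`. The beam width trades accuracy for speed: a wider beam keeps more partial segmentations alive at each character.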


Code for legal NLP research:


2020 - ruanchaves/BERT-WS

Code for the paper "Domain Adaptation of Transformers for English Word Segmentation".


2020 - ruanchaves/assin

Code for the paper "Multilingual Transformer Ensembles for Portuguese Natural Language Tasks".


2020 - ruanchaves/elmo

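Code for the paper "Portuguese Language Models and Word Embeddings: Evaluating on Semantic Similarity Tasks".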


🌟 Open Source Contributions

Selected contributions to machine learning and NLP libraries.

2022 - argilla-io/argilla

Fixed bugs and shipped features related to semi-supervised learning during my internship at Argilla.


Modified the Trainer class to support simultaneous Ray Tune and Weights & Biases execution.


Improved installation instructions for the mlm-scoring library.


Fixed a parameter bug in a BLINK benchmark script.


Fixed a severe bug in the evaluation procedure.

The fix was documented in the paper "Portuguese Language Models and Word Embeddings: Evaluating on Semantic Similarity Tasks".


📫 Contact

Pinned

  1. napolab (Public)

    The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest advancements in Portuguese language models and their performance across carefully curated Portuguese language ta…

    Python · 72 stars · 3 forks

  2. hashformers (Public)

    Accurate word segmentation for hashtags and text, powered by Transformers and Beam Search. A scalable alternative to heuristic splitters and massive LLMs.

    Python · 77 stars · 6 forks

  3. assin (Public)

    Forked from erickrf/assin

    Supporting code for the paper "Multilingual Transformer Ensembles for Portuguese Natural Language Tasks".

    Jupyter Notebook · 5 stars · 3 forks

  4. elmo (Public)

    Supporting code for the paper "Portuguese Language Models and Word Embeddings: Evaluating on Semantic Similarity Tasks".

    Jupyter Notebook · 11 stars · 2 forks

  5. BERT-WS (Public)

    Forked from jiangpinglei/BERT_ChineseWordSegment

    Supporting code for the paper "Domain Adaptation of Transformers for English Word Segmentation".

    Python

  6. song2vec (Public)

    Telegram bot that recommends songs as YouTube playlists through gensim's word2vec

    Python · 6 stars