wip : Change gated model to tinyllama by arekay-nv · Pull Request #292 · mlcommons/endpoints

arekay-nv · 2026-04-22T21:44:50Z

What does this PR do?

Type of change

Bug fix
New feature
Documentation update
Refactor/cleanup

Related issues

Testing

Tests added/updated
All tests pass locally
Manual testing completed

Checklist

Code follows project style
Pre-commit hooks pass
Documentation updated (if needed)

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

github-actions · 2026-04-22T21:45:02Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist

Code Review

This pull request updates the default placeholder model to TinyLlama across various templates and scripts. It also introduces more robust handling for gated Hugging Face repositories by checking for access permissions before attempting to load tokenizers, allowing the system to fall back gracefully if access is denied. Review feedback suggests that the string-based error matching for gated repositories may be fragile and recommends ensuring that the huggingface_hub dependency is updated to at least version 0.23.0 to support the auth_check method.

gemini-code-assist · 2026-04-22T21:49:53Z

+        except (GatedRepoError, OSError) as e:
+            # transformers re-raises GatedRepoError as OSError containing "gated"
+            if isinstance(e, GatedRepoError) or "gated" in str(e).lower():
+                logger.warning(
+                    "Tokenizer '%s' is a gated HuggingFace repo and this "
+                    "environment has no access. Set HF_TOKEN or use an "
+                    "ungated model to enable token metrics. Continuing "
+                    "without token metrics (ISL/OSL/TPOT unavailable).",
+                    args.tokenizer,
+                )
+            else:
+                logger.warning(
+                    "Failed to load tokenizer '%s': %s. Continuing without "
+                    "token metrics (ISL/OSL/TPOT unavailable).",
+                    args.tokenizer,
+                    e,
+                )
+            pool_cm = nullcontext()


The error handling for TokenizePool initialization is robust, but the string match on OSError (line 104) is fragile because it depends on the internal error-message format of the transformers library. Although this is a known pattern for gated-repo errors in transformers, consider whether there is a more reliable way to detect the condition. Any other OSError raised during initialization must also be handled correctly, which the else block currently does.
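One possible way to reduce the dependence on message text is to inspect the exception chain before falling back to the substring check. The sketch below is not part of this PR. The helper names `find_in_chain` and `is_gated_error` are hypothetical. It also assumes that the wrapping library chains the original exception through `__cause__` or `__context__`, which has not been verified here for transformers. The exception type is passed in as a parameter so the sketch stays independent of any particular huggingface_hub import path.

```python
from typing import Optional, Type


def find_in_chain(
    exc: BaseException, target: Type[BaseException]
) -> Optional[BaseException]:
    """Return the first exception of type `target` in exc's cause/context chain."""
    seen = set()
    current: Optional[BaseException] = exc
    # Track visited exceptions by id to guard against cyclic chains.
    while current is not None and id(current) not in seen:
        if isinstance(current, target):
            return current
        seen.add(id(current))
        # Prefer the explicit cause (`raise ... from e`), else the implicit context.
        current = current.__cause__ or current.__context__
    return None


def is_gated_error(exc: BaseException, gated_type: Type[BaseException]) -> bool:
    """Hypothetical helper: type-based detection first, message match as fallback."""
    if find_in_chain(exc, gated_type) is not None:
        return True
    # Fallback mirrors the substring check used in the diff above.
    return "gated" in str(exc).lower()
```

In the handler shown in the diff, the condition could then read `is_gated_error(e, GatedRepoError)`, which keeps the existing behavior when no chained exception is present.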

gemini-code-assist · 2026-04-22T21:49:53Z

+        if info.gated:
+            try:
+                HfApi().auth_check(repo_id=model_name)
+            except GatedRepoError:
+                logger.warning(
+                    f"Model '{model_name}' is a gated HuggingFace repo and "
+                    "this environment has no access. Set HF_TOKEN or use an "
+                    "ungated model to enable token metrics. Continuing "
+                    "without token metrics (ISL/OSL/TPOT unavailable)."
+                )
+                return False


The addition of HfApi().auth_check is a good proactive measure to verify access to gated repositories before launching the aggregator subprocess. However, note that auth_check was introduced in huggingface_hub version 0.23.0. If the environment uses an older version, this call will raise an AttributeError. While the general except Exception block at line 222 will catch this and allow the benchmark to continue without token metrics, it might be worth ensuring the project's dependencies are updated to reflect this requirement.
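If pinning the dependency is not immediately possible, the call could be guarded so that a missing method is reported explicitly instead of surfacing as an AttributeError in the generic handler. The following is a minimal sketch, not code from this PR. `has_gated_access` is a hypothetical helper, `api` is assumed to be an `HfApi`-like object, and `gated_error` is assumed to be the exception type the caller already imports as `GatedRepoError`.

```python
from typing import Optional, Type


def has_gated_access(
    api: object, repo_id: str, gated_error: Type[BaseException]
) -> Optional[bool]:
    """Return True if access is confirmed, False if denied, None if unchecked.

    None means the installed client has no `auth_check` method, so the
    caller can decide whether to proceed or to skip token metrics.
    """
    auth_check = getattr(api, "auth_check", None)
    if auth_check is None:
        # Older client without auth_check: report "unknown", do not raise.
        return None
    try:
        auth_check(repo_id=repo_id)
    except gated_error:
        return False
    return True
```

A caller could treat `None` as "could not verify" and log a distinct warning, which separates an outdated client from a real access denial.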

change gated model to tinyllama

5d2c52c

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>

arekay-nv requested a review from a team as a code owner April 22, 2026 21:44

github-actions Bot requested a review from nvzhihanj April 22, 2026 21:45

gemini-code-assist Bot reviewed Apr 22, 2026


viraatc approved these changes Apr 22, 2026


arekay-nv wants to merge 1 commit into feat/alicheng-pubsub-integration from arekay/fix-ci-tinyllama


2 participants
