feat(python-sdk): contract test scaffold and conventionality contract test #39
base: fsisenda/sdk_python_basic_conventionality
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,320 @@ | ||||||||
| """Contract test capture utilities for evaluator notebooks. | ||||||||
|
|
||||||||
| Three-step workflow for notebook authors | ||||||||
| ----------------------------------------- | ||||||||
| 1. Wrap every model in your chain with ``capture_llm()``: | ||||||||
|
|
||||||||
| chain = prompt | capture_llm("step_name", my_model) | JsonOutputParser() | ||||||||
|
|
||||||||
| The prefix (``"step_name"``) becomes the step key in ``prompt_steps`` in | ||||||||
| the TOML output. Use a short, stable name per step (e.g. ``"main"``, | ||||||||
| ``"bk"``, ``"vocab"``). | ||||||||
|
|
||||||||
| 2. Immediately after each test-case evaluation, call ``capture_case()`` to save a | ||||||||
| point-in-time copy of what was captured. Pass the evaluator's input dict | ||||||||
| and output dict directly — no manual field extraction needed:: | ||||||||
|
|
||||||||
| case_input = {"text": my_text, "grade_level": 4} | ||||||||
| case_output = run_evaluator(**case_input) | ||||||||
|
|
||||||||
| _cap = capture_case( | ||||||||
| name="my_case", | ||||||||
| input=case_input, | ||||||||
| llm_call_captures=["step_name"], # prefixes, in call order | ||||||||
| expected_result=case_output, | ||||||||
| description="…", # optional human-readable label | ||||||||
| ) | ||||||||
|
|
||||||||
| 3. Print the TOML block and paste it into ``contract_tests.toml``: | ||||||||
|
||||||||
| 3. Print the TOML block and paste it into ``contract_tests.toml``: | |
| 3. Print the TOML block and paste it into ``contracts.toml`` (for example, | |
| ``sdks/settings/<evaluator>/contracts.toml``): |
Copilot
AI
Apr 30, 2026
The examples in the docstring use an input key `grade_level`, but the Conventionality evaluator input schema uses `grade` (and the contract TOML in this PR uses `grade`). Consider updating the examples to match the actual evaluator API so notebook authors don’t capture mismatched input shapes.
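A key mismatch like this one could be caught before a case is saved. The following is only a sketch: `EXPECTED_INPUT_KEYS` and `check_input_keys` are hypothetical names, not part of the PR's `capture` module, and the expected key set is assumed from the `grade` and `text` keys used in the notebook cell in this PR.

```python
# Hypothetical helper, not part of the PR's capture module.
# Assumption: the evaluator accepts exactly the keys "text" and "grade".
EXPECTED_INPUT_KEYS = frozenset({"text", "grade"})


def check_input_keys(case_input, expected=EXPECTED_INPUT_KEYS):
    """Return (missing, unexpected) key lists for a captured input dict."""
    actual = set(case_input)
    missing = sorted(set(expected) - actual)
    unexpected = sorted(actual - set(expected))
    return missing, unexpected


# The docstring example's shape would be reported as mismatched.
missing, unexpected = check_input_keys({"text": "sample", "grade_level": 4})
if missing or unexpected:
    print(f"missing keys: {missing}; unexpected keys: {unexpected}")
```

Calling such a check just before `capture_case()` would surface the `grade_level` versus `grade` discrepancy at capture time rather than when the contract test runs.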
Copilot
AI
Apr 30, 2026
`build_contract_toml()`’s docstring also says the output should be pasted into `contract_tests.toml`, but the contract artifacts in this repo are named `contracts.toml`. Aligning the docstring with the actual file name will reduce regeneration errors.
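For readers without the `capture` module at hand, the wrap-then-snapshot pattern described in the module docstring can be approximated as below. This is a sketch under stated assumptions, not the PR's implementation: `record_step`, `snapshot_case`, `reset_store`, and `_CAPTURES` are hypothetical names, and a plain callable stands in for a LangChain model.

```python
import copy
from typing import Any, Callable

# Hypothetical in-memory store: step name -> list of recorded calls.
_CAPTURES: dict[str, list[dict[str, Any]]] = {}


def reset_store() -> None:
    """Forget everything recorded so far (start of a new test case)."""
    _CAPTURES.clear()


def record_step(step: str, model: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a model callable so each prompt/response pair is stored under `step`."""
    def wrapped(prompt: str) -> str:
        response = model(prompt)
        _CAPTURES.setdefault(step, []).append(
            {"prompt": prompt, "response": response}
        )
        return response
    return wrapped


def snapshot_case(name: str, case_input: dict, steps: list[str],
                  expected_result: Any) -> dict:
    """Take a point-in-time copy of the recorded calls for the listed steps."""
    return {
        "name": name,
        "input": copy.deepcopy(case_input),
        "prompt_steps": {s: copy.deepcopy(_CAPTURES.get(s, [])) for s in steps},
        "expected_result": copy.deepcopy(expected_result),
    }


# Usage with a stand-in "model" that upper-cases its prompt.
reset_store()
model = record_step("main", lambda prompt: prompt.upper())
case_input = {"text": "hello", "grade": 4}
output = model(case_input["text"])
case = snapshot_case("demo", case_input, ["main"], {"answer": output})
```

The deep copies are the point of the snapshot step: resetting or reusing the store for the next case leaves earlier snapshots unchanged.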
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -54,7 +54,8 @@ | |
| "from langchain_core.prompts.chat import HumanMessagePromptTemplate\n", | ||
| "from langchain_google_genai import ChatGoogleGenerativeAI\n", | ||
| "from pydantic import BaseModel, Field\n", | ||
| "from textstat import textstat as ts" | ||
|
Contributor
P0 - Are these changes necessary for the first release scope? Will these need to be applied to all the notebooks?
||
| "from textstat import textstat as ts\n", | ||
| "from capture import capture_llm, capture_case, reset_captures, build_contract_toml" | ||
| ] | ||
| }, | ||
| { | ||
|
|
@@ -172,7 +173,7 @@ | |
| " },\n", | ||
| " )\n", | ||
| "\n", | ||
| " chain = prompt | model | JsonOutputParser()\n", | ||
| " chain = prompt | capture_llm(\"main\", model) | JsonOutputParser()\n", | ||
| " return chain.invoke(dataset)" | ||
| ] | ||
| }, | ||
|
|
@@ -201,6 +202,32 @@ | |
| "display(result)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "fbbe4aa9", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "reset_captures()\n", | ||
| "\n", | ||
| "sample_text = \"\"\"\n", | ||
| "\"Well, then,\" said the teacher, \"you may take your slate and go out behind the schoolhouse for half an hour. Think of something to write about, and write the word on your slate. Then try to tell what it is, what it is like, what it is good for, and what is done with it. That is the way to write a composition.\" Henry took his slate and went out. Just behind the schoolhouse was Mr. Finney's barn. Quite close to the barn was a garden. And in the garden, Henry saw a turnip. \"Well, I know what that is,\" he said to himself; and he wrote the word turnip on his slate. Then he tried to tell what it was like, what it was good for, and what was done with it. Before the half hour was ended he had written a very neat composition on his slate. He then went into the house, and waited while the teacher read it. The teacher was surprised and pleased. He said, \"Henry Longfellow, you have done very well. Today you may stand up before the school and read what you have written about the turnip.\"\n", | ||
| "\"\"\"\n", | ||
| "input = {\"text\": sample_text, \"grade\": 4}\n", | ||
| "result = predict_text_complexity_level(**input)\n", | ||
| "\n", | ||
| "capture = capture_case(\n", | ||
| " name=\"turnip\",\n", | ||
| " description=\"Grade 4 classroom narrative (Henry and the turnip)\",\n", | ||
| " input=input,\n", | ||
| " llm_call_captures=[\"main\"],\n", | ||
| " expected_result=result,\n", | ||
| ")\n", | ||
| "\n", | ||
| "print(build_contract_toml(capture))" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "cell-12", | ||
|
|
@@ -212,13 +239,21 @@ | |
| ], | ||
| "metadata": { | ||
| "kernelspec": { | ||
| "display_name": "Python 3", | ||
| "display_name": ".venv (3.14.4)", | ||
| "language": "python", | ||
| "name": "python3" | ||
| }, | ||
| "language_info": { | ||
| "codemirror_mode": { | ||
| "name": "ipython", | ||
| "version": 3 | ||
| }, | ||
| "file_extension": ".py", | ||
| "mimetype": "text/x-python", | ||
| "name": "python", | ||
| "version": "3.10.0" | ||
| "nbconvert_exporter": "python", | ||
| "pygments_lexer": "ipython3", | ||
| "version": "3.14.4" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
|
|
||
P0 - As discussed, move this into the sdk / python, for now. Eventually parts of this will be supplemented / replaced by our updated notebook.