compling-wat/Chartographer

Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models

Authors: Yifan Jiang, Dae Yon Hwang, Jesse C. Cresswell, Freda Shi

📄 Paper | 🤗 Dataset

Overview

Chartographer is a counterfactual chart generation pipeline for evaluating whether vision-language models answer chart questions through visual reasoning rather than shortcuts or prior familiarity with a chart.

It converts chart QA examples into counterfactual chart-question families: the original chart, a base reconstruction, and seed-controlled counterfactual variants whose answers are recomputed with executable QA logic.
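
To make the idea of a chart-question family concrete, here is a minimal Python sketch. It is not the pipeline's code: the `ChartVariant` class, the `make_counterfactual` helper, the variant names, and the "which category has the largest value" question are all hypothetical. It only illustrates how a seed can control a perturbation of the chart data while the answer is recomputed by executable logic instead of being copied from the original.

```python
import random
from dataclasses import dataclass


@dataclass
class ChartVariant:
    """One member of a chart-question family (hypothetical structure)."""
    variant: str            # e.g. "base" or "seed_0" (names are assumptions)
    data: dict[str, float]  # category -> plotted value
    answer: str             # answer recomputed from `data`


def answer_largest(data: dict[str, float]) -> str:
    """Executable QA logic for 'Which category has the largest value?'."""
    return max(data, key=data.get)


def make_counterfactual(base: dict[str, float], seed: int) -> ChartVariant:
    """Shuffle the values across categories, deterministically for a given seed."""
    rng = random.Random(seed)
    values = list(base.values())
    rng.shuffle(values)
    data = dict(zip(base.keys(), values))
    # The answer is derived from the perturbed data, never reused from the base.
    return ChartVariant(f"seed_{seed}", data, answer_largest(data))


base_data = {"A": 3.0, "B": 7.0, "C": 5.0}
family = [ChartVariant("base", base_data, answer_largest(base_data))]
family += [make_counterfactual(base_data, seed) for seed in range(3)]
```

Because each variant is a pure function of the base data and the seed, rerunning with the same seed reproduces the same chart and the same answer.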

Figure: Chartographer pipeline overview.

Installation

Python 3.10+ is recommended.

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Set API keys for the providers you plan to use:

export OPENAI_API_KEY=your_api_key_here
export ANTHROPIC_API_KEY=your_api_key_here

For local Hugging Face VLMs, install any hardware-specific packages separately. To prefer local model weights, set:

export CHARTOGRAPHER_MODEL_WEIGHTS_DIR=/path/to/model-weights

Configure Data

Datasets are configured with a JSON file. Set CHARTOGRAPHER_DATASETS_FILE before running Chartographer. See examples/datasets.example.json for the datasets used in the paper and templates for custom datasets.

Minimal local dataset config:

{
  "datasets": {
    "my_dataset": {
      "local_file_template": "{repo_root}/data/my_dataset/{split}.json",
      "local_dir": "my_dataset",
      "question_col": "question",
      "image_col": "image",
      "answer_col": "answer"
    }
  }
}

export CHARTOGRAPHER_DATASETS_FILE=/path/to/datasets.json

For datasets with reconstruction or counterfactual variants, set variant_col and family_id_col in the config. Datasets exported by Chartographer include those fields automatically.
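
As a rough illustration, the sketch below writes a config entry that includes the two variant fields and checks that the column keys from the minimal config are present. The column names `"variant"` and `"family_id"`, the dataset name, and the `missing_keys` helper are assumptions made for this example; treating the three column keys as required is also an assumption, not something Chartographer is known to enforce.

```python
import json
import tempfile
from pathlib import Path

# Column keys taken from the minimal config; treating them as required is an assumption.
REQUIRED_KEYS = {"question_col", "image_col", "answer_col"}

config = {
    "datasets": {
        "my_dataset_families": {
            "local_file_template": "{repo_root}/data/my_dataset_families/{split}.json",
            "local_dir": "my_dataset_families",
            "question_col": "question",
            "image_col": "image",
            "answer_col": "answer",
            "variant_col": "variant",      # hypothetical column name
            "family_id_col": "family_id",  # hypothetical column name
        }
    }
}


def missing_keys(cfg: dict) -> dict[str, set[str]]:
    """Return, per dataset, the column keys that are absent (empty dict if none)."""
    problems = {}
    for name, entry in cfg["datasets"].items():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems[name] = missing
    return problems


# Write the config to a temporary file and read it back, as a loader would.
path = Path(tempfile.mkdtemp()) / "datasets.json"
path.write_text(json.dumps(config, indent=2))
loaded = json.loads(path.read_text())
```

The resulting file path is what `CHARTOGRAPHER_DATASETS_FILE` would point to.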

Run The Pipeline

Use the make targets for the standard workflow. RECONSTRUCTION_MODEL is used for chart reconstruction and QA regeneration. PREDICTION_MODEL is the VLM being evaluated. JUDGE_MODEL checks prediction correctness.

make reconstruction-workflow DATASET=my_dataset SPLIT=dev RECONSTRUCTION_MODEL=reconstruction-model REVISION_ROUNDS=2
make qa-workflow DATASET=my_dataset SPLIT=dev RECONSTRUCTION_MODEL=reconstruction-model SEED=0
make seed-workflow DATASET=my_dataset SPLIT=dev SEED_START=0 SEED_END=9

REVISION_ROUNDS=N is optional on reconstruction-workflow. It runs N self-refinement rounds, each of which diagnoses the current render, revises the reconstruction, and renders it again. The last revision is promoted to the active reconstruction, temporary revision files are cleaned up, and the files needed by QA and export are rebuilt from the promoted reconstruction.
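
The control flow of the self-refinement rounds can be sketched as a simple loop. This is an illustration under stated assumptions, not the repository's implementation: `refine` and its `render`, `diagnose`, and `revise` callables are hypothetical names, and the lambdas below are toy stand-ins for the renderer and the model calls.

```python
from typing import Callable


def refine(
    reconstruction: str,
    rounds: int,
    render: Callable[[str], str],
    diagnose: Callable[[str], str],
    revise: Callable[[str, str], str],
) -> tuple[str, list[str]]:
    """Run `rounds` diagnose -> revise -> render turns.

    Returns the last revision (the one that would be promoted) and the list
    of all intermediate revisions.
    """
    rendered = render(reconstruction)
    history: list[str] = []
    for _ in range(rounds):
        feedback = diagnose(rendered)                      # inspect the current render
        reconstruction = revise(reconstruction, feedback)  # produce a new revision
        rendered = render(reconstruction)                  # render it again
        history.append(reconstruction)
    return reconstruction, history


# Toy stand-ins (assumptions) so the loop can run without any model or renderer.
final, history = refine(
    "v0",
    rounds=2,
    render=lambda code: f"image({code})",
    diagnose=lambda image: f"issues in {image}",
    revise=lambda code, feedback: code + "+fix",
)
```

With `rounds=0` the input reconstruction is returned unchanged, which mirrors REVISION_ROUNDS being optional.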

Export the original chart, base reconstruction, and seed-controlled counterfactual variants as a local evaluation dataset:

make export-family-dataset DATASET=my_dataset SPLIT=dev OUTPUT_DATASET=my_dataset_families FAMILY_SEEDS=0-9
export CHARTOGRAPHER_DATASETS_FILE=$PWD/data/my_dataset_families/datasets.json
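
The commands above give seeds in two notations: `SEED_START=0 SEED_END=9` and `FAMILY_SEEDS=0-9`. The sketch below shows one way to read the second form, assuming the range is inclusive at both ends so that it covers the same ten seeds as the first. Both that reading and the `parse_seed_range` helper are assumptions for illustration; check `make help` or docs/workflow.md for the actual behavior.

```python
def parse_seed_range(spec: str) -> list[int]:
    """Parse "0-9" into [0, 1, ..., 9]; a single value such as "3" becomes [3].

    Assumes non-negative seeds and an inclusive upper bound (both assumptions).
    """
    start, _, end = spec.partition("-")
    first = int(start)
    last = int(end) if end else first
    if last < first:
        raise ValueError(f"empty seed range: {spec!r}")
    return list(range(first, last + 1))


seeds = parse_seed_range("0-9")
```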

Run prediction and evaluation on the exported family dataset:

make prediction-workflow DATASET=my_dataset_families SPLIT=dev PREDICTION_MODEL=prediction-model JUDGE_MODEL=judge-model
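
Once predictions are judged, a natural summary is accuracy per variant, which shows whether a model does as well on counterfactual charts as on the originals. The sketch below assumes judged records are available as dictionaries with `variant` and `correct` fields. That record layout, the `accuracy_by_variant` helper, and the toy records are all made up for illustration and do not reflect the actual output format or any real results.

```python
from collections import defaultdict


def accuracy_by_variant(records: list[dict]) -> dict[str, float]:
    """Fraction of correct predictions for each variant label."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record["variant"]] += 1
        hits[record["variant"]] += int(record["correct"])
    return {variant: hits[variant] / totals[variant] for variant in totals}


# Toy judged predictions (fabricated solely to exercise the function).
records = [
    {"family_id": "f1", "variant": "original", "correct": True},
    {"family_id": "f1", "variant": "seed_0", "correct": False},
    {"family_id": "f2", "variant": "original", "correct": True},
    {"family_id": "f2", "variant": "seed_0", "correct": True},
]
scores = accuracy_by_variant(records)
```

A gap between the original and seed variants within the same families is the kind of signal the counterfactual setup is designed to expose.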

Use make help for individual steps. See docs/workflow.md for direct python -m ... commands and output locations.

Repository Layout

src/clients/                  API and local VLM clients
src/common/                   dataset, answer, and prediction I/O utilities
src/config/                   model aliases and task prompts
src/pipeline/reconstruction/  chart reconstruction and counterfactual rendering
src/pipeline/qa/              QA regeneration and execution
src/pipeline/datasets/        chart-question family dataset export
src/pipeline/prediction/      VLM prediction, evaluation, and visualization

Generated outputs are written under results/; local generated datasets are written under data/.

Citation

If you use Chartographer, please cite the paper:

@misc{jiang2026chartographer,
  title={Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models},
  author={Yifan Jiang and Dae Yon Hwang and Jesse C. Cresswell and Freda Shi},
  year={2026},
  eprint={2605.27311},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.27311}
}

License

See LICENSE.
