Human Pairwise Evaluation

Evaluator setup

Enter your voter name

Model names are hidden while voting. This browser will also get a private local ID so we can count unique evaluators.

Protected Results

Enter the results password

Voting is public, but model rankings and exports are restricted.

Evaluator setup

Enter your voter name before starting.

Model names are hidden during voting. Your voter name is only used to avoid showing the same pair twice to the same evaluator.
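
One way the "never show the same pair twice to the same evaluator" rule could work is sketched below. This is an illustrative assumption, not the tool's actual code: the function name `next_pair`, the per-voter `seen_pairs` set, and the random left/right ordering are all hypothetical.

```python
import random
from itertools import combinations


def next_pair(models, seen_pairs, rng=random):
    """Return an unseen (left, right) pair for one voter, or None if exhausted.

    `seen_pairs` is a set of frozensets kept per voter name (hypothetical
    storage); it is updated in place when a pair is handed out.
    """
    # Unordered pairs, so (A, B) and (B, A) count as the same comparison.
    candidates = [
        frozenset(pair)
        for pair in combinations(models, 2)
        if frozenset(pair) not in seen_pairs
    ]
    if not candidates:
        return None
    chosen = rng.choice(candidates)
    seen_pairs.add(chosen)
    # Randomize display order so screen position does not reveal the model.
    left, right = rng.sample(sorted(chosen), 2)
    return left, right
```

With three models, a single voter would receive three distinct pairs and then `None`.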

Results

Elo ratings are used internally to rank models but are not shown to voters.
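
For reference, a minimal sketch of a standard Elo update for one pairwise vote follows. The K-factor of 32, the 400-point scale, and the function names are assumptions chosen for illustration; the tool's actual parameters are not stated here.

```python
def expected_score(rating_a, rating_b):
    """Probability-like expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return new (rating_a, rating_b) after one vote.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from the tool.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

Under these assumptions, a win between two equally rated models moves each rating by half of K, and a tie between them leaves both unchanged.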

Rank | Model | Games | W | L | T | Count labels