Paul Luckey, Product Architect

MCQ Generator

Two-tier quality detection · Goodhart-aware

Generate a multiple-choice question from a concept description. Three bias analysts and one validity grader check the result. The Goodhart flag fires when the bias metrics say healthy but the validity grader says broken — the failure mode automated quality systems miss.
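
The flag condition described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Verdicts` type, its field names, and the `goodhart_flag` function are hypothetical, and the sketch assumes each analyst and the grader can be reduced to a pass/fail verdict.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Verdicts:
    """Hypothetical pass/fail summary for one generated question."""
    bias_healthy: Tuple[bool, bool, bool]  # one verdict per bias analyst
    validity_ok: bool                      # verdict from the validity grader


def goodhart_flag(v: Verdicts) -> bool:
    """Return True when every bias metric looks healthy but validity is broken."""
    return all(v.bias_healthy) and not v.validity_ok
```

Under these assumptions the flag stays off when any bias analyst already reports a problem, since that failure is visible to the metrics and is not the hidden case the flag targets.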

Example output (no generation needed)

Comparative Advantage · production-rewrite · claude-opus-4-7

In one hour, Ana can either write 10 marketing emails or design 2 web pages. In the same hour, her assistant Ben can either write 4 marketing emails or design 1 web page. Ana's startup needs both done. How should they divide the work to maximize total output, and why?

  A. Ana should focus on emails and Ben on web pages, because Ana writes more emails per hour (10 vs. 4) — that productivity gap is larger than her web-page gap (2 vs. 1), so emails are where she's most clearly ahead.
  B. Ben should focus on emails and Ana on web pages, because Ana gives up 5 emails per web page while Ben gives up only 4, so assigning the cheaper email-producer to emails minimizes forgone design work. (✓ correct)
  C. Ana should do both tasks herself, since she produces more emails per hour and more web pages per hour than Ben — bringing Ben in on either task reduces total output.
  D. Ben should focus on web pages and Ana on emails, because Ben can only produce 1 web page per hour while Ana can produce 2, so assigning the slower worker to the slower-output task balances the workload.
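
The opportunity costs behind this question can be worked directly from the numbers in the stem. The sketch below is illustrative arithmetic only, not part of the generator, and its function names are hypothetical. It computes how many emails each worker gives up per web page, then assigns each task to the worker with the lower opportunity cost.

```python
from fractions import Fraction

# Hourly output taken from the question stem: (emails, web pages).
OUTPUT = {"Ana": (10, 2), "Ben": (4, 1)}


def emails_per_page(worker: str) -> Fraction:
    """Emails forgone to produce one web page (opportunity cost of a page)."""
    emails, pages = OUTPUT[worker]
    return Fraction(emails, pages)


def comparative_advantage() -> dict:
    """Assign each task to the worker with the lower opportunity cost.

    The cost of an email is the reciprocal of the cost of a page, so the
    worker with the highest page cost has the lowest email cost.
    """
    page_maker = min(OUTPUT, key=emails_per_page)
    email_writer = max(OUTPUT, key=emails_per_page)
    return {"web pages": page_maker, "emails": email_writer}


costs = {worker: emails_per_page(worker) for worker in OUTPUT}
assignment = comparative_advantage()
```

With the stem's numbers, Ana gives up 5 emails per web page and Ben gives up 4, so this arithmetic puts Ben on web pages and Ana on emails. That is the reverse of the assignment in the option marked correct, which may make this example an instance of the validity failure the page describes.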

Strategy: production-rewrite. Pipeline: planning → generation → self-eval → length-matching rewrite of the correct option → analyst grading. Free tier (anonymous) runs on Haiku 4.5; pro tier (signed in) runs on Sonnet 4.5. Source: github.com/pluckey/mcq-generator.
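
The length-matching step in the pipeline presumably exists because a correct option that is noticeably longer than the distractors can give the answer away. One way such a check could look is sketched below. The function names, the character-count measure, and the 20% tolerance are all assumptions made for illustration, not details taken from the repository.

```python
from statistics import mean
from typing import Sequence


def length_ratio(options: Sequence[str], correct_index: int) -> float:
    """Length of the correct option divided by the mean distractor length."""
    distractors = [o for i, o in enumerate(options) if i != correct_index]
    if not distractors:
        raise ValueError("need at least one distractor")
    return len(options[correct_index]) / mean(len(d) for d in distractors)


def is_length_matched(
    options: Sequence[str], correct_index: int, tolerance: float = 0.2
) -> bool:
    """True when the correct option is within `tolerance` of the mean distractor length."""
    return abs(length_ratio(options, correct_index) - 1.0) <= tolerance
```

A check like this measures surface form only. It says nothing about whether a rewritten option is still correct, which is the gap the validity grader is described as covering.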