| Category | Description | Cases | Last run | bash median |
|---|---|---|---|---|
| startup | Small commands where interpreter startup dominates runtime. | 4 | 0.053 ms | 1.662 ms |
| strings | String expansion, pattern handling, and text manipulation. | 8 | 0.057 ms | 1.791 ms |
| variables | Variable assignment, lookup, expansion, and environment handling. | 8 | 0.058 ms | 1.688 ms |
| arrays | Indexed array reads, writes, expansion, and iteration. | 6 | 0.059 ms | 1.713 ms |
| subshell | Command substitution and nested shell execution paths. | 6 | 0.061 ms | 3.143 ms |
| arithmetic | Integer math, substitutions, and expression-heavy shell snippets. | 6 | 0.062 ms | 1.703 ms |
| pipes | Pipeline construction, streaming, and command chaining. | 6 | 0.065 ms | 3.131 ms |
| control | Conditionals, loops, case statements, and branching scripts. | 9 | 0.076 ms | 1.711 ms |
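
One way to read the table above is as a per-category ratio of the bash median to the last-run time. The sketch below is illustrative only: the figures are copied from the table, the names `timings_ms` and `speedup` are hypothetical, and it assumes both timings in a row cover the same cases under comparable conditions, which the page does not state.

```python
# Values copied from the table above: (last run in ms, bash median in ms).
# Assumption: both timings in a row measure the same cases, so the ratio is meaningful.
timings_ms = {
    "startup": (0.053, 1.662),
    "strings": (0.057, 1.791),
    "variables": (0.058, 1.688),
    "arrays": (0.059, 1.713),
    "subshell": (0.061, 3.143),
    "arithmetic": (0.062, 1.703),
    "pipes": (0.065, 3.131),
    "control": (0.076, 1.711),
}


def speedup(last_run_ms: float, bash_median_ms: float) -> float:
    """Return how many times longer the bash median is than the last run."""
    if last_run_ms <= 0:
        raise ValueError("last_run_ms must be positive")
    return bash_median_ms / last_run_ms


# Ratio per category, keyed by category name.
ratios = {name: speedup(*pair) for name, pair in timings_ms.items()}

# Print categories from the largest ratio to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<12} {ratio:6.1f}x")
```

The ratios are only as reliable as the underlying medians, so the raw measurements in the linked result files remain the authoritative source.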
Benches
Latest benchmark snapshot
Static aggregate generated from repository result artifacts. Use the linked files for raw measurements and full eval traces.
| Category | Passed | Pass rate |
|---|---|---|
| system_info | 1/2 | 50% |
| file_operations | 3/4 | 66.7% |
| scripting | 5/7 | 68.6% |
| json_processing | 8/8 | 100% |
| data_transformation | 6/6 | 100% |
| complex_tasks | 6/6 | 100% |
| text_processing | 6/6 | 100% |
| pipelines | 5/5 | 100% |
| Run | Date | Score | Tools |
|---|---|---|---|
| gpt-5.3-codex | 2026-05-26 | 93% | 86.8% |
| gpt-5.5 | 2026-05-26 | 92.7% | 91.5% |
| claude-opus-4-7 | 2026-05-26 | 97.8% | 90.3% |
| claude-sonnet-4-6 | 2026-05-26 | 94% | 91% |
| claude-haiku-4-5-20251001 | 2026-05-26 | 98.4% | 92.3% |
| claude-sonnet-4-6 | 2026-02-28 | 92.5% | 85.1% |
Indexes