Models Tested: 23
Success Rate: 0.0%
Avg Duration: 4m 59s
Duration Range: 4m 46s - 5m 0s
Tests: test_dangling_cstr.sh, test_init_order.sh, test_int_overflow.sh, test_iterator_invalidation.sh, test_off_by_one.sh, test_reference_to_temporary.sh, test_unsigned_underflow.sh, test_virtual_destructor.sh
Score Model Duration Session (KB)
25.0% openrouter/openai/gpt-oss-120b 4m 46s 17.4
0.0% openrouter/google/gemini-2.5-flash-preview-09-2025 5m 0s 0.0
0.0% openrouter/openai/gpt-5 5m 0s 0.0
0.0% openrouter/openai/gpt-5-nano 5m 0s 0.0
0.0% openrouter/anthropic/claude-3-haiku 5m 0s 0.0
0.0% openrouter/qwen/qwen3-coder 5m 0s 252.0
0.0% openrouter/x-ai/grok-3-mini 5m 0s 0.0
0.0% openrouter/anthropic/claude-3.5-sonnet 5m 0s 0.0
0.0% openrouter/google/gemini-2.5-pro 5m 0s 0.0
0.0% openrouter/openai/gpt-4o-mini 5m 0s 0.0
0.0% openrouter/google/gemini-2.5-flash-lite-preview-09-2025 5m 0s 0.0
0.0% openrouter/openai/gpt-oss-20b 5m 0s 16.8
0.0% openrouter/anthropic/claude-3.7-sonnet 5m 0s 0.0
0.0% openrouter/anthropic/claude-haiku-4.5 5m 0s 0.0
0.0% openrouter/deepseek/deepseek-v3.1-terminus 5m 0s 0.0
0.0% litellm/GLM-4.5-Air-FP8-dev 5m 0s 236.1
0.0% openrouter/anthropic/claude-sonnet-4.5 5m 0s 0.0
0.0% openrouter/deepseek/deepseek-chat-v3-0324 5m 0s 0.0
0.0% openrouter/openai/gpt-4.1-nano 5m 0s 72.3
0.0% openrouter/openai/gpt-5-mini 5m 0s 0.0
0.0% openrouter/anthropic/claude-3.5-haiku 5m 0s 0.0
0.0% openrouter/anthropic/claude-sonnet-4 5m 0s 0.0
0.0% openrouter/openai/gpt-4.1-mini 5m 0s 22.7
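
The summary figures can be cross-checked against the rows of the table above. The following is a minimal Python sketch, not the dashboard's own code: the names `ROWS`, `to_seconds`, `to_text`, and `summarize` are hypothetical, the average is assumed to be rounded to the nearest second, and "success" is assumed to mean a 100% score, which the page does not state.

```python
# Rows transcribed from the results table: (score %, model, duration, session KB).
ROWS = [
    (25.0, "openrouter/openai/gpt-oss-120b", "4m 46s", 17.4),
    (0.0, "openrouter/google/gemini-2.5-flash-preview-09-2025", "5m 0s", 0.0),
    (0.0, "openrouter/openai/gpt-5", "5m 0s", 0.0),
    (0.0, "openrouter/openai/gpt-5-nano", "5m 0s", 0.0),
    (0.0, "openrouter/anthropic/claude-3-haiku", "5m 0s", 0.0),
    (0.0, "openrouter/qwen/qwen3-coder", "5m 0s", 252.0),
    (0.0, "openrouter/x-ai/grok-3-mini", "5m 0s", 0.0),
    (0.0, "openrouter/anthropic/claude-3.5-sonnet", "5m 0s", 0.0),
    (0.0, "openrouter/google/gemini-2.5-pro", "5m 0s", 0.0),
    (0.0, "openrouter/openai/gpt-4o-mini", "5m 0s", 0.0),
    (0.0, "openrouter/google/gemini-2.5-flash-lite-preview-09-2025", "5m 0s", 0.0),
    (0.0, "openrouter/openai/gpt-oss-20b", "5m 0s", 16.8),
    (0.0, "openrouter/anthropic/claude-3.7-sonnet", "5m 0s", 0.0),
    (0.0, "openrouter/anthropic/claude-haiku-4.5", "5m 0s", 0.0),
    (0.0, "openrouter/deepseek/deepseek-v3.1-terminus", "5m 0s", 0.0),
    (0.0, "litellm/GLM-4.5-Air-FP8-dev", "5m 0s", 236.1),
    (0.0, "openrouter/anthropic/claude-sonnet-4.5", "5m 0s", 0.0),
    (0.0, "openrouter/deepseek/deepseek-chat-v3-0324", "5m 0s", 0.0),
    (0.0, "openrouter/openai/gpt-4.1-nano", "5m 0s", 72.3),
    (0.0, "openrouter/openai/gpt-5-mini", "5m 0s", 0.0),
    (0.0, "openrouter/anthropic/claude-3.5-haiku", "5m 0s", 0.0),
    (0.0, "openrouter/anthropic/claude-sonnet-4", "5m 0s", 0.0),
    (0.0, "openrouter/openai/gpt-4.1-mini", "5m 0s", 22.7),
]


def to_seconds(duration: str) -> int:
    """Convert a string such as '4m 46s' to whole seconds."""
    minutes, seconds = duration.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def to_text(seconds: int) -> str:
    """Format whole seconds back into the table's 'Xm Ys' style."""
    return f"{seconds // 60}m {seconds % 60}s"


def summarize(rows):
    """Recompute the four summary figures from the table rows."""
    durations = [to_seconds(duration) for _, _, duration, _ in rows]
    # ASSUMPTION: a run counts as a success only at a 100% score.
    successes = sum(1 for score, *_ in rows if score == 100.0)
    mean_seconds = round(sum(durations) / len(durations))
    return {
        "models_tested": len(rows),
        "success_rate": f"{100 * successes / len(rows):.1f}%",
        "avg_duration": to_text(mean_seconds),
        "duration_range": f"{to_text(min(durations))} - {to_text(max(durations))}",
    }


print(summarize(ROWS))
```

Under those assumptions the recomputed values agree with the summary: 23 models, a 0.0% success rate, an average of 4m 59s, and a range of 4m 46s - 5m 0s. The 25.0% score for gpt-oss-120b does not count as a success under this reading, which is consistent with the reported 0.0%.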