The 95% Accuracy Trap: Why Multi-Step AI Agents Fail | PromptMetrics