Love the confidence scores idea. Could you explain in detail which factors are considered (timing, retries, assertions)? Also, can a user click to drill down and see why a test was tagged as flaky vs. a bug?