How can you leave an agent running autonomously for 3 hours without even checking on it? We need some independent evaluation metrics for this kind of agent. At the moment everything feels like hype.