Need some independent evaluation metrics for this kind of agent. | discoverkit