Hi folks,
I'm trying to reproduce Gemini 2.5 Pro's results locally and want to clarify a few things:
- What's the difference between Gemini-2.5-Pro-06-05 and Gemini-2.5-Pro-05-06? The pass@1 differs a lot (74.2 vs. 68.1 on the 01/01/2025-05/01/2025 window).
- Is there an explicit list of the model hyperparameters used per submission (top_p, temperature, max_tokens, etc.)? I could only find top_p=0.95 and temperature=0.2 in the paper, and I'm not sure about the submission's setup.
- Could model evolution over the past 5 months (from May to now) impact the benchmark results? (edited: fixed an error in my previous statement)
- In general, what's the best strategy for reproducing the leaderboard results for Gemini and other models?
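For anyone else trying to compare their local numbers against the leaderboard, here's a minimal sketch of how I'm computing pass@1 from per-problem sample counts. It uses the standard unbiased pass@k estimator; the sample counts in the example are made up, and I'm assuming the leaderboard averages per-problem scores the same way (please correct me if it doesn't):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n = samples drawn, c = samples that passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem (n_samples, n_correct) tallies:
results = [(10, 7), (10, 0), (10, 10)]
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"pass@1 = {score * 100:.1f}")  # pass@1 = 56.7
```

With k=1 this reduces to the mean fraction of correct samples per problem, so any gap vs. the leaderboard should come from the model/decoding setup rather than the metric itself.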
Thanks in advance for your help!