As Large Language Models (LLMs) increasingly accommodate larger inputs, context windows spanning hundreds of thousands, or even millions, of tokens are touted as promising for a wide array of applications. However, a potential decay in reasoning ability over longer inputs may compromise their utility. This study introduces a new benchmark, Find the Origin, which progressively tests the efficacy of LLMs on a simple reasoning task as the size of the context window increases. The test, conducted on 14 different LLMs for comparative analysis, shows that reasoning ability degrades as input size grows. In addition, three independent tests were performed with the GPT-4 Turbo model to demonstrate its reasoning degradation in different contexts as input size increases.