Search on Complex Landscapes
Posted on Wed 29 April 2026 in Research
Consider an agent engaged in local search and learning over a complex landscape. The search process occurs sequentially over discrete periods. In each period \(i\in\{1,2,\dots,n\}\), the agent pays a marginal cost \(c>0\) to learn about their local environment and draws a value \(X_{i}\). These draws are assumed to be independent and follow a Gumbel distribution with a shared scale parameter \(b>0\), which captures the structural variance or “noise” inherent to the landscape.
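Formally, each draw follows \(X_{i}\sim\mathrm{Gumbel}(m_{i},b)\), where \(m_{i}\) is the location parameter defined next; its distribution function is
\[
\Pr(X_{i}\le x)=\exp\!\left(-e^{-(x-m_{i})/b}\right).
\]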
As the agent exerts effort over time, they learn about the landscape, which positively shifts the expected value of their draws. We define the location parameter \(m_{i}\) for draw \(X_{i}\) as the baseline value of the landscape plus the agent's cumulative learning, written out below.
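In symbols, with learning efficiency \(1/\psi\) scaling the cumulative effort,
\[
m_{i}=K+\frac{1}{\psi}\sum_{k=1}^{i}f(k),
\]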
where \(K\) represents the baseline value of the landscape. The parameter \(\psi>0\) explicitly captures landscape complexity. As complexity increases, the agent’s learning efficiency (\(1/\psi\)) decreases. The function \(f(k)\) governs the marginal impact of learning effort in period \(k\). Assuming diminishing marginal returns to local learning, we impose a harmonic decay rate, \(f(k)=\frac{1}{k}\).
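Under this harmonic-decay assumption, the cumulative learning term is simply the \(i\)-th harmonic number \(H_{i}\), so the location parameter becomes
\[
m_{i}=K+\frac{1}{\psi}\sum_{k=1}^{i}\frac{1}{k}=K+\frac{H_{i}}{\psi}.
\]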
The agent’s gross payoff after searching for \(n\) periods is determined by the best outcome discovered during the search process: \(M_{n}=\max\{X_{1},X_{2},\dots,X_{n}\}\). Because the draws are independent and Gumbel-distributed with a common scale, the expected value of the maximum takes a log-sum-exp form in the individual location parameters, as shown below.
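Specifically, the maximum of independent Gumbel draws with shared scale \(b\) is itself Gumbel-distributed, which gives
\[
\mathbb{E}[M_{n}]=b\,\ln\!\left(\sum_{i=1}^{n}e^{m_{i}/b}\right)+\gamma b,
\]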
where \(\gamma\) is the Euler-Mascheroni constant.
To evaluate this finite sum analytically, we use the asymptotic properties of the learning curve. For a sufficiently large number of search periods, the cumulative learning effort is well approximated by the natural logarithm, \(\sum^{i}_{k=1}\frac{1}{k}\approx\ln(i)\). The location parameter for period \(i\) can therefore be closely approximated as \(m_{i}\approx K+\frac{1}{\psi}\ln(i)\).
Substituting this continuous approximation into the inner sum yields the expression below.
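Each exponentiated location becomes a power of \(i\), and the baseline \(K\) factors out of the sum:
\[
\sum_{i=1}^{n}e^{m_{i}/b}\approx\sum_{i=1}^{n}e^{\left(K+\frac{1}{\psi}\ln(i)\right)/b}=e^{K/b}\sum_{i=1}^{n}i^{\frac{1}{\psi b}}.
\]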
Pivoting from the discrete summation to an integral approximation captures the cumulative area under the learning curve, as computed below.
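Treating the sum as an integral over \([1,n]\) and keeping the dominant term for large \(n\),
\[
\sum_{i=1}^{n}i^{\frac{1}{\psi b}}\approx\int_{1}^{n}x^{\frac{1}{\psi b}}\,dx=\frac{n^{1+\frac{1}{\psi b}}-1}{1+\frac{1}{\psi b}}\approx\frac{n^{1+\frac{1}{\psi b}}}{1+\frac{1}{\psi b}}.
\]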
Substituting the integral approximation back into the expected-maximum expression yields a tractable, linearly separable equation for the gross value of search, derived below.
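Carrying the substitution through and simplifying the logarithm gives
\[
\mathbb{E}[M_{n}]\approx b\,\ln\!\left(e^{K/b}\,\frac{n^{1+\frac{1}{\psi b}}}{1+\frac{1}{\psi b}}\right)+\gamma b=K+\left(b+\frac{1}{\psi}\right)\ln(n)+\gamma b-b\,\ln\!\left(1+\frac{1}{\psi b}\right).
\]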
This structural form cleanly isolates the marginal benefit of search depth: only the \(\ln(n)\) term varies with \(n\). The marginal gross value of an additional period of local search and learning is given below.
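Differentiating the approximate gross value with respect to \(n\) gives
\[
\frac{\partial\,\mathbb{E}[M_{n}]}{\partial n}=\frac{b+\frac{1}{\psi}}{n}.
\]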
This derivative formalizes the central economic intuition regarding complexity. The marginal benefit of search is strictly decreasing in landscape complexity \(\psi\). As the environment becomes more complex, learning is stifled, and the value of continued search is driven toward a floor set by the baseline noise parameter \(b\), as the limit below shows.
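In the limit of extreme complexity, learning contributes nothing and only the landscape noise rewards further draws:
\[
\lim_{\psi\to\infty}\frac{\partial\,\mathbb{E}[M_{n}]}{\partial n}=\frac{b}{n}.
\]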
The agent endogenously chooses their optimal search depth \(n^{*}\) by equating the marginal benefit of learning to the marginal per-period cost \(c\). Setting \(\frac{\partial\mathbb{E}[M_{n}]}{\partial n}=c\) and rearranging provides a closed-form optimal stopping rule, stated below.
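Solving \(\frac{b+\frac{1}{\psi}}{n}=c\) for \(n\) gives
\[
n^{*}=\frac{b+\frac{1}{\psi}}{c}=\frac{\psi b+1}{\psi c}.
\]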
The optimal duration of local search is therefore increasing in the environmental variance \(b\) and decreasing in both the landscape complexity \(\psi\) and the per-period cost \(c\). Higher complexity suppresses the learning rate, inducing the agent to optimally halt their search earlier.
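As a quick sanity check on the stopping rule, the sketch below simulates the search process with harmonic learning and compares the simulated optimum of net value against the closed-form \(n^{*}\). It is an illustration rather than part of the model: the parameter values are arbitrary, and because the closed form rests on the logarithmic and integral approximations, the two numbers should agree only approximately.

```python
import numpy as np

# Illustrative sanity check of the closed-form stopping rule (parameter values are arbitrary).
rng = np.random.default_rng(0)
K, b, psi, c = 0.0, 1.0, 0.5, 0.05   # baseline, scale (noise), complexity, per-period cost
n_max, n_sims = 200, 20_000

# Location parameters with harmonic learning: m_i = K + (1/psi) * sum_{k<=i} 1/k
periods = np.arange(1, n_max + 1)
m = K + np.cumsum(1.0 / periods) / psi

# Simulate Gumbel draws and track the running maximum M_n across search periods
draws = rng.gumbel(loc=m, scale=b, size=(n_sims, n_max))
running_max = np.maximum.accumulate(draws, axis=1)

# Net value of stopping after n periods: E[M_n] - c * n
net_value = running_max.mean(axis=0) - c * periods
n_sim = periods[np.argmax(net_value)]

# Closed-form rule from the derivation above: n* = (b + 1/psi) / c
n_star = (b + 1.0 / psi) / c

print(f"simulated optimum:  n = {n_sim}")
print(f"closed-form rule:   n* = {n_star:.1f}")
```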