Experimental data suggest that neural circuits configure their synaptic connectivity for a given computational task. They also point to dopamine-gated stochastic spine dynamics as an important underlying mechanism, and show that the stochastic component of synaptic plasticity is surprisingly strong. We propose a model that elucidates how task-dependent self-configuration of neural circuits can emerge through these mechanisms. The Fokker-Planck equation allows us to relate local stochastic processes at synapses to the stationary distribution of network configurations, and thereby to the computational properties of the network. This framework suggests a new model of reward-gated network plasticity, in which the common policy-gradient paradigm is replaced by continuously ongoing stochastic policy search, i.e., sampling from a posterior distribution of network configurations. This posterior integrates priors that encode, for example, previously acquired knowledge and structural constraints. The model can explain the experimentally observed capability of neural circuits to configure themselves for a given task and to compensate automatically for changes in the network or task. We also show that experimental data on dopamine-modulated spine dynamics can be modeled within this theoretical framework, and that a strong stochastic component of synaptic plasticity is essential for its performance.
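The core mathematical idea — that local stochastic plasticity can be arranged so the network's stationary distribution equals a target posterior — can be illustrated with a minimal sketch. This is a hypothetical toy example, not the paper's network model: it simulates Langevin dynamics for a single scalar parameter `theta` with drift given by the gradient of a log-posterior (here a Gaussian with assumed mean `MU` and standard deviation `SIGMA`), plus a strong noise term. By the Fokker-Planck equation, the stationary distribution of these dynamics is the target posterior itself, so the long-run samples reproduce its mean and spread.

```python
import numpy as np

# Hypothetical Gaussian "posterior" over a single synaptic parameter:
# p*(theta) ∝ exp(-(theta - MU)^2 / (2 * SIGMA^2)).
MU, SIGMA = 1.5, 0.5

def grad_log_p(theta):
    # d/dtheta log p*(theta) for the Gaussian target above.
    return -(theta - MU) / SIGMA**2

rng = np.random.default_rng(0)
dt, n_steps = 1e-3, 200_000
theta = 0.0
samples = np.empty(n_steps)
for t in range(n_steps):
    # Euler-Maruyama step of d(theta) = grad_log_p dt + sqrt(2) dW:
    # deterministic drift toward high-posterior regions, plus a
    # strong stochastic term that keeps the dynamics exploring.
    theta += grad_log_p(theta) * dt + np.sqrt(2 * dt) * rng.standard_normal()
    samples[t] = theta

# Discard an initial transient; the remaining samples should match
# the target posterior's mean and standard deviation.
burn = n_steps // 10
print(samples[burn:].mean(), samples[burn:].std())
```

The noise coefficient `sqrt(2 dt)` is what makes the stationary density exactly `p*`; shrinking it would concentrate the dynamics near a single mode and lose the posterior-sampling interpretation, which mirrors the abstract's point that a strong stochastic component is essential.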