[WaybackSave/Archive] Jürgen Schmidhuber on X: “DeepSeek [1] uses elements of the 2015 reinforcement learning prompt engineer [2] and its 2018 refinement [3] which collapses the RL machine and world model of [2] into a single net through the neural net distillation procedure of 1991 [4]: a distilled chain of thought system. …”
followed by a list of references and this graph:
