The world of generative artificial intelligence is evolving rapidly. We've found that the best way to keep up with the changes is to go to AI experts directly and hear their thoughts, without the hype, hyperbole, and marketing claims. So in this first article, we invite some of the best minds in artificial intelligence to share their perspectives and experiences in this exciting field.

Christopher Rytting is a fifth-year doctoral student in Computer Science at Brigham Young University. He studies the ability of large pre-trained language models to simulate humans, both as a subject of research in its own right and as an auxiliary tool for social science research.

By Christopher Michael Rytting

ChatGPT

Language models trained with reinforcement learning from human feedback (also known as RLHF) are making a splash in academia and beyond after debuting around NeurIPS 2022, the largest machine learning conference. ChatGPT is rumored to be a replacement for Google, Twitter is buzzing with speculation about GPT-4, and much of the credit is going to RLHF. The hype has a feverish, almost mystical quality, and I want to examine and dispel it.

Why We Love Reinforcement Learning

We can start with "Reinforcement Learning: A Survey," published in the Journal of Artificial Intelligence Research more than a quarter of a century ago, which calls the promise of reinforcement learning "beguiling": a way of programming an agent through rewards and punishments without needing to specify how the task is to be accomplished.
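To make that promise concrete, here is a minimal sketch of the idea in Python: tabular Q-learning on a toy corridor. The environment, reward values, and hyperparameters are illustrative assumptions of mine, not anything taken from the survey; the point is only that the agent is told which outcome is rewarded, never how to reach it, and still ends up with a policy that walks to the goal.

# A minimal sketch of "programming by reward and punishment": tabular
# Q-learning on a toy six-state corridor. The environment, rewards, and
# hyperparameters here are illustrative assumptions, not from the survey.
import random

N_STATES = 6          # states 0..5; state 5 is the rewarded goal
ACTIONS = (-1, +1)    # step left or step right
EPISODES = 500
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q[(s, a)] estimates the long-run reward of taking action a in state s.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Move along the corridor; reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def greedy(state):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for _ in range(EPISODES):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit current estimates, occasionally explore.
        action = random.choice(ACTIONS) if random.random() < EPSILON else greedy(state)
        nxt, reward, done = step(state, action)
        # Temporal-difference update: nudge Q toward reward plus discounted future value.
        target = reward + GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

# Nobody ever told the agent how to reach the goal, only that reaching it pays.
print([greedy(s) for s in range(N_STATES - 1)])   # expected: [1, 1, 1, 1, 1]

The entire "program" here is the reward signal; the behavior falls out of the update rule. That is the appeal the rest of this piece picks apart.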
The idea that rewards and punishments alone could give us artificial intelligence is appealing. Why? Because it mirrors a very popular reading of natural selection itself: that we, and all intelligent life forms, were programmed through rewards and punishments in the ultimate pursuit of reproduction. On this account, everything about organisms (their intelligence, their pleasures, their pains, every fragment of behavior or experience) arises from genetic variation that proved more or less conducive to reproduction. Reinforcement learning invokes the divine elegance of this evolutionary story to vouch for its own promise.

Another reason reinforcement learning is so compelling is that it can be read as a rejection of the scientific intuition that systems should, or even can, be understood, a rejection that many believe the past fifty years of AI research justifies. Throughout the history of science, observing and reasoning about phenomena has revealed causal mechanisms, and those who understood these mechanisms could sometimes control them: nitrogen boosting crop growth, lift keeping airplanes aloft. This intuition, call it the value of understanding, was the design principle behind the emblematic twentieth-century project of Good Old-Fashioned Artificial Intelligence (GOFAI): we would think about a problem, analyze it, and teach our agents accordingly, writing down their reasoning rule by rule. Excitement gave way to fatigue, however, as any endpoint to these ever-growing systems of rules stayed out of view, always blocked by a long tail of edge cases that precluded anything resembling general intelligence.

Those failures were exhausting, funding dried up, and the AI winter set in. What pulled the field out was a change in approach: from hand-crafting behavior to training on data. In domain after domain (notably games, natural language processing, and vision), AI systems trained on what to do, through simulation and real-world data, outperformed systems told what to do through hand-written rules and logic. In 2019, even before GPT-3, probably the highest-profile example of this minimally specified approach, one of the central figures of RL, Rich Sutton, summed up the shift in an essay called "The Bitter Lesson." The title reflects how hard it is for scientists to accept their proper role: suppressing their deepest impulse, the impulse to understand, and handing control to learning algorithms they barely design before turning the data loose on them. A similar paper from DeepMind is titled "Reward is Enough"; you can decide for yourself whether that sentence is an exclamation or a sigh. Reinforcement learning, then, can stand in conceptually, almost ideologically, for evolution, for the substitution of whatever works for whatever does not, regardless of our wishes and priors. Rejecting or questioning its value can feel like heresy, despite the fact that it is notoriously finicky, unstable, and difficult, and has so far failed to produce generally intelligent agents.