Distributed Artificial Intelligence
Many amateur "AI professionals" are "philosophers" who ignore as much experience and data as necessary in order to make lofty statements. This impresses the rubes but does not translate into hardware. Real AI is optimized for physics, not fact-free philosophy. Sine data, cogito ergo sum insipiens ("without data, I think, therefore I am a fool").
Superfast large-core AI will not happen, because that approach is costly and suboptimal. Distributed AI, many nodes including humans and other lifeforms, is happening now.
Human brains are distributed, low-power multiprocessors relying heavily on pattern matching. Brains move atoms between synapses, not electrons (though they distribute action potentials across neurons at electrochemical speeds). The brain does all its work, including physiological maintenance, with 20 watts of power, and generates results at a rate matched to the problems encountered in the natural environment. Brains could work faster, but energy is costly to gather, and waste heat is difficult to dissipate.
There is no reason to think that engineers (human or artificial) will design AI in excess of productive need or available resources. The productive need is vast, perhaps unbounded, but it is situational and specific and spread over a vast area, not concentrated in one spot. AI will permit us to address many problems that are too quick or small to tackle with human brains, but the biggest problems demand surface area solutions, not volume solutions.
Consider cooling a system in space, moving heat in a finite-speed coolant medium. Cooling output is proportional to radiator area. If the heat source is concentrated at a point, then the heat must be carried outwards to the edges of the radiator. Double the edge length of the radiator, and you double the coolant round trip time; each unit of coolant moves half as much heat per unit time, so you need twice as much coolant per watt radiated. Multiply the heat source by four, and you multiply the radiator area by four and the coolant (and piping mass, and pumping power) by eight. At some point, you are spending more mass on coolant and more energy on moving it than you are spending on the primary task. If the primary task is also massive and inefficient, it can mask a bigger coolant system cost. If the primary task is efficient, it will be scaled and distributed to small and efficient radiators.
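A back-of-envelope sketch of that scaling, in Python with arbitrary unit constants (the proportionalities are the point, not the absolute numbers):

```python
# Scaling sketch for a point heat source feeding a square radiator.
# Assumes the simple proportionalities above; all quantities are in arbitrary units.

def radiator_scaling(power):
    """Return (area, edge, coolant) for a given heat load."""
    area = power            # radiated power ~ radiator area
    edge = area ** 0.5      # edge length ~ sqrt(area)
    # Coolant round-trip time ~ edge, so coolant needed per watt ~ edge;
    # total coolant (and piping mass, and pumping power) ~ power * edge.
    coolant = power * edge
    return area, edge, coolant

for p in (1, 4, 16):
    area, edge, coolant = radiator_scaling(p)
    print(f"power x{p:2d}: area x{area:4.0f}, edge x{edge:3.1f}, coolant x{coolant:4.0f}")
# power x 1: area x   1, edge x1.0, coolant x   1
# power x 4: area x   4, edge x2.0, coolant x   8
# power x16: area x  16, edge x4.0, coolant x  64
```

Under these assumptions the coolant overhead grows as power to the 3/2; splitting the task across many small radiators keeps each one in the near-linear regime.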
Problems like rendering display frames are calculation, not artificial intelligence, and they are done today with hundreds of thousands of parallel threads. But the customers (humans and their visual systems) need no more than 200 frames a second, because the human visual system doesn't trigger on display artifacts shorter than 5 milliseconds. The computational task of preparing a high resolution display can be efficiently divided into hundreds of thousands of threads with current gigahertz-capable but 100-megahertz-optimal parallel hardware. As Moore's law provides more hardware, the thread count will increase and the thread clock rate will drop, someday devolving to a 200 hertz thread per pixel. Distribution and efficient information flow will dominate. While those threads will probably still be instantiated on a substrate smaller than a thumbnail, a millisecond of information scatter and gather at half the speed of light permits a substrate 75 kilometers across, with heat sources distributed throughout.
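A quick arithmetic check of those figures, in Python, with the 200 Hz frame rate, the 1 millisecond scatter-and-gather budget, and the half-lightspeed signal velocity taken as assumed inputs:

```python
# Back-of-envelope check of the figures above (assumed values, not measurements).
LIGHT_SPEED = 299_792_458                  # m/s
signal_speed = 0.5 * LIGHT_SPEED           # assume signals propagate at half of c
frame_rate = 200                           # frames per second the visual system can use
frame_time_ms = 1000 / frame_rate          # 5 ms per frame
scatter_gather = 1e-3                      # 1 ms budget for round-trip data movement
max_span_m = signal_speed * scatter_gather / 2   # farthest point-to-point traverse, out and back
print(f"per-pixel thread rate: {frame_rate} Hz, frame budget {frame_time_ms:.0f} ms")
print(f"substrate span: {max_span_m / 1000:.0f} km across")   # ~75 km
```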
The biggest extant AI, the worldwide network of Google computers, handles a huge parallel task, serving millions of simultaneous queries. Google uses a crapton of parallel computers at multiple sites to do that. Using fewer, faster computers is more expensive and less energy efficient.
Switching delay is proportional to CV/I (the charge moved per transition, divided by the drive current), and dynamic power is proportional to CV²F, so increasing parallelism (more total C) while reducing V is a win: more results per watt-hour. Nature knows this, Intel knows this, nVidia knows this, Google knows this, I know this. The millennialist AI community does not.
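A toy comparison in Python of one fast, high-voltage core against several slow, low-voltage cores, using the crude assumptions that clock frequency scales roughly linearly with V and that dynamic power goes as CV²F; the numbers are normalized, not measured:

```python
# Toy comparison (normalized numbers): one fast core versus parallel slow cores.
# Assumes dynamic power ~ C * V^2 * F and clock frequency roughly proportional to V.

C = 1.0                             # normalized switched capacitance per core

def design(voltage, n_cores):
    freq = voltage                              # crude: F scales ~linearly with V
    throughput = n_cores * freq                 # operations per unit time
    power = n_cores * C * voltage**2 * freq     # P ~ C * V^2 * F per core
    return throughput, power

for v, n in [(1.0, 1), (0.5, 2), (0.5, 4)]:
    t, p = design(v, n)
    print(f"V={v:.1f}, cores={n}: throughput {t:.2f}, power {p:.3f}, ops per joule {t/p:.1f}")
# V=1.0, cores=1: throughput 1.00, power 1.000, ops per joule 1.0
# V=0.5, cores=2: throughput 1.00, power 0.250, ops per joule 4.0
# V=0.5, cores=4: throughput 2.00, power 0.500, ops per joule 4.0
```

Same throughput at a quarter of the energy per operation, or double the throughput at half the power, by spending area instead of voltage.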
Google response time is a matter of two things: network delay, and how many canned responses they've stored up for generic search queries. Typically, each customer-facing compute node pulls the search words deemed "important" out of a search query, ignores the rest, assembles a pointer, then does one disk lookup on the machine the pointer points to, returning the results of a generic search that was performed hours or weeks or years ago, along with the revenue-generating ads that attach to the search words. That approach is fast and cheap, and works like a human brain; also like interacting with the typical brain, it is annoying to those of us who are trying to find exceptional results. Still, Google's customers (the advertisers) get access to the product they want (you and me), and this is a cost-effective way to harvest high quality product for the customers.
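A toy Python sketch of that lookup pattern; this is an illustration of the idea, not Google's actual architecture, and the stopword list, bins, and URLs are invented:

```python
# Toy sketch of the precomputed-lookup idea; an illustration, not Google's
# actual architecture.  The stopword list, bins, and URLs are invented.

STOPWORDS = {"the", "a", "an", "of", "for", "to", "how", "do", "i"}

# Pretend "bins": results of generic searches performed hours, weeks, or years ago.
PRECOMPUTED = {
    ("battery", "replace"): ["battery-howto.example/1", "battery-faq.example/2"],
    ("ai", "distributed"): ["serversky.example/dai"],
}

def important_words(query):
    """Keep the words deemed 'important', ignore the rest, build a lookup key."""
    words = [w for w in query.lower().split() if w not in STOPWORDS]
    return tuple(sorted(words))

def serve(query):
    key = important_words(query)
    # One lookup against a precomputed bin, plus ads keyed on the same words.
    results = PRECOMPUTED.get(key, ["(no canned bin; fall back to a slower real search)"])
    ads = [f"ad:{w}" for w in key]
    return results + ads

print(serve("how do I replace a battery"))
# ['battery-howto.example/1', 'battery-faq.example/2', 'ad:battery', 'ad:replace']
```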
Given speed-of-light network delays, there is no reason to spend gigabucks to further reduce the response time of the computers. It is better to create more and smaller Google data centers, closer to the product AKA search users, so search users can be harvested faster and more efficiently. There is no need to build a Google data center bigger than the product field it harvests. Google's data center in The Dalles, Oregon, is an exception; electricity is so cheap in Oregon that many of the backroom tasks, like efficiently sorting the internet into bins and assigning search pointers to them, are best done where energy is cheap. Then those bins are replicated to production data centers around the world. Of course, those distributed data centers can also assist with bin creation during times of low product demand.
Google, like nature, puts all its eggs in MANY baskets. Take away three Google data centers, and with a little bit of routing magic, the rest will shoulder the load, working as a group, perhaps a little slower because the speed-of-light delays to some customers are larger. Four smaller nodes in a densely interconnected region will have half the latency of one big node, use half the total routing resources, and provide faster response if one of the nodes is lost. If all four are lost, there is probably infrastructure damage throughout the region, so the reason to respond is probably lost, too.
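A small Monte Carlo sketch in Python of the latency claim, assuming users spread uniformly over a square region and latency proportional to straight-line distance to the nearest node:

```python
# Monte Carlo sketch: users uniform over a unit square, latency ~ distance
# to the nearest node.  Compares one central node with four spread-out nodes.
import random

def mean_distance(nodes, trials=100_000):
    total = 0.0
    for _ in range(trials):
        x, y = random.random(), random.random()   # a user somewhere in the region
        total += min(((x - nx) ** 2 + (y - ny) ** 2) ** 0.5 for nx, ny in nodes)
    return total / trials

one_node   = [(0.5, 0.5)]
four_nodes = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
d1 = mean_distance(one_node)
d4 = mean_distance(four_nodes)
print(f"one node: {d1:.3f}  four nodes: {d4:.3f}  ratio: {d4 / d1:.2f}")   # ratio ~0.5
```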
The unit of genetic reproduction is the species, not the individual; a too-small subset of a species is nonviable (for humans, estimates of the minimum population needed for sufficient genetic diversity and accident tolerance exceed 500 individuals). Individuals are optimized for selection; trash bags for bad genes. Individual speed is optimized for the speed of the threats and opportunities in the environment. Large and fast threats are few, because they are costly and limited by physics. The best way to respond to rare and expensive threats is more individuals, not a few almost-invulnerable individuals, especially if the individuals can collaborate to remove the source of the threats.
Individuals present more "vulnerable surface" to the environment, but they can also collect more resources (food and information) through that larger surface. If the individuals can differentiate (like learning human beings), then each individual has different vulnerabilities; the chance that one threat can take out all of the individuals is much smaller than the chance that one threat takes out one large individual. An elephant can menace one small individual, but falls prey to a coordinated band of individuals smaller in total mass. Coordination beats concentration.
AI, like the human brain, is a tool to solve a problem with the time and resources available. The best AI computes plausible solutions in advance of need, as power-efficiently as possible, then refines them for specific situations as they occur. Lots of solutions, efficiently manufactured in parallel and distributed in time to match the rate of problem manifestation, are the optimum way to proceed. For problems manifesting at 100 kilosecond (1.16 day) rates, sense-and-respond solution systems 100 AU in diameter are adequate, and emergent physical threats (or opportunities) at sublight velocities will need much longer to affect those large and networked systems.
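A quick check of those closing numbers, assuming signals crossing the system at the speed of light:

```python
# Quick check of the closing numbers: light crossing a 100 AU diameter system.
AU = 1.495978707e11          # meters
LIGHT_SPEED = 299_792_458    # m/s
diameter = 100 * AU
one_way = diameter / LIGHT_SPEED     # seconds to cross the system once
round_trip = 2 * one_way             # sense on one side, respond from the far side
print(f"one-way crossing: {one_way / 1e3:.0f} ks ({one_way / 86400:.2f} days)")
print(f"round trip:       {round_trip / 1e3:.0f} ks ({round_trip / 86400:.2f} days)")
# one-way crossing: 50 ks (0.58 days); round trip: 100 ks (1.15 days)
```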