There are some aspects of AI research that common sense struggles to grasp, even among trained IT experts. Not because of a lack of information, education, or technical knowledge, but because we are still a species that doesn’t know itself. We tend to externalize our conscious awareness and miss the internal psychological and cognitive mechanisms that determine our cognition. We study, analyze, and come to know everything outside of us, but know almost nothing about ourselves and how we know. This is also a consequence of a materialistic culture in which, thanks to the enormous success of science (the emperor of the third-person perspective), we envisage huge technological and material adventures that are supposed to lead us to a bright future and a sci-fi society. Yet we can’t imagine any inner progress; we don’t even know what that could possibly mean. That’s why we don’t know how our cognitive processes work, and why we are so thrilled by all the AI hype and transhumanist dreams.
If we could look even a little beneath our superficial mental bubbling surface, we would recognize how many futuristic AI visions are naive expectations that betray a lack of contact with our inner being.
There are many different approaches with which one could show this. For example, I like to point out how ‘meaning’, the semantic and conceptual comprehension of a sentence, a concept, the perception of an object, etc., is not in the symbol, in the sentences on a piece of paper, in an image, or in the visual perception of an object, but is a much subtler cognitive process that goes way beyond an algorithm, a computer program, or any calculation, however complicated. The semantic awareness of the world we perceive, what I call the “perception of meaning,” is not and will never be implemented in a set of rules or a complex system of neural networks, because this perception of meaning is something that goes beyond computation and, I claim, beyond brain processes.
For example, when you look at an object, say a chair, your mind never sees the chair as an isolated object. Even if you picture it in your mind floating in zero-gravity outer space, you never conceive it separately from a huge context made of logical and conceptual relationships that your mind has stored, from past lived experiences, in an equally huge memory reservoir. Now AI research has found that machines’ semantic maps are too poor because they are trained on text alone, rather than on the world “out there.” The next step is to implement real world-knowledge—that is, the conceptual and relational knowledge between objects and the external environment. Many believe this will be the key to human-like artificial general intelligence (AGI). But it won’t, because it logically ends in an infinite regression (a chair is a thing on which people sit; it has four legs because otherwise it tilts; it is made of some substance, say, wood; wood is something you get from trees; trees are plants that are part of a forest; forests are complex ecological systems made of many trees; ecological systems are part of the biosphere; the biosphere is…). How does the mind get out of this infinite regression?
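To make the regress concrete, here is a toy sketch (purely illustrative, not any real AI system’s knowledge base; all the concepts and relations are invented for the example) of a relational ‘world-knowledge’ graph in which every concept is defined only through other concepts. Unfolding the definitions never bottoms out in meaning; it just keeps expanding:

```python
# A tiny invented "world-knowledge" graph: every concept points to other concepts.
knowledge = {
    "chair":     ["thing people sit on", "wood"],
    "wood":      ["material from trees"],
    "material from trees": ["tree"],
    "tree":      ["plant", "forest"],
    "plant":     ["organism"],
    "forest":    ["ecosystem"],
    "ecosystem": ["biosphere"],
    "biosphere": ["planet"],
    # ...and so on, without end.
}

def expand(concept, depth=0, max_depth=6):
    """Recursively unfold a concept into the concepts that define it."""
    print("  " * depth + concept)
    if depth >= max_depth:
        print("  " * (depth + 1) + "...")   # the regress never terminates
        return
    for related in knowledge.get(concept, []):
        expand(related, depth + 1, max_depth)

expand("chair")
```

However deep you let the expansion run, you only ever reach more symbols that point to yet more symbols.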
Of course, if you ask ChatGPT, it may tell you all that already, as if it had a semantic awareness like ours. However, its responses aren’t based on its own world-knowledge experience; they are based on someone else’s experiences encoded into signs, symbols, letters, and sentences, namely on already human-given patterns and information present in the data on which it was trained. When you input "chair," it draws upon its (mostly textual) training data, which includes a diverse range of text from the internet—that is, words in numerical representations, typically high-dimensional vectors that capture contextual information and the relationships between words in the form of numbers. There is no “concept” of a chair in the human sense, only numerical patterns of language and information associated with chairs, including their typical features, uses, materials, and the various contexts in which they are mentioned. There is no visual experience, perception, or imagination, let alone consciousness, but only learned patterns in the data. At best, it is all about syntax, not semantics.
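A minimal sketch of what this looks like in practice (the vectors below are invented toy numbers, not the weights of any real model): to a language model, “chair” is just a point in a vector space whose nearest neighbors reflect co-occurrence statistics in the training text.

```python
# Toy word vectors (invented numbers): "chair" is only a point in a vector space.
import numpy as np

embeddings = {
    "chair": np.array([0.81, 0.12, 0.05, 0.44]),
    "table": np.array([0.78, 0.15, 0.09, 0.40]),
    "sofa":  np.array([0.75, 0.10, 0.11, 0.47]),
    "tiger": np.array([0.05, 0.90, 0.63, 0.02]),
}

def cosine(u, v):
    """Cosine similarity: how closely two word vectors point in the same direction."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

query = embeddings["chair"]
for word, vec in embeddings.items():
    if word != "chair":
        print(f"chair ~ {word}: {cosine(query, vec):.2f}")
```

The high scores for “table” and “sofa” come entirely from patterns of usage in text; nothing in the numbers involves ever having sat on anything.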
Now, you might say that, perhaps, that’s what we are unconsciously doing with our brains as well. But everyone who has even minimal experience with meditation, and who can at least partially silence their own mental chatter, knows all too well how conceptual and semantic clarity emerges from less intellectual activity, not more of it. Meaning in our field of awareness does not emerge from a process but, on the contrary, from the absence of mental processes, and reveals itself as an inherent knowledge that was already present before we even asked the question. In other words, meaning emerges from elsewhere, where it was already present.
That’s why self-driving cars are still not driving us around while we take a nap in the backseat. For years we were promised that the bright future of self-driving cars was around the corner, but expectations have now started to come down. It was easy to predict that they wouldn’t become an everyday reality, but only if we look inward: if we step back, silence the mind and, at the same time, watch how the mind perceives and understands the world. We then realize that driving a car is not just an automatic algorithm running in our heads, even though there are a lot of automatic actions that we execute unconsciously. The point is that you can’t drive a car if you do not have a subjective perception of meaning, with the world and all its objects triggering a subjective semantic experience that is intrinsically different from a picture on your retina or a sterile numerical classification scheme. Nobody can drive safely without understanding the meaning, significance, and context of the objects passing by while driving through a landscape. Even in the most advanced AI system, despite being quite impressive in simulating comprehension, there is no ‘ghost in the machine’ having a semantic awareness of what it sees, computes, registers, or number-crunches. When you perceive the meaning, the semantics of something, there is in you an inner flash of realization, a spark of knowledge, sometimes the famous ‘aha moment,’ that a machine will never have.
Of course, here I could jump straight into metaphysics and write about the metaphysics of the philosophy of language, which shows how semantics is the apprehension and comprehension of something that goes beyond the body and even beyond the mind and is instantiated on higher trans-physical planes of existence, as I have discussed in the essays on the philosophy of language of Abhinavagupta and Sri Aurobindo (here, and here). Or I could point out that there is now clear empirical neuroscientific evidence against mind-brain identity, which indicates that the mind might be immaterial rather than an epiphenomenon of the material brain. If dualism is true—that is, if the mind isn’t only in our brain and body but is also something residing on an immaterial plane of existence—then, obviously, no AGI project could be realized, not even in principle. It would be a vain chimera.
This I may do in coming posts, but this time I will keep the argument within conventional analytic and naturalistic boundaries. Here are some extracts from my book “Spirit calls Nature” about the nature and future of AI; they illustrate Searle’s famous Chinese room thought experiment, which, I believe, is more relevant than ever.
To substantiate the claim that meaning is more than something contained in symbols, raw data, and symbolic information processing, the American philosopher John Searle, in 1980, proposed the so-called ‘Chinese room argument.’ With this thought experiment, Searle intended to show that a computer program (running a classical algorithm, as opposed to other forms of computation such as deep learning neural networks or quantum computation) cannot understand meaning, not even in principle.
The thought experiment imagines Searle himself in a room where he receives questions in Chinese characters through an input slot. Without understanding Chinese, but following the instructions of a digital computer program that manipulates symbols according to the syntactic rules of the Chinese language, Searle sends out the answer, in the form of another string of Chinese characters, through an output slot.
He simply obeys the instructions of the computer without understanding either the questions or the answers. If the computer can pass the so-called ‘Turing test’ – that is, a test which posits that a machine is as intelligent as a human if its answers are indistinguishable from those of a human – an external observer, say, a native Chinese speaker, would mistakenly believe that there is a Chinese speaker in the room, while in reality it is the computer that is answering in Chinese.
Searle points out, however, that he could, in principle, follow the rules of the algorithm step by step himself (say, by consulting a printout of the program library or database that tells him exactly all the possible rules with which he has to manipulate the Chinese ideograms), and that would allow him to pass the Turing test even without the computer. Since, from this point of view, there is no difference between the computer’s syntactic information processing and Searle mimicking it without understanding anything of the meaning the string of symbols represents, one must conclude that the computer can’t understand meaning either. It only appears that a symbol-crunching machine understands meaning; it does not, and the Turing test is not an appropriate tool for determining whether a machine understands meaning and semantics as humans do.
Later, Searle’s thought experiment was reinforced by the so-called “symbol grounding problem,” discussed by Stevan Harnad in 1990. Essentially, it points out that where symbols (words, numbers, streams of bits, signals, etc.) get their meaning from remains a highly problematic issue. There is a hiatus between, on one side, symbols, signals, and number crunching and, on the other side, semantic awareness, intuition, understanding, and knowing, the latter remaining outside the formal description of any computational model. One can translate the same general problem into the more specific context of neural coding.
Another implication of Searle’s argument is that it clarifies how a code, say, a binary code such as ASCII or the old Morse code, has no meaning or function in itself and is not something physical and existent in itself. A code always needs a ‘semantic agent’–that is, a mind that understands the connection between this humanly pre-defined code of letters and symbols and its concatenation into words and sentences, and that is capable of working with it according to grammatical rules, a dictionary of words, and an association with semantic content. For example, a binary code sets a string of ones and zeros into relation with other symbols. Say the binary string 1010 corresponds to the number 10 in the decimal numeral system. However, it could also signify the 10th letter of the alphabet or any other kind of symbol. A binary string doesn’t possess any meaning until a mind comes along and makes it meaningful through a standardized semantic code system. One needs a being who gives meaning to symbols. Bits, bytes, letters, or symbols are nothing meaningful without a conscious observer. A string of bits in a computer is nothing other than a series of electric potentials in a microchip. The string 1010 could also be created by aligning two white stones separated by two grey stones in a dusty desert, with no meaning whatsoever. As long as a semantic agent does not read out a code, it is just a ‘displacement’ or ‘puncture’ in matter.
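The point can be made concrete with a few lines of code (a purely illustrative sketch; the interpretations chosen here are arbitrary conventions, which is exactly the point): the same bit pattern yields entirely different ‘content’ depending on which humanly agreed code a reader applies to it.

```python
# The same physical pattern, three different "meanings": each one exists only
# relative to a convention that a mind has agreed upon beforehand.
bits = "1010"

as_number = int(bits, 2)                                   # 10, under the binary-number convention
as_letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[as_number - 1]    # "J", if read as the 10th letter
as_pattern = bits.replace("1", "white ").replace("0", "grey ").strip()  # stones in a desert

print(as_number, as_letter, as_pattern)
```

The electric potentials in the chip (or the stones in the desert) are identical in every case; what changes is only the interpreting convention brought to them by a semantic agent.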
In fact, what does the word ‘information’ mean? The etymology of words is often wiser than our understanding of them. The verb ‘in-form’ tells us that something has been ‘formed’ by molding, carving, shaping, or puncturing an object into some pattern, or by modifying its internal or external physical state. It is about forming something, making it a medium for symbols that convey a message expressing our thoughts to someone else who can understand those thoughts. However, forming patterns or modifying internal states in objects has no meaning whatsoever if there are no minds to receive and transfer the thoughts expressed. Saying that the ‘fundamental nature of the universe is information,’ as some modern conceptions of the foundations of physics hold, isn’t very informative either, as it becomes logically circular: it implicitly demands that something be formed which incorporates that information and which must, therefore, be prior to and more fundamental than the information itself.
Based on these arguments and their implications, Searle also introduced the distinction between ‘strong AI’ (also called ‘artificial general intelligence’ (AGI)) and ‘weak AI.’ The former is a hypothetical sci-fi AI in which machines truly understand meaning as humans do. The latter indicates machines that only simulate humans’ intelligent behavior through smart inferences or probabilistic guesses but have no understanding of what they are doing.
It is doubtful that we will ever have real thinking and conscious machines if they remain utterly unable to have a semantic comprehension of symbols, data, information, images, perceptions, etc. For example, it is questionable whether there will ever be a fully autonomous self-driving car if its deep learning neural networks (or whatever kind of AI brain it has) remain unable to grasp the significance of the environment it registers in its memory chips.
In fact, the correctness of the Chinese room argument becomes more apparent as AI progresses. For example, automatic translation has made considerable progress in recent decades: computer translation from one language to another is nowadays much more efficient than it was in the 1980s. But, as everyone sooner or later realizes, when a computer (be it algorithmic or based on deep learning software) gets it wrong, this is, in most cases, due to its inherent inability to understand meaning.
At the time of writing, natural language processing is based on so-called ‘word embeddings,’ which are essentially assignments of numerical values that represent some information about a word’s meaning. For example, an embedding can indicate, through a numerical representation, the relationship between an animal and its aggressiveness, such as (tiger, 0.99) and (bird, 0.05), meaning that a tiger is more dangerous than a bird. It also works by comparison, such as assigning a similarity index or ‘angle’ to pairs of words, like (tiger, lion, 5°) and (tiger, bird, 90°), meaning that the semantic similarity between a tiger and a lion, both felines, is much closer than that between a tiger and a bird, which have little semantic relation. Translation algorithms then work by word contextuality–that is, by counting the frequency with which, in natural language, a word is associated with a neighboring term–which can suggest what a word means. For example, the sentence “I went to the bank, and I read the newspaper” has different meanings depending on whether, to the current word ‘bank’ (which has two possible meanings), one associates the context word ‘to’ or ‘the’ or ‘and’ or ‘I,’ respectively. Automated translation machines maximize the likelihood that a given current word forms part of a specific semantic context.
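As a rough, hedged sketch of the two ideas just described (all the vectors and counts below are invented toy numbers, not data from any real system), here is how the ‘angle’ between word vectors and a simple tally of context words might be computed:

```python
import math
import numpy as np

# Toy word vectors (invented): the angle between them plays the role of a
# semantic similarity measure.
vectors = {
    "tiger": np.array([0.90, 0.80, 0.10]),
    "lion":  np.array([0.88, 0.82, 0.12]),
    "bird":  np.array([0.10, 0.20, 0.95]),
}

def angle_deg(a, b):
    """Angle in degrees between two word vectors (smaller = more similar)."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

print(f"tiger/lion: {angle_deg(vectors['tiger'], vectors['lion']):.0f} deg")
print(f"tiger/bird: {angle_deg(vectors['tiger'], vectors['bird']):.0f} deg")

# Toy co-occurrence counts (invented): each neighboring word "votes" for a
# sense of the ambiguous word "bank".
context_votes = {
    "river":     {"riverbank": 120, "finance": 3},
    "money":     {"riverbank": 1,   "finance": 200},
    "newspaper": {"riverbank": 2,   "finance": 5},
}

sentence = "I went to the bank and I read the newspaper".split()
votes = {"riverbank": 0, "finance": 0}
for word in sentence:
    for sense, count in context_votes.get(word, {}).items():
        votes[sense] += count
print("most likely sense of 'bank':", max(votes, key=votes.get))
```

Nothing in either computation refers to rivers, money, or reading by a window; it is all frequency and geometry.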
In 2018, Google introduced a method nicknamed BERT (Bidirectional Encoder Representations from Transformers) that pre-trains and fine-tunes neural networks using a sort of Chinese-room reference book, figures out which features of a sentence look most relevant, and reads texts bidirectionally–that is, from left to right and also from right to left–to better infer context. This allowed for better quality in machine semantic recognition and translation, but it confirmed that nothing in the machine goes beyond Searle’s approach. It is, in the end, all about guessing, not understanding.
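To see this guessing in action, here is a short, hedged example using the open-source Hugging Face `transformers` library (assuming it is installed and that the `bert-base-uncased` weights can be downloaded; the sentence is my own): BERT fills in a masked word from its bidirectional context and returns ranked guesses with probabilities, nothing more.

```python
# Requires: pip install transformers torch
from transformers import pipeline

# BERT predicts the masked token from the words on both sides of it.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for guess in fill_mask("I deposited my salary at the [MASK] this morning."):
    print(f"{guess['token_str']:>10}  p={guess['score']:.3f}")
```

Whatever the top guess turns out to be, it is a maximum-likelihood completion of a character string, not an act of comprehension.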
Moreover, notice how a slight verbal ambiguity, a touch of humor, a cynical remark, or irony in a sentence will easily lead a translation algorithm into an interpretational error, not rarely with hilarious reactions from the human side. The meaning of a message we receive verbally from others also depends on knowing their intentions. It requires empathy on the side of the listener and sometimes even the ability to interpret body language. Meanwhile, computers only calculate the maximum statistical likelihood of the meaning and context of words; they know nothing about either, or about the world which creates that context.
This is, of course, an oversimplification of what goes on internally in a chatbot such as OpenAI’s ChatGPT (we will take this up later, in Pt. II-III.4) or in automated translation software like Google Translate (which works with vector representations of up to a hundred dimensions and is trained on billions of sentences from the web to predict the meaning and context of any given word). However, it makes amply clear why even the most sophisticated translation tools soon reveal their utter semantic inability, especially if you are reading a text translated into your native language. A computer, endowed with whatever complex and up-to-date AI technology, never understands anything. It only ‘sees’ strings, numbers, and vectors, which have no meaning in themselves unless a human reads them. There is a fundamental difference between translating and interpreting spoken language. There are good reasons why, despite the impressive skills of ChatGPT, Google Translate, or DeepL, and despite all the predictions to the contrary, professional human translators have not lost their jobs.
We won’t delve at length into the same issue modern AI has with image recognition. Deep learning neural networks and advanced AI software can indeed recognize objects, faces, or data structures with a high degree of accuracy – sometimes even better than humans can, especially in finding patterns in big data sets. This is quite an impressive achievement that has, and will continue to have, interesting applications. However, deep-learning AI is easy to fool, and when it fails, it suddenly mistakes a chair for an elephant, a glass of water for a machine gun, or a mouse for an ocean liner. There is no ‘ghost’ in the machine that understands anything. Engaging in sophisticated guessing based on a maximum-likelihood match is one thing; understanding and associating meaning is an entirely different task.
These AI applications are very good at mimicking human understanding. But, despite the impressive advancements of recent years, I maintain that semantic awareness in a computer was, and still is, a quest that has made no progress.
Searle’s Chinese room argument received several critical replies and remains controversial. It shows, however, that algorithmic processing of symbols alone is not sufficient to explain the human mind. There is an explanatory gap between syntactic rules and semantics as humans perceive it. The perception of meaning cannot be reduced to information processing alone. Symbols in themselves, like the letters in a book, can cause meaning to emerge, but only if there is a conscious being capable of understanding meaning in the first place. Meaning is a subjective experiential phenomenon in us, not an intrinsic property of symbols or material objects outside of us, and it can’t emerge by computation alone.
After undeniable success stories, especially with the application of deep learning neural networks and ‘large language models’ (LLMs), which, to date, seem to be the breakthrough and great promise for building machines that mimic the human mind, I maintain that, sooner or later, we will have to recognize how greatly exaggerated the original expectations of creating a machine with real semantic awareness, let alone conscious experience, were, fuelled also by an ongoing media overhype.
In the second part, I will also outline a bit of the history of AI to place in proper context the ‘AI summer’ we are currently going through.
Your comments about the perception of meaning not residing in the physical brain are crucial for the critique of meaning. I strongly recommend taking the time to read, or at least skim through, David Bentley Hart’s “The Experience of God: Being, Consciousness, Bliss” (yes, he refers to himself as a Vedantic Christian). He is one of the most learned human beings alive today (among other things, he speaks and reads 16 languages and has a passing acquaintance with at least 12 more). Some passages, with very long, beautifully poetic sentences, remind me remarkably of Sri Aurobindo’s writings.
Hart is eloquent with regard to the perception of meaning as one of the core arguments against materialism. And he can be very funny as well (often a bit too caustic, but even there, he is often quite amusing).