Artificial intelligence like a child: it learns human language from scratch


Researchers at New York University have developed a machine learning model that mimics the way children learn language. Using video and audio captured from the perspective of a young child, the model successfully learned to match words to the corresponding images.

Its accuracy, nearly twice that of much larger models, suggests we are getting closer to understanding how children begin to understand and use language. Until now, researchers have had to rely on plausible but unconfirmed theories.

There are many theories about how children acquire language. On one side is the nativist view: many researchers claim that children are born with specialized mechanisms or knowledge built into the brain that allow them to learn language. On the other is a nurture-based view, according to which children learn language mainly from sensory experience. It is not about innate abilities, but about the experiences we gain every day that allow us to learn, say, English – explains Wai Keen Vong of New York University's Center for Data Science.

The researchers based their model mainly on the second of these theories. In developing it, they also aimed to gain a deeper understanding of early language acquisition.

The data for this work were collected using a head-mounted camera worn by a single child. It recorded video and audio as it accompanied the child between the ages of 6 and 25 months. The dataset comprised 600,000 video frames and 37,500 transcribed utterances. Crucially, the study took place not in laboratory conditions but in an environment as close to natural as possible.

This is the largest model of its kind ever created. Datasets of recordings of people talking to their children already exist, but none covered such a long period of time or included video of what the child was seeing at any given moment. Our data provide a unique perspective on language acquisition. Moreover, we combined the data with a multimodal neural network. The model makes fairly generic assumptions about how learning works: data comes in, and simple update rules drive the learning. The fact that learning succeeds under these conditions offers a new perspective on early language acquisition, one that differs from previous approaches by placing much greater emphasis on learning itself and what can be achieved through it – emphasizes Wai Keen Vong.
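The description above matches the general recipe of contrastive image-text training, the same family of methods CLIP belongs to. As a rough illustration only, here is a minimal NumPy sketch of a symmetric contrastive (InfoNCE) loss over a batch of frame/utterance embedding pairs; the random feature vectors, batch size, and temperature are illustrative assumptions, not details of the study's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for encoder outputs: in a real system a vision encoder
# would embed video frames and a language encoder the transcribed
# utterances. Here we just use random vectors (assumption).
batch, dim = 4, 8
img_features = rng.normal(size=(batch, dim))
txt_features = rng.normal(size=(batch, dim))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def contrastive_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE loss over an image-text batch.

    Matching frame/utterance pairs (the diagonal of the similarity
    matrix) are pulled together; mismatched pairs are pushed apart.
    """
    img, txt = l2_normalize(img), l2_normalize(txt)
    logits = img @ txt.T / temperature   # (batch, batch) similarities
    labels = np.arange(len(img))         # i-th image matches i-th text

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)        # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()       # diagonal = true pairs

    # average the image->text and text->image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

loss = contrastive_loss(img_features, txt_features)
print(f"contrastive loss: {float(loss):.3f}")
```

During training, gradient updates on this loss would nudge the two encoders so that co-occurring frames and words end up nearby in the shared embedding space, which is the "simple update rules" intuition in the quote.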

The model's performance was assessed on a classification task: matching words to their corresponding images. Its classification accuracy was 61.6%. For comparison, the same tests were run on CLIP, a contrastive image-text neural network trained on a far larger pool of data. Its classification accuracy was only 34.7%.
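The evaluation described above can be sketched as nearest-neighbour classification in a shared embedding space: for each word, pick the most similar image, and score accuracy as the fraction of correct picks. The toy embeddings below are hypothetical stand-ins, not data from the study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embeddings: one vector per word, one per candidate image.
# Image embeddings are noisy copies of the word embeddings so that the
# correct pairings are recoverable (an assumption for illustration).
n_words, dim = 5, 8
word_emb = rng.normal(size=(n_words, dim))
img_emb = word_emb + 0.5 * rng.normal(size=(n_words, dim))

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def classification_accuracy(words, images):
    """For each word, choose the most similar image by cosine similarity;
    a prediction counts as correct when the chosen image is the one
    paired with that word."""
    sims = normalize(words) @ normalize(images).T   # (n_words, n_images)
    predictions = sims.argmax(axis=1)
    return (predictions == np.arange(len(words))).mean()

acc = classification_accuracy(word_emb, img_emb)
print(f"word-to-image accuracy: {float(acc):.1%}")
```

The reported figures (61.6% vs. 34.7%) would correspond to this kind of accuracy computed over the study's actual test words and images.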

This is the first time we have confirmed that a model can learn words from a child's real-life experiences. The fact that such learning is possible supports the idea that this may be one route by which children acquire these kinds of concepts. We cannot say so categorically, but it is a very suggestive and promising direction for further research – says the researcher.

According to Precedence Research, the global machine learning market generated over $38 billion in revenue in 2022. By 2032, its value is expected to exceed $771 billion.
