ChatGPT has learned to talk. On Monday, OpenAI, the San Francisco-based artificial intelligence startup, introduced a version of its famous chatbot that can interact with people through spoken language. Users can converse with ChatGPT much as they do with Amazon’s Alexa, Apple’s Siri, and other digital assistants. For the first time, ChatGPT can also respond to photos. People can, for example, upload a snapshot of the inside of their refrigerator, and the chatbot will suggest meals based on the ingredients they have on hand.
“We’re looking to make ChatGPT easier to use — and more helpful,” said Peter Deng, OpenAI’s vice president of consumer and enterprise products. In recent weeks, OpenAI has accelerated the rollout of its AI tools. This month, it released a new version of its DALL-E image generator and integrated it into ChatGPT. ChatGPT attracted hundreds of millions of users after its launch in November, and many other startups quickly followed suit. OpenAI’s new version of the bot moves beyond rival chatbots like Google Bard while also competing with older technologies like Alexa and Siri, which have long allowed spoken-word interactions with smartphones, laptops, and other devices. Chatbots, by contrast, have more powerful language skills and can instantly produce letters, poetry, and term papers, as well as riff on practically any topic thrown their way.
OpenAI has essentially combined the two modes of communication. Talking, the company argues, is a more natural way of communicating with its chatbot. It says ChatGPT’s synthetic voices — users can choose from five options, including male and female voices — are more convincing than those used with popular digital assistants. The company said the new version of the chatbot will roll out over the next two weeks to everyone who subscribes to ChatGPT Plus, a $20-a-month service. However, the bot can respond with voice only when used on iPhones, iPads, and Android smartphones.
The bot’s synthetic voices sound more natural than many others on the market, though they can still sound robotic. Like other digital assistants, it can struggle with homophones. When The New York Times asked the new ChatGPT how to spell “gym,” it said: “J-I-M.” But one advantage of a chatbot like ChatGPT is that it can correct itself. When told “No, the other kind of gym,” the bot replied: “Ah, I see what you’re referring to now. The place where people exercise and work out is spelt G-Y-M.”
Though ChatGPT’s speech interface resembles those of earlier assistants, the underlying technology is substantially different. ChatGPT is powered by a large language model, or L.L.M., which has learned to generate language on the fly by analyzing massive amounts of text from the internet. Older digital assistants, such as Alexa and Siri, functioned as command-and-control centers, able to perform a limited number of tasks or respond to a limited set of questions loaded into their databases, such as “Alexa, turn on the lights” or “What’s the weather in Cupertino?” Adding new commands to the older assistants could take weeks. ChatGPT can respond authoritatively in seconds to almost any question thrown at it — though it is not always correct.
As OpenAI transforms ChatGPT into something more like Alexa or Siri, companies like Amazon and Apple are reshaping their digital assistants into something more like ChatGPT. Amazon previewed an improved Alexa last week that aims for more fluid conversation about “any topic.” The company said it is driven in part by a new L.L.M. and includes changes to pacing and intonation that make it sound more natural. Apple, which has not publicly discussed its plans for competing with ChatGPT, has been testing a large language model of its own for future products, according to two people briefed on the effort.
When used on the web, as well as on iPhones, iPads, and Android smartphones, the new ChatGPT can respond to photos. Given a photograph, chart, or diagram, it can provide a full description of the image and answer questions about its contents. This could be a powerful tool for visually impaired people. OpenAI first demonstrated the image tool in the spring, but the company said it would not release it to the public until researchers better understood how the technology could be misused. Among other things, they were concerned that it could become a de facto facial recognition service, used to swiftly identify people in images.
Over the summer, Microsoft released a visual search feature based on OpenAI’s technology in its Bing chatbot. According to Sandhini Agarwal, an OpenAI researcher focused on safety and policy, the latest version of the bot will refuse requests to identify faces. It is, however, designed to provide extremely thorough descriptions of other images. Given an image from the Hubble Space Telescope, for example, it can respond with paragraphs explaining the image’s contents. Students can also use the bot: given a photo of a high school math problem that includes words, numbers, and diagrams, it can quickly read and solve the problem. It could be a good way to learn — or to cheat.