Google LLC is charging ahead with its enterprise-focused AI models, today unveiling two major updates: the low-latency Gemini 1.5 Flash is moving from public preview to general availability, and the already generally available, high-throughput Gemini 1.5 Pro now offers a 2-million-token input window, up from its previous 1-million-token maximum.
The company also announced a preview of its newest high-fidelity text-to-image model, Imagen 3, which brings notable quality improvements over its predecessor, Imagen 2.
Gemini 1.5 Flash
Gemini 1.5 Flash went into public preview last month and is now generally available on Google Cloud's Vertex AI. It offers competitive pricing, a 1-million-token context window and high processing speed: Google says its input window is roughly 60 times larger than that of OpenAI's GPT-3.5 Turbo, and that the model is on average about 40% faster than GPT-3.5 Turbo.
What is interesting about Gemini 1.5 Flash, beyond the competitive advantage of its low input-token price, is its speed. Google says customers such as Uber have used Gemini 1.5 Flash in the Uber Eats service, cutting response times by nearly half to deliver a better customer experience, even during the dinner rush or bad weather.
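For developers, access runs through the standard Vertex AI SDK. The short Python sketch below is a minimal illustration of calling the model; the project ID, region, prompt and exact model version string are placeholders, not values from Google's announcement.

```python
# Minimal sketch: calling Gemini 1.5 Flash through the Vertex AI SDK.
# Project ID, region and the exact model version string are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # hypothetical project/region

model = GenerativeModel("gemini-1.5-flash-001")  # low-latency Flash model
response = model.generate_content(
    "Summarize the key risks in the attached quarterly report in three bullet points."
)
print(response.text)
```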
Gemini 1.5 Pro with 2M Token Input Window
The 2-million-token input window in Gemini 1.5 Pro lets enterprise customers process thousands of documents and very long videos in a single request, greatly broadening how much information the model can handle at once. With this context size, 1.5 Pro can take in roughly 2 hours of video, up to 22 hours of audio, more than 60,000 lines of code or about 1.5 million words.
Thomas Kurian, CEO of Google Cloud, said this is a key capability that customers have found useful. Retailers, for example, could feed long stretches of in-store camera footage into the wide context window to improve customer flow during peak hours, while financial institutions could analyze every 10-K and 10-Q filed on an earnings day as a single dataset.
The larger context window eliminates the need to break up large documents or videos into smaller chunks for processing, saving time and effort.
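As a rough sketch of what that looks like in practice, the snippet below passes a long video stored in Cloud Storage straight to Gemini 1.5 Pro through the Vertex AI SDK, with no pre-chunking. The bucket path, prompt and model version string are illustrative assumptions.

```python
# Sketch: sending a long video to Gemini 1.5 Pro in one request (no chunking).
# The Cloud Storage URI, prompt and model version string are hypothetical examples.
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel("gemini-1.5-pro-001")  # 2M-token context window
video = Part.from_uri("gs://my-bucket/store-footage.mp4", mime_type="video/mp4")

response = model.generate_content([
    video,
    "Identify the peak congestion periods in this footage and suggest staffing changes.",
])
print(response.text)
```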
Imagen 3 Upgraded with Better Quality
Google unveiled Imagen 3, its latest image generation model, at its I/O developer conference in May, and is now making it available in preview on Vertex AI, Google Cloud's managed AI platform. It creates photorealistic images from a few lines of natural language, with more than 40% faster image generation and improved prompt comprehension and instruction following compared to Imagen 2.
Imagen 3 also offers much-needed control over text rendered within generated images. Producing legible text is notoriously difficult for diffusion-style models, and Imagen 3 aims to mitigate these errors and misunderstandings.
Gaurav Sharma, research lead at FontsAI, shared early results with Imagen 3: in the company's marketing tests, the model delivered good quality with fast processing, and the generated images were noticeably more detailed and realistic, particularly when depicting people.
The new model also supports multiple languages and formats out of the box, making it usable by a much wider range of users.
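For reference, a minimal image-generation call through the Vertex AI SDK might look like the sketch below. The model ID string and prompt are assumptions for illustration; the exact identifier exposed in the preview may differ.

```python
# Sketch: generating an image with Imagen 3 via the Vertex AI SDK.
# The model ID and prompt are illustrative; check the Vertex AI model garden for exact names.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="my-project", location="us-central1")

model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-001")  # assumed model ID
response = model.generate_images(
    prompt="A photorealistic storefront window for a summer sale, with the text 'SALE' on a red banner",
    number_of_images=1,
    aspect_ratio="1:1",
)
response.images[0].save("storefront.png")
```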
Advanced Grounding Capabilities
Google announced grounding with Google Search in Vertex AI at its I/O developer conference last May. With it, Gemini's output can be enriched with fresh results from Google Search. Starting next quarter, Vertex AI will also offer a service that grounds responses in curated data from trusted third parties, to support enterprise use cases.
Google is partnering with reputable information providers such as Moody's, Thomson Reuters and ZoomInfo to supply that curated data. The company also offers a high-fidelity grounding mode that generates responses using only content supplied by the customer, for use cases such as financial services and healthcare where every answer must be based on the customer's own data.
This reduces the chance of errors and ensures a high level of factual accuracy. Each response also comes with a confidence score and its sources, so users can better understand where the information is coming from.
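As a rough illustration of how grounding with Google Search is attached to a request, the Python sketch below wires the search-retrieval tool into a Gemini call through the Vertex AI SDK. The project details and prompt are placeholders, and module paths may vary between preview and GA releases of the SDK.

```python
# Sketch: grounding a Gemini response with Google Search via the Vertex AI SDK.
# Project, region and prompt are placeholders; module paths may differ slightly by SDK version.
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

vertexai.init(project="my-project", location="us-central1")

# Build a tool that lets the model ground its answer in fresh Google Search results.
search_tool = Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())

model = GenerativeModel("gemini-1.5-flash-001")
response = model.generate_content(
    "What were the most recent changes to US federal interest rates?",
    tools=[search_tool],
)

# Grounded responses carry metadata pointing back to the sources used.
print(response.text)
print(response.candidates[0].grounding_metadata)
```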
In summary, Google’s latest AI advancements with Gemini 1.5 and Imagen 3 are set to significantly enhance enterprise capabilities, offering faster processing, better quality outputs, and robust grounding for reliable information.