Google Gemini: everything you need to know about the new generative artificial intelligence platform

Google is trying to impress with Gemini, its flagship suite of generative AI models, applications and services.

So what is Gemini? How can you use it? And how does it compare to the competition?

To help you keep up with the latest Gemini developments, we have created this handy guide, which we'll keep updating as new Gemini models and features ship and as news about Google's plans for Gemini becomes available.

What is Gemini?

Gemini is Google's long-promised family of next-generation GenAI models, developed by Google's AI labs DeepMind and Google Research. It comes in three flavors:

  • Gemini Ultra, the flagship Gemini model.
  • Gemini Pro, a "lite" Gemini model.
  • Gemini Nano, a smaller "distilled" model that runs on mobile devices like the Pixel 8 Pro.

All Gemini models were trained to be "natively multimodal" – in other words, able to work with and use more than just words. They were pre-trained and fine-tuned on a variety of audio, images and videos, a large set of codebases, and text in various languages.

This distinguishes Gemini from models such as Google's LaMDA, which was trained solely on text data. LaMDA can't understand or generate anything beyond text (essays, email drafts and so on), but that isn't the case with Gemini models.

What is the difference between Gemini Apps and Gemini Models?

Image credits: Google

Google, proving once more that it has no talent for branding, didn't make it clear from the start that Gemini is separate and distinct from the Gemini web and mobile apps (formerly Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed; think of them as a client for Google's GenAI.

Incidentally, the Gemini apps and models are also completely independent of Imagen 2, Google's text-to-image model available in some of the company's development tools and environments.

What can Gemini do?

Because the Gemini models are multimodal, they can in theory perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Some of these capabilities have already reached the product stage (more on that later), and Google promises that all of them, and more, will be available in the near future.

Of course, it is a bit difficult to take the company’s word for it.

Google fell seriously short of expectations with the original Bard launch. More recently, it caused a stir by publishing a video purporting to show Gemini's capabilities that turned out to be heavily doctored and more or less aspirational.

Still, assuming Google is being more or less honest in its claims, here's what the various Gemini tiers will be able to do once they reach their full potential:

Gemini Ultra

Google claims that Gemini Ultra, thanks to its multimodality, can help with physics homework, solving problems step by step on a worksheet and pointing out possible errors in answers that have already been filled in.

Gemini Ultra can also be used for tasks such as identifying scientific papers relevant to a particular problem, Google says, extracting information from those papers, and "updating" a chart from one of them by generating the formulas needed to recreate it with newer data.

As mentioned earlier, Gemini Ultra technically supports image generation. However, this capability hasn't yet made it into the production model, perhaps because the mechanism is more complex than the way applications such as ChatGPT generate images. Instead of passing prompts to an image generator (such as DALL-E 3 in ChatGPT's case), Gemini generates images "natively," with no intermediate step.

Gemini Ultra is available as an API through Vertex AI, Google's fully managed platform for AI developers, and AI Studio, Google's web tool for application and platform developers. It also powers the Gemini apps, though not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires a subscription to the Google One AI Premium plan, priced at $20 per month.

The AI Premium plan also connects Gemini to your broader Google Workspace account: think emails in Gmail, documents in Docs, spreadsheets in Sheets, and Google Meet recordings. This is useful, for instance, when Gemini summarizes emails or takes notes during a video call.

Gemini Pro

Google claims that Gemini Pro is an improvement over LaMDA in terms of its reasoning, planning and understanding capabilities.

An independent study by Carnegie Mellon and BerriAI researchers found that the initial version of Gemini Pro was indeed better than OpenAI's GPT-3.5 at handling longer and more complex reasoning chains. However, the study also found that, like all large language models, this version of Gemini Pro particularly struggled with math problems involving several digits, and users found examples of faulty reasoning and obvious errors.

Google promised remedies, however, and the first arrived in the form of Gemini 1.5 Pro.

Designed as a drop-in replacement, Gemini 1.5 Pro improves on its predecessor in many areas, perhaps most notably in the amount of data it can process. Gemini 1.5 Pro can take in ~700,000 words or ~30,000 lines of code, 35 times more than Gemini 1.0 Pro. And since the model is multimodal, it isn't limited to text: Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in various languages, albeit slowly (for example, searching for a scene in an hour-long video takes 30 seconds to a minute).

Gemini 1.5 Pro entered public preview on Vertex AI in April.

A second endpoint, Gemini Pro Vision, can process text and imagery, including photos and videos, and output text, along the lines of OpenAI's GPT-4 with Vision model.

Using Gemini Pro in Vertex AI. Image credits: Google

Within Vertex AI, developers can tailor Gemini Pro to specific contexts and use cases through a tuning or “grounding” process. Gemini Pro can be connected to external third-party APIs to perform specific actions.

AI Studio includes workflows for creating structured chat prompts with Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and can adjust the model temperature to control the output's creative range, provide examples of tone and style instructions, and fine-tune the safety settings.
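Temperature isn't Gemini-specific; it is the standard knob that rescales a model's output distribution before a token is sampled. A minimal, self-contained sketch of temperature-scaled softmax, shown here to illustrate the general technique rather than Google's actual implementation:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1)
    # the distribution before the usual softmax normalization.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # low temperature: confident
hot = softmax_with_temperature(logits, 2.0)   # high temperature: diverse
```

Lower temperatures concentrate probability on the most likely token (predictable output), while higher temperatures flatten the distribution (more varied, "creative" output).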

Gemini Nano

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, efficient enough to run directly on (some) phones rather than sending the task off to a server somewhere. So far it powers a couple of features on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users record and transcribe audio at the touch of a button, includes a Gemini-powered summary of recorded conversations, interviews, presentations and more. Users get these summaries even if they don't have a signal or Wi-Fi connection available, and in a nod to privacy, no data leaves their phone.

Gemini Nano is also in Gboard, Google's keyboard app. There, it powers a feature called Smart Reply, which suggests the next thing you'll want to say while chatting in a messaging app. The feature initially works only with WhatsApp but will come to more apps over time, Google says.

And in the Google Messages app on supported devices, Nano enables Magic Compose, which can craft messages in styles such as "excited," "formal" and "lyrical."

Is Gemini better than OpenAI's GPT-4?

Google has repeatedly touted Gemini's benchmark superiority, claiming that Gemini Ultra exceeds current state-of-the-art results on "30 of the 32 widely used academic benchmarks used in large language model research and development." Meanwhile, the company says that Gemini 1.5 Pro is better than Ultra in some situations at tasks such as summarizing content, brainstorming and writing; that will presumably change with the debut of the next Ultra model.

However, leaving aside the question of whether the benchmarks really indicate a better model, the scores Google points to appear to be only marginally better than those of the corresponding OpenAI models. And, as mentioned earlier, some early impressions weren't great, with users and scientists pointing out that the older version of Gemini Pro tends to get basic facts wrong, struggles with translation, and gives poor coding suggestions.

How much does Gemini cost?

Gemini 1.5 Pro is free to use in Gemini apps and, for now, in AI Studio and Vertex AI.

However, when Gemini 1.5 Pro exits preview in Vertex, input to the model will cost $0.0025 per character, while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (roughly 140 to 250 words) and, for models like Gemini Pro Vision, per image ($0.0025).

Let's assume a 500-word article contains 2,000 characters. Summarizing that article with Gemini 1.5 Pro would cost $5, while generating an article of similar length would cost $0.10.
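As a quick sanity check, those figures can be reproduced directly from the quoted per-character rates (a back-of-the-envelope sketch; the rates are the preview prices cited above and may well change):

```python
# Per-character rates quoted for Gemini 1.5 Pro on Vertex AI (subject to change).
INPUT_PRICE_PER_CHAR = 0.0025    # charged on text sent to the model
OUTPUT_PRICE_PER_CHAR = 0.00005  # charged on text the model produces

article_chars = 2_000  # a ~500-word article, per the estimate above

# Summarizing is input-heavy: the whole article is billed as input.
summarize_cost = article_chars * INPUT_PRICE_PER_CHAR

# Generating an article of similar length is output-heavy.
generate_cost = article_chars * OUTPUT_PRICE_PER_CHAR

print(f"Summarize: ${summarize_cost:.2f}")  # Summarize: $5.00
print(f"Generate:  ${generate_cost:.2f}")   # Generate:  $0.10
```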

Pricing for the Ultra has not yet been announced.

Where can you try Gemini?

Gemini Pro

The easiest place to use Gemini Pro is in the Gemini apps. Pro and Ultra respond to queries in multiple languages.

Gemini Pro and Ultra are also available in preview on Vertex AI through an API. For now the API is free to use "within limits," and it supports certain regions, including Europe, as well as features such as chat and filtering.

Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate on Gemini-based prompts and chatbots and then get API keys to use them in their applications, or export the code to a more fully featured IDE.

Code Assist (formerly Duet AI for Developers), Google's suite of AI-powered code completion and generation tools, uses Gemini models. Developers can make "large-scale" changes to codebases, such as updating file dependencies and reviewing large chunks of code.

Google has brought Gemini models to its development tools for Chrome and the Firebase mobile development platform, as well as its database creation and management tools. And it has launched new security products underpinned by Gemini, such as Gemini in Threat Intelligence, a component of Google's Mandiant cybersecurity platform that can analyze large portions of potentially malicious code and let users run natural-language searches for ongoing threats or indicators of compromise.

This article was originally published on : techcrunch.com

Raspberry Pi releases the Pico 2W, a $7 wireless-capable microcontroller board

Meet the Raspberry Pi Pico 2 W, a tiny board designed around a microcontroller that lets you build large-scale hardware projects. Raspberry Pi is once again using its own well-documented RP2350 microcontroller.

But what's a microcontroller again? As the name suggests, microcontrollers let you control other components or electronic devices. Regular Raspberry Pis are general-purpose single-board computers, while microcontrollers are designed specifically to interact with other components.

Microcontrollers are usually cheap, small and very energy efficient. As you can see in the image above, the Pico 2 W has dozens of input and output pins (the small yellow holes along the board's edges) that it uses to communicate with other components.

Hobbyists usually start a microcontroller-based project with a breadboard to avoid soldering. Later, they can solder the microcontroller to the other parts.

Unlike traditional Raspberry Pi computers, microcontrollers don’t run a full-fledged operating system. Your code runs directly on the chip.

In addition to C and C++, the Pico 2 W supports MicroPython, a Python-inspired language for microcontrollers. The new board maintains hardware and software compatibility with previous-generation boards.
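To give a flavor of what MicroPython development looks like, here is a minimal blink sketch. It runs on the Pico itself, not on a desktop Python install; the machine module is MicroPython's hardware API, and addressing the onboard LED as Pin("LED") is the convention on the wireless Pico boards:

```python
# MicroPython sketch: blink the Pico 2 W's onboard LED.
from machine import Pin  # MicroPython's hardware-access module
import time

led = Pin("LED", Pin.OUT)  # onboard LED, addressed by name on the W boards

while True:
    led.toggle()     # flip the LED on or off
    time.sleep(0.5)  # wait half a second, so it blinks once per second
```

Copy this onto the board as main.py (for example with the Thonny IDE) and it runs automatically at power-on, no operating system involved.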

The new $7 Pico 2 W features a dual-core, dual-architecture processor running at 150MHz. When developing for the microcontroller, you can choose between a pair of Arm Cortex-M33 cores and a pair of open-hardware Hazard3 RISC-V cores.

Arm Cortex-M33 cores are widely used in the microcontroller world, but some may prefer the RISC-V cores. Everything can be configured in software, so you don't have to pick one architecture over the other when ordering new boards.

The Pico 2 W has 4MB of onboard flash memory for code storage, while the RP2350 has 520KB of onboard SRAM. I'll say it again: this is not a computing beast. It's a microcontroller!

In terms of wireless capabilities, the Pico 2 W supports Wi-Fi (2.4GHz 802.11n) and Bluetooth 5.2. It would have been nice to get 5GHz support for versatility, but maybe that will come in the next version.

If you don't need wireless features, for price or compliance reasons, Raspberry Pi also offers the Pico 2 without them for $5.

Raspberry Pi products are increasingly used by companies in industrial and electronics manufacturing. When Raspberry Pi went public this year, it reported that the industrial and embedded segment accounted for 72% of its sales.

That's probably why you can buy Pico 2 boards individually as well as in reels of 480. This is what a reel of Pico 2 microcontroller boards looks like:

Image credits: Raspberry Pi


Entrepreneur Marc Lore on ‘founder mode’, bad hiring and why avoiding risk is deadly

Published

on

By

Entrepreneur Marc Lore has already sold a total of two companies for billions of dollars. Now he plans to take his food delivery startup Wonder public in a couple of years, at an ambitious $40 billion valuation.

We recently spoke in person with Lore in New York about Wonder and its ultimate goal of making meal planning easier, but we also touched on Lore's management philosophy. Below is part of what he said on the topic, lightly edited for length and clarity.

Lore on so-called founder mode, where founders and CEOs actively engage not only with their direct reports but also with "skip level" employees so that small challenges don't become big ones (Brian Chesky works this way, as do Nvidia's Jensen Huang, Elon Musk and Sam Altman, among others):

Yes, I didn't like founder mode because I operate differently. I focus very much on the concepts of vision, capital and people. We meet weekly with the leadership team and spend two hours each week on the core elements of vision, strategy, organizational structure, capital plan, our performance management systems, compensation systems, behaviors and values, the kinds of things you'd think were already settled.

You think, "Oh, yeah, we've done behaviors before. We've already established the values. We dealt with performance management. We have our strategy." But as you grow and develop quickly, it's amazing how much it evolves over time, and you have to keep up with it… and just talk about it and talk about it.

When everyone is fully aligned and you have really good people, you just let them do it; I don't have to get involved at all. So I won't go into the details of what people do, as long as they know the nuances of the strategy and vision. When you connect on that with your team and they achieve it with their own teams, everyone moves in the right direction.

What Lore thinks about hiring the right people:

I really, really care about hiring rock stars. That is, every single person (I hire). I used to think you could interview someone and within an hour decide whether they were a rock star. I really thought so, and I think other people think so too.

It's not possible. I've hired thousands of people. You can't tell from an hour-long interview whether someone is a rock star, and it's normal to be fooled. Someone talks a good game, sounds good, says the right things, has the right experience, and then it doesn't work out and you wonder why.

I started going back over resumes and trying to draw correlations, and I found that there was a definite pattern that superstar resumes had that distinguished them from non-superstar resumes. That doesn't mean that somebody without a superstar resume can't be a superstar. I'll miss those people, and that's okay. But when I see someone with a superstar resume, they're almost always a superstar. When I interview them, I already know I want to hire them, and it's more about making sure I'm not missing anything from a behavioral, cultural or values standpoint; we want it to be compatible.

However, the resume has to show a demonstrable level of success in every position you've held. That means multiple promotions. That means staying with a company long enough to advance, because leaving and jumping from one company to another is a big step. Superstars don't move sideways. They don't move from a good company to a bad one, because bad companies have to pay more to attract people, so sometimes they shake loose people who aren't that good, who just want to go for the money.

But you find someone who's in the top 5% and you look at their resume and it's like: boom, boom, promotion, promotion, promotion, promotion, and then a big jump… promotion, promotion, big jump. When I get a resume that shows a visible level of success, I take it and pay them what they want. It's very important to me to get that superstar in there. And you're building a company of superstars.

You have to have a proper performance management system in place so they know exactly what they need to do to get to the next level. Because superstars are very motivated. They want to know what they need to do to get to the next level, especially Generation Z. They want to know, and they want to get promoted every six months.

Finally, Lore on his belief that taking more risks is the way to secure a startup's future, even if that approach may seem counterintuitive to many:

People always underestimate the risk of the status quo and overestimate the risk of making a change. I see it over and over again.

If you have a life-threatening disease and the doctor says, "You have six months to live," at that point you'll go on a trial drug or anything else, even if it's extremely dangerous (it starts to look good). Basically, you're willing to take a risk to avoid certain death.

If you're super healthy and everything's going great and someone says, "Take this experimental drug; it will make you live longer," many people will say, "You know what? It's too risky. I'm really healthy. I don't want to die from this drug."

However, startups are very different from large corporations. When you work at a big company like Walmart (whose US e-commerce business Lore ran after selling it one of his companies), it's about incremental improvement. There is no incentive to take risks.

As a startup founder, the default is that you die. Every day that you live and run this startup, there's a risk that you'll die. The probability is 80%, with only a 20% chance that it will actually work. So you have to take that into account when making decisions. You have to look for opportunities to take risks that reduce your risk of death. The status quo is the worst thing you can do. Doing nothing is the biggest risk you can take.


Australian government withdraws disinformation law


The Australian government has withdrawn a bill that would have imposed penalties of up to 5 percent of global revenue on online platforms that failed to prevent the spread of disinformation.

The bill, backed by the Labor government, would have empowered the Australian Communications and Media Authority to create enforceable rules on disinformation on digital platforms.

In a statement, Communications Minister Michelle Rowland said the bill would have provided "an unprecedented level of transparency, holding big tech accountable for its systems and processes to prevent and minimise the spread of harmful misinformation and disinformation online."

However, she said that “based on public statements and conversations with senators, it is clear that there is no way this proposal could be passed through the Senate.”

When a revised version of the bill was introduced in September, Elon Musk, the owner of X (formerly Twitter), criticized it in a one-word post: “Fascists.”

Shadow communications minister David Coleman was a vocal opponent of the bill, arguing it would encourage platforms to suppress free speech to avoid penalties. With the bill now apparently dead, Coleman posted that it was a "shocking attack on free speech that betrayed our democracy" and called on the Prime Minister to "rule out any future version of this legislation."

Meanwhile, Rowland's statement called on Parliament to support "other proposals to strengthen democratic institutions and keep Australians safe online," including laws to combat deepfakes, enforce "truth in political advertising during elections," and regulate artificial intelligence.

Prime Minister Anthony Albanese is also moving forward with a plan to ban children under 16 from using social media.
