If you can ignore Vertex, most of the complaints here are solved - the non-Vertex APIs have easy-to-use API keys, a great debugging tool (https://aistudio.google.com), a well-documented HTTP API, and good client libraries too.
I actually use their HTTP API directly (with the ijson streaming JSON parser for Python) and the code is reasonably straightforward: https://github.com/simonw/llm-gemini/blob/61a97766ff0873936a...
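For anyone who wants the flavor of that approach without digging through the repo, here is a rough sketch of the same idea (not the linked code itself; the API key, model name, and payload below are placeholders):

import ijson
import requests

API_KEY = "YOUR_GEMINI_API_KEY"  # placeholder
MODEL = "gemini-2.0-flash"       # placeholder model name
URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:streamGenerateContent?key={API_KEY}"
)
payload = {"contents": [{"parts": [{"text": "Tell me a short joke."}]}]}

with requests.post(URL, json=payload, stream=True) as response:
    response.raise_for_status()
    response.raw.decode_content = True  # let ijson see decompressed bytes
    # The streaming endpoint returns a JSON array of response chunks;
    # ijson yields each array item as soon as it is complete.
    for chunk in ijson.items(response.raw, "item"):
        candidate = chunk["candidates"][0]
        for part in candidate.get("content", {}).get("parts", []):
            print(part.get("text", ""), end="", flush=True)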
You have to be very careful when searching (using Google, haha) that you don't accidentally end up in the Vertex documentation though.
Worth noting that Gemini does now have an OpenAI-compatible API endpoint which makes it very easy to switch apps that use an OpenAI client library over to backing against Gemini instead: https://ai.google.dev/gemini-api/docs/openai
Anthropic have the same feature now as well: https://docs.anthropic.com/en/api/openai-sdk
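For Gemini, the switch is usually just a base_url and model-name change (a rough sketch; the model name is only an example):

from openai import OpenAI

# Point the stock OpenAI client at Gemini's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",  # key from AI Studio
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",  # any available Gemini model name
    messages=[{"role": "user", "content": "Explain RLHF in one sentence."}],
)
print(response.choices[0].message.content)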
What's the point of working at $ENTERPRISE_BIGCO if you don't fight with IT & Legal & various annoying middle managers.
Anyway let's table this for now and circle back later after we take care of some of the low hanging fruit. Keep me in the loop and I will do a deep dive into how we can think outside the box and turn this into a win-win. I will touch base with you when I have all my ducks in a row and we can hop on a call.
Google sounds like a fun place to work, run it up the flagpole and see if you can move the needle before the next hard stop for me.
For an external service I have to get a unique card for billing and then upload monthly receipts, or ask our ops to get it set up and then wait for weeks as the sales/legal/compliance teams on each side talk to each other.
from google import genai
from google.oauth2 import service_account

# Load explicit service-account credentials from a key file.
creds = service_account.Credentials.from_service_account_file(
    SA_FILE,
    scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/generative-language",
    ],
)

# vertexai=True routes requests through Vertex AI instead of the Gemini API.
client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
    http_options={"api_version": "v1beta1"},
    credentials=creds,
)
That `vertexai=True` does the trick - you can use the same code without this option and you will not be using "Vertex". Also note that with Vertex I am providing a service account rather than an API key, which should improve security and performance.
For me, the main point of "using Vertex", as in this example, is that the Start AI Cloud Credit ($350K) is only usable under Vertex. That is, one must use this platform to benefit from this generous credit.
Feels like the "Anthos" days for me, when Google now pushing their Enterprise Grade ML Ops platform, but all in all I am grateful for their generosity and the great Gemini model.
As a replacement for SA files one can use e.g. user accounts with SA impersonation, external identity providers, or run on a GCP VM or GKE and use the built-in identities.
(ref: https://cloud.google.com/iam/docs/migrate-from-service-accou...)
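A minimal sketch of the impersonation route with the Python SDK (assuming you hold the Service Account Token Creator role on the target SA; the SA email and location are placeholders):

import google.auth
from google.auth import impersonated_credentials
from google import genai

# Your own user credentials, e.g. from `gcloud auth application-default login`.
source_creds, project = google.auth.default()

# Short-lived credentials that impersonate the service account; no key file involved.
creds = impersonated_credentials.Credentials(
    source_credentials=source_creds,
    target_principal="my-sa@my-project.iam.gserviceaccount.com",  # placeholder SA
    target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
    lifetime=3600,
)

client = genai.Client(
    vertexai=True,
    project=project,
    location="us-central1",  # placeholder region
    credentials=creds,
)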
I still don't understand the distinction between Gemini and Vertex AI apis. It's like Logan K heard the criticisms about the API and helped push to split Gemini from the broader Google API ecosystem but it's only created more confusion, for me at least.
Vertex AI is for gRPC, service auth, and region control (among other things): ensuring data remains in a specific region, allowing you to auth with the instance service account, and getting slightly better latency and TTFT.
For deploying, on GitHub I just use a special service account for CI/CD and put the JSON payload in an environment secret, like an API key. The only extra thing is that you need to copy it to the filesystem for some things to work, usually as a file named google_application_credentials.json.
If you use cloud build you shouldn't need to do anything
Everything is service accounts and workload identity federation, with restrictions such as only letting the main branch in a specific repo use it (so there's no problem with unreviewed PRs getting production access).
Edit: if you have a specific error or issue where this doesn't work for you, and can share the code, I can have a look.
How do you sign a Firebase custom auth token with workload identity federation? How about a pre-signed storage URL? Off the top of my head, I think those were two things that didn't work.
You have a JSON key file which you can't know how many people have. The person who created the key, downloaded it, and then stored it as a GitHub secret - did they download it to /dev/shm? Did some npm/brew install script steal it from their downloads folder? Any of the GitHub repo owners can get hold of it. Depending on whether you use GitHub environments/deployments and have set it up properly, so can anyone with write access to the repo. Do you pin all your dependencies, reusable workflows, etc., or can a compromise of someone else's repo steal your secrets?
With the workload identity auth, there is no key. Each access obtains a short lived token. Only workflows on main branch can get it. Every run will have audit logs, and so will every action taken by that token. Risk of compromise is much lower, but even more importantly, if compromised I'll be able to know exactly when and how, and what malicious actions were taken.
Maybe this is paranoid to you and not worth it. That's fine. But it's not "no risk", and it is worth it to me to protect the personal data of our users.
---
As for your question, first step is just to run https://github.com/google-github-actions/auth with identity provider configured in your GCP project, restricted to your github repo or org.
This will create application default credentials that most GCP tools and libraries will just work with, as if you were running things locally after "gcloud auth login".
For the Firebase token you can just run a Python script as a subsequent step in the GitHub job, doing something like https://firebase.google.com/docs/auth/admin/create-custom-to.... For the signed storage URL this can be done with the gcloud tool: https://cloud.google.com/storage/docs/access-control/signing...
In both cases, after running the "google-github-actions/auth" step, it will just work with the short-lived credentials that step generated.
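A rough sketch of the Firebase custom-token step under those credentials (the serviceAccountId value and uid are placeholders, and with keyless credentials the Admin SDK signs via the IAM signBlob API, so the CI service account needs the Service Account Token Creator role):

import firebase_admin
from firebase_admin import auth

# ADC set up by google-github-actions/auth is picked up automatically.
# serviceAccountId tells the SDK which account to sign as (assumption; placeholder SA).
firebase_admin.initialize_app(options={
    "serviceAccountId": "ci-deployer@my-project.iam.gserviceaccount.com",
})

custom_token = auth.create_custom_token("some-user-uid")  # placeholder uid
print(custom_token)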
And even if you don't ask, there are many examples. But I feel ya - the right example to fit your need is hard to find.
- There are principals. (users, service accounts)
- Each one needs to authenticate, in some way. There are options here. SAML or OIDC or Google Signin for users; other options for service accounts.
- Permissions guard the things you can do in Google cloud.
- There are builtin roles that wrap up sets of permissions.
- You can create your own custom roles.
- Attach roles to principals to give them parcels of permissions.
It's not complicated in the context of huge enterprise applications, but for most people trying to use Google's LLMs, it's much more confusing than using an API key. The parent commenter is probably using an AWS secret key.
And FWIW, this is basically what Google encourages you to do with Firebase (with the admin service account credential as a secret key).
Java/JS is in preview (not ready for production) and will be GA soon!
As there are so many variations out there, the AI gets majorly confused. As a matter of fact, the Google OAuth part is the one thing that Gemini 2.5 Pro can't code.
It should be its own benchmark.
Happy to provide test cases as well if helpful.
0: https://datatracker.ietf.org/doc/html/draft-fge-json-schema-...
For folks just wanting to get started quickly with Gemini models without the broader platform capabilities of Google Cloud, AI Studio and its associated APIs are recommended as you noted.
However, if you anticipate your use case growing 10-1000x in production, Vertex would be a worthwhile investment.
And you are watching us evolve over time to do better.
A couple of clarifications: 1. Going forward, we only recommend using the genai SDK. 2. Subtle API differences - this is a bit harder to articulate, but we are working to improve it. Please DM @chrischo_pm if you would like to discuss further :)
No idea what any of those SDK names mean. But sure enough, searching will bring up all three of them for different combinations of search terms, and none of them will point to the "recommend only using <a random name that is indistinguishable from other names>".
Oh, and some of these SDKs (and docs) do have a way to use this functionality without the SDKs, but others don't. Because there are only 4 languages in the world, and everyone should be happy using them.
Overall, I think that Google has done a great job recently in productizing access to your models. For a few years I wrote my own utilities to get stuff done, now I do much less coding using Gemini (and less often ChatGPT) because the product offerings do mostly what I want.
One thing I would like to see Google offer is easier integrated search with LLM generation. The ‘grounding’ examples are OK, but for use in Python I buy a few Perplexity API credits and use that for now. That is the single thing I would most like to see you roll out.
EDIT: just looked at your latest doc pages, I like the express mode setup with a unified access to regular APIs vs. Vertex.
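For reference, the current grounding setup via the genai SDK looks roughly like this (a sketch assuming the google-genai Python SDK; the model name is just an example):

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# Ask the model to ground its answer with Google Search results.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What changed in the latest Gemini release?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

print(response.text)
# Grounding sources (domains, URIs, search suggestions) come back here:
print(response.candidates[0].grounding_metadata)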
(While you can certainly try to use CloudWatch, it’s not exact. Your other options are “Wait for the bill” or log all Bedrock invocations to CloudWatch/S3 and aggregate there)
FWIW OpenAI compatibility only gets you so far with Gemini. Gemini’s video/audio capabilities and context caching are unparalleled and you’ll likely need to use their SDKs instead to fully take advantage of them.
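For example, explicit context caching through the genai SDK looks roughly like this (a sketch; the file name, model version, and TTL are placeholders, and explicit caching generally wants a pinned model version):

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# Upload a large document once, cache it with an explicit TTL, then reuse it.
doc = client.files.upload(file="big_report.pdf")  # placeholder file

cache = client.caches.create(
    model="gemini-2.0-flash-001",  # pinned model version (assumption)
    config=types.CreateCachedContentConfig(
        contents=[doc],
        system_instruction="Answer questions about the attached report.",
        ttl="3600s",  # you choose the TTL
    ),
)

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Summarize section 2.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)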
- Vertex AI
- AI Studio
- Gemini
- Firebase Gen AI
Just stick with AI Studio and the free developer AI along with it; you will be much much happier.
Do Google use all the AI studio traffic to train etc?
The new gemini models themselves though, are killer. The confusion is a small price to pay.
First of all, thank you for your sentiment for our latest 2.5 Gemini model. We are so glad that you find the models useful! We really appreciate this thread and everyone for the feedback on Gemini/Vertex
We read through all your comments. And YES - clearly we've got some friction in the DevEx. This stuff is super valuable and helps me prioritize. Our goal is to listen, gather your insights, offer clarity, and point to potential solutions or workarounds.
I’m going to respond to some of the comments given here directly on the thread
Regardless of whether I passed a role or not, the function would say something to the effect of "invalid role, accepted are user and model".
I tried switching to the OpenAI-compatible SDK, but it threw errors for tool calls and I just gave up.
Could you confirm if it was a known bug that was fixed?
(which I think is what you are using, but maybe I'm wrong).
Feel free to DM me at @chrischo_pm on X. The stuff you are describing shouldn't happen.
At ~10:15AM UTC 04 May, a change was rolled out to the Vertex API (but not the Gemini API) that caused the API to respect the `include_thoughts` setting and return the thoughts. For consumers that don't handle the thoughts correctly and had specified `include_thoughts = true`, the thinking traces then leaked into responses.
[1]: https://googleapis.github.io/python-genai/genai.html#genai.t...
[2]: https://ai.google.dev/api/generate-content#ThinkingConfig
[3]: https://github.com/googleapis/python-genai/blob/157b16b8df40...
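For anyone handling thoughts on the client side, the relevant bits look roughly like this (a sketch using the google-genai Python SDK; the model name is just an example):

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # a thinking-capable model; name is an example
    contents="What is 17 * 24?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    ),
)

# Thought parts are flagged; consumers that just concatenate part.text
# will leak the thinking trace into the answer.
for part in response.candidates[0].content.parts:
    if part.thought:
        print("[thought]", part.text)
    else:
        print("[answer]", part.text)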
Or not failing when passing `additionalProperties: false`
Or..
For other models, see this link and open up the collapsed section for your specific model: https://ai.google.dev/gemini-api/docs/models
I hope it doesn't become a trend on this site.
It is incredibly lame for a gargantuan company like Google, with their thousands of developers and PMs and this and that, to come to a remote corner of the web to pretend they are doing what they should have done 10 years ago.
"A Product Manager (PM) at Google is responsible for guiding the development of products from conception to launch. They identify user needs, define product vision and strategy, prioritize features, work with cross-functional teams (engineering, design, marketing), and ensure the product aligns with business goals. They act as the bridge between technical teams and stakeholders to deliver successful, user-focused solutions."
Some might have ignored your question, but in the spirit of good conversation, I figured I’d share a quick explanation of what a PM does, just in case it helps!
Sometimes we get it right the first time we launch it; I think most of the time we get it right over a period of time.
Trying to do a little bit better everyday and ship as fast as possible!
It's the best model out there.
That would all still be OK-ish except that their JS library only accepts a local path, which it then attempts to read using the Node `fs` API. Serverless? Better figure out how to shim `fs`!
It would be trivial to accept standard JS buffers. But it’s not clear that anyone at Google cares enough about this crappy API to fix it.
You can? Google limits HTTP requests to 20MB, but both the Gemini API and Vertex AI API support embedded base64-encoded files and public URLs. The Gemini API supports attaching files that are uploaded to their Files API, and the Vertex AI API supports files uploaded to Google Cloud Storage.
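A rough sketch of both routes with the google-genai Python SDK (file and model names are placeholders; older SDK versions used path= instead of file= for uploads):

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# Option 1: embed the bytes inline (fine while the request stays under ~20MB).
pdf_bytes = open("report.pdf", "rb").read()  # placeholder file
inline = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
        "Summarize this document.",
    ],
)

# Option 2: upload once via the Files API and reference the returned file.
uploaded = client.files.upload(file="report.pdf")
from_file = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[uploaded, "Summarize this document."],
)
print(inline.text, from_file.text)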
Here's the code: https://github.com/simonw/tools/blob/main/gemini-mask.html
https://github.com/ryao/gemini-chat
The main thing I do not like is that token counting is rated limited. My local offline copies have stripped out the token counting since I found that the service becomes unusable if you get anywhere near the token limits, so there is no point in trimming the history to make it fit. Another thing I found is that I prefer to use the REST API directly rather than their Python wrapper.
Also, that comment about 500 errors is obsolete. I will fix it when I do new pushes.
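For reference, the direct countTokens REST call is about this small (a sketch; the key and model name are placeholders):

import requests

API_KEY = "YOUR_GEMINI_API_KEY"
url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/gemini-2.0-flash:countTokens?key={API_KEY}"
)
body = {"contents": [{"parts": [{"text": "How many tokens is this?"}]}]}
print(requests.post(url, json=body).json())  # e.g. {'totalTokens': ...}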
Example for 1.5:
https://github.com/googleapis/python-aiplatform/blob/main/ve...
We have to write code that round-robins every region on retries to get past how overloaded/poorly managed Vertex is (we're not hitting our quotas), and yes, that's even with retry settings on the SDK.
Read timeouts aren't configurable on the Vertex SDK.
In 2012, Google was far ahead of the world in making the vast majority of their offerings intensely API-first, intensely API accessible.
It all changed in such a tectonic shift. The Google Plus/Google+ era was this weird new reality where everything Google did had to feed into this social network, yet there was nearly no API available to anyone else (short of some very simple posting APIs). Google flipped a bit: the whole company stopped caring about the rest of the world and APIs, grew intensely focused on internal use, on itself, and looked only within.
I don't know enough about the LLM situation to comment, but Google squandering such a huge lead, so clearly stopping caring about the world & intertwingularity, becoming so intensely internally focused, was such a clear clear clear fall. There's the Google Graveyard of products, but the loss in my mind is more clearly that Google gave up on APIs long ago, and has never performed any clear acts of repentance for such a grievous mis-step against the open world and open possibilities, in favor of a closed & internal focus.
Google's APIs have a way steeper learning curve than is necessary. So many of their APIs depend on complex client libraries or technologies like gRPC that aren't used much outside of Google.
Their permission model is diabolically complex to figure out too - same vibes as AWS, Google even used the same IAM acronym.
I don't see that dependency. With ANY of the APIs. They're all documented. I invoke them directly from within emacs. Or you can curl them. I almost never use the wrapper libraries.
I agree with your point that the client libraries are large and complicated, for my tastes. But there's no inherent dependency of the API on the library. The dependency arrow points the other direction. The libraries are optional; and in my experience, you can find 3p libraries that are thinner and more targeted if you like.
Apparently now you need to use google-cloud-quotas to get the limit and google-cloud-monitoring to get the usage.
VS Code Copilot (using gemini-2.5-pro) managed to implement the first part, getting the limit, but when I asked Gemini to implement the second part it said that integrating cloud-monitoring is too complex and it couldn't do it!
Google's stock performance, revenue growth, and political influence in Washington have grown substantially under his leadership. I don't disagree that there are even better CEOs out there, but as an investor, the framing of your question is way off. Given the financial performance, why would you want to replace him?
The counterfactual isn't Google having average performance. You're crediting the stock performance, revenue growth, and political influence (don't really agree this last one was a place Google shined over this period) to Sundar's leadership; I think it has a lot more to do with the company he was handed.
Maybe we’ll get a do-over with Google.
Then there came google.generativeai. I don't remember the reason, but they were pushing me to start using this library.
Now it's all flashy google.genai libraries that they are pushing!
I have figured out that this is what I should use and this is the documentation I should look at, because doing a Google search or asking an LLM gives me so many confusing results. The only thing that works for sure is reading the library code. That's what I'm doing these days.
For example, the documentation in one of those above libraries says that Gemini can read a document from cloud storage if you give it the URI. That doesn't work in the google.genai library, and I couldn't figure out why. I imagined maybe Gemini might need access to the cloud storage bucket, but I couldn't find any documentation on how to grant that. I finally understood that I need to use the new File API, and that URI works.
Yes, I like the Gemini models - they are really good. But the library documentation could be significantly simpler.
> Property ordering
> When you're working with JSON schemas in the Gemini API, the order of properties is important. By default, the API orders properties alphabetically and does not preserve the order in which the properties are defined (although the Google Gen AI SDKs may preserve this order). If you're providing examples to the model with a schema configured, and the property ordering of the examples is not consistent with the property ordering of the schema, the output could be rambling or unexpected.
My structured output code (which uses litellm under the hood, which converts from Pydantic models to JSON schemas), does not work with Google's models for that reason.
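A workaround sketch (not the litellm setup above) is to hand-write the schema and pin propertyOrdering yourself when calling the REST API directly; the key, model name, and schema here are placeholders:

import requests

API_KEY = "YOUR_GEMINI_API_KEY"
url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/gemini-2.0-flash:generateContent?key={API_KEY}"
)

body = {
    "contents": [{"parts": [{"text": "Extract: Jane, 34, lives in Oslo."}]}],
    "generationConfig": {
        "responseMimeType": "application/json",
        "responseSchema": {
            "type": "OBJECT",
            "properties": {
                "name": {"type": "STRING"},
                "age": {"type": "INTEGER"},
                "city": {"type": "STRING"},
            },
            # Pin the order so it matches any few-shot examples in the prompt.
            "propertyOrdering": ["name", "age", "city"],
        },
    },
}

result = requests.post(url, json=body).json()
print(result["candidates"][0]["content"]["parts"][0]["text"])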
I agree the API docs are not high on the usability scale. No examples, just reference information with pointers to types, which embed other types, which use abstract descriptions. Figuring out what sort of json payload you need to send, can take...a bunch of effort.
Once it clicks, it's infinitely better than the AWS-style GetAnythingGoes APIs...
At some point they updated their privacy policy in regards to this, but instead of saying that this will cause them to train on your data, now the privacy policy says both that they will train on this data and that they will not train on this data, with no indication of which statement takes precedence over the other.
The very generous free tier is pretty much the only reason I'm using it at all
I agree though, their marketing and product positioning is super confusing and weird. They are running their AI business in a very, very strange way. This has created a delay in their dominance of this space, though I don't think it's an opportunity for others.
Using Gemini inside BigQuery (this is via Vertex) is such a stupid good solution. Along with all of the other products that support BigQuery (datastream from cloudsql MySQL/postgres, dataform for query aggregation and transformation jobs, BigQuery functions, etc.), there's an absolutely insane amount of power to bring data over to Gemini and back out.
It's literally impossible for OpenAI to compete because Google has all of the other ingredients here already and again, the user base.
I'm surprised AWS didn't come out stronger here, weird.
That's it.
Also, the AI race is a red queen race. There is no line in the sand that says "you are the ultimate winner"; that's not how time works. And given that the vast majority of the internet is on AWS, NOT GCP, and that Gemini isn't even the most popular LLM among AI developers, I'm not sure you can even say that Google is the leader at this exact point in time.
On the other hand, the first two approaches, from OpenAI and Anthropic, are frankly bad. Automatically detecting what should be prefix-cached? Yuck! And I can't even set my own TTLs in the Anthropic API (feel free to correct me - a quick search revealed this).
Serious features require serious approaches.
Why don't you like that? I absolutely love it.
Wouldn't you want to give more power to the developer? Prefix caching seems like an important enough concept to leak to the end user.
Anthropic require me to add explicit cache breakpoints to my prompts, which charge for writes to the cache. If I get that wrong it can be more expensive than if I left caching turned off entirely.
With OpenAI I don't have to do any planning or optimistic guessing at all: if my app gets a spike in traffic the caching kicks in automatically and saves me money.
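For contrast, the Anthropic breakpoint style looks roughly like this (a sketch; the model name and document are placeholders, and the marked block is billed as a cache write the first time it is stored):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

big_context = open("contract.txt").read()  # placeholder large document

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # model name is just an example
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": big_context,
            # The explicit breakpoint: everything up to here is written to the
            # cache (an extra cache-write charge) and reused on later calls.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the termination clause."}],
)
print(response.content[0].text)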
>With OpenAI I don't have to do any planning or optimistic guessing at all: if my app gets a spike in traffic the caching kicks in automatically and saves me money.
I think these are completely different use cases. Is this not just like having a Redis cache sitting in front of the LLM provider?
Fundamentally, I feel like prompt caching is something I want to control and not have happen automatically; I want to use the information I have about my (future) access patterns to save costs. For instance, I might prompt-cache a whole PDF and ask multiple questions. If I choose to prompt-cache the PDF, I can save a non-trivial amount of tokens processed. How can OpenAI's automatic approach help me here?
... deliver any URLs back, just the domains from where it grounded its response.
According to the docs it should return Vertex AI URLs that redirect to the sources, but it doesn't do so in all cases (in none of mine).
Plus, you are required to display an HTML fragment with search links that you are not allowed to edit.
Basically a corporate infight as an API.