A little over a month ago I got a pleasant surprise: an invitation in my email to the inaugural OpenAI Dev Day. I was only vaguely aware that this event was scheduled to happen, and was really only planning on briefly checking it out online, but when I got the invitation, and after realizing that this was kind of a big deal (you had to apply to attend, and getting in was pretty hard), I decided to sign up and fly out to San Francisco for the event.
And I sure am glad I went. The conference itself was all day Monday, November 6th, but I flew in a few days early to meet up with some friends, colleagues, and many other unsavory characters I associate with. It helped that I was staying in San Francisco proper this time around (I usually prefer to stay somewhere along the glorious Peninsula suburbia), and meeting up with peeps was just a short Lyft ride away.
The conference was held in downtown San Francisco, in one of those repurposed warehouse-style buildings that might have served some more “classical” industrial purpose at some point. Very fitting that one of the more important staging events for the current industrial revolution would take place there. The venue had a decent amount of security, especially compared to any other professional conference I have ever attended. The security was not intrusive by any measure though, and the event exuded a very enjoyable overall vibe.
The highlight of the whole conference was Sam Altman’s keynote address. I finally got to see him live (and chatted with him briefly later on).
A big pleasant surprise during the keynote was the appearance of Satya Nadella. Another first live sighting for me. He was as cheerful and enthusiastic on stage as he ever was in any of the presentations and interviews that I have seen him in before.
Sam and Satya had a bit of a (staged of course) back and forth about the nature of Microsoft/OpenAI collaboration. The interaction was accompanied by a few (very much natural) nervous laughs on both sides. So make of it what you want.
Here are some of the main announcements and my take on them.
GPT4 Turbo
Just like we had a turbo version of GPT 3.5 available earlier this year, we now have access to GPT4 Turbo. The turbo versions of GPTs are most likely distilled versions of the models, built with a significantly lower number of parameters. It is known that GPT 3 has about 175 billion parameters, while GPT 4 is rumored to have about 1.5-1.7 trillion. While the exact number of parameters in the distilled models is officially unknown, a few leaks and easy back-of-the-envelope calculations based on the speed of inference strongly suggest that these models are an order of magnitude smaller. Smaller models allow for much faster and cheaper inference, among other things, which greatly helps with scaling them across many users, as well as with the cost.
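To give a flavor of that back-of-the-envelope reasoning, here is a minimal sketch. The idea is that autoregressive decoding is largely memory-bandwidth bound, so the observed generation speed puts a rough ceiling on how many active parameters the model can have. Every number below (bandwidth, throughput, efficiency) is my own illustrative assumption, not anything OpenAI has disclosed:

```python
# Rough sketch of the bandwidth argument, with made-up numbers.
# Each generated token requires streaming (roughly) all active weights
# through the accelerators, so tokens/sec bounds the active parameter count.

def estimate_active_params(tokens_per_sec: float,
                           bandwidth_bytes_per_sec: float,
                           bytes_per_param: float = 2.0,      # fp16/bf16 weights
                           efficiency: float = 0.5) -> float: # assumed utilization
    """Very rough upper bound on active parameters per forward pass."""
    usable_bandwidth = bandwidth_bytes_per_sec * efficiency
    bytes_per_token = usable_bandwidth / tokens_per_sec
    return bytes_per_token / bytes_per_param

# Hypothetical: one H100-class accelerator (~3.35e12 B/s of HBM bandwidth)
# sustaining ~40 tokens/s per stream implies on the order of 2e10 (~20B)
# active parameters, an order of magnitude below the rumored ~1.7T of GPT-4.
# (Real deployments shard and batch across many GPUs, so this is only a flavor.)
print(f"{estimate_active_params(40, 3.35e12):.2e} active parameters")
```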
GPT4 Turbo comes in two varieties: a text-only model and a multimodal text+image model. The text-only model is priced at $0.01 per 1,000 input tokens and $0.03 per 1,000 output tokens. GPT-4 Turbo will cost $0.00765 for processing a 1080×1080 pixel image.
While the increase in speed and the better pricing are both indubitably great, what’s even more exciting about GPT4 Turbo are the increased context window and the more recent knowledge cutoff date. GPT4 Turbo can process up to 128,000 tokens, or roughly 100,000 words, four times GPT4’s context window. Even more importantly, it seems that this new context window is more robust in terms of understanding the long-range content of a conversation or an uploaded document, something that other long-context LLM systems have been struggling with. The knowledge cutoff is now April 2023, which will greatly help with more recent knowledge, although it still has a long way to go to be a truly up-to-date LLM.
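For a sense of what 128,000 tokens means in practice, you can count tokens locally with OpenAI’s tiktoken library before sending a document, and combine that with the prices quoted above to estimate the cost of a full-window request. A quick sketch, assuming the cl100k_base encoding used by the GPT-4 family and the Dev Day prices (which may of course change):

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")    # encoding used by the GPT-4 family

def fits_in_turbo_window(text: str, limit: int = 128_000) -> bool:
    """Check whether a piece of text fits in GPT-4 Turbo's context window."""
    return len(enc.encode(text)) <= limit

document = open("my_long_document.txt").read()   # hypothetical file
n_tokens = len(enc.encode(document))
input_cost = n_tokens / 1000 * 0.01              # $0.01 per 1K input tokens
print(f"{n_tokens} tokens, fits: {fits_in_turbo_window(document)}, "
      f"input cost ~ ${input_cost:.2f}")
```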
GPTs
GPTs are user-defined context-specific instances of ChatGPT, built with a specific use case in mind. They are extremely easy to build using just prompts and a few accompanying documents that users can upload. A less charitable view of them would be that they are just specialized and repackaged domain-specific prompts. But just like with anything else OpenAI related, it’s the UI/UX that makes a big difference. Sharing bespoke prompts and chat instructions was inelegant. Sharing GPT “apps” is much more straightforward and intuitive.
GPT Store
Coming later this month (according to Sam), select “verified” users will be able to publish their GPTs to a special store. This platform will allow creators to share their custom-built GPT models publicly. Once the GPT Store is launched, these custom models will become searchable and may be featured on leaderboards, potentially earning their creators money based on user engagement. The store is part of a broader effort by OpenAI to expand the capabilities and reach of their AI tools.
ChatGPT API
This is probably one of the most exciting new products/features announced at Dev Day, and something that a lot of users (especially power users) have been clamoring for for a while. The ChatGPT API is a tool provided by OpenAI that enables developers to incorporate GPT-based models into their applications to create chatbots or perform various text completion tasks. It’s a RESTful API that allows for the integration of artificial intelligence to understand and generate natural language within apps. Here’s how the ChatGPT API can be used:
1. Creating Virtual Agents: Developers can create virtual agents that engage in natural conversations with users, offering personalized experiences across various sectors such as customer service, online shopping, advertising, education, and gaming.
2. Building Chatbots: The API can be used to construct chatbots that can converse with users and respond to their inquiries in a human-like manner. This involves the bot understanding the user's context and providing responses that are tailored based on previous interactions and other input data.
3. Integration with Applications: Major companies like Snapchat, Instacart, and Shopify have integrated the ChatGPT API into their platforms. For instance, Shopify has incorporated ChatGPT into its app to assist customers in finding items using natural language prompts.
4. Educational Tools: Educational platforms like Quizlet have leveraged the ChatGPT API to create adaptive AI tutors that engage students with questions based on their study materials, enhancing the learning experience through interactive chat.
To use the ChatGPT API, developers need to have a basic understanding of its architecture, including how to apply for and use API tokens. The API allows for seamless interaction with different GPT models through its dedicated interface.
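In practice the mechanics are simple. Here is a minimal sketch using the official openai Python package (the v1.x client released around Dev Day); the model identifier below is the GPT-4 Turbo preview name announced at the event, and the support-bot prompt is just a hypothetical example — swap in whatever model your account has access to:

```python
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-1106-preview",   # GPT-4 Turbo preview identifier
    messages=[
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "Where is my order #1234?"},  # hypothetical query
    ],
    temperature=0.3,
)

print(response.choices[0].message.content)
```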
New Text-to-speech API
For a few weeks I’ve had a chance to use ChatGPT’s voice mode on iOS on my phone, and the experience has been eye-opening. All of a sudden most other voice assistants seem quaint and inadequate. This feature of ChatGPT is still heavily under development, but even in its current form it is incredibly impressive and a foretaste of the real conversational interaction that we’ll be able to have with machines soon. At Dev Day OpenAI also announced a new version of their TTS API, which will enable many other developers to build on top of their incredible work.
OpenAI announced the Text-to-Speech (TTS) API as part of their audio offering, which allows developers to generate high-quality spoken audio from text. Here are the key features and details of the new API:
Six Preset Voices: The API comes with six built-in voices, each created with the collaboration of professional voice actors to ensure they sound human-like.
Two Model Variants: There are two model variants available: tts-1 and tts-1-hd. The tts-1 variant is optimized for real-time use cases, while the tts-1-hd variant is optimized for higher quality audio output.
Multilingual Support: The TTS API can produce spoken audio in multiple languages, making it versatile for international applications.
Real-time Streaming: It is capable of giving real-time audio output using streaming, which can be particularly useful for interactive applications.
Use Cases: The API can be used in various scenarios such as narrating blog posts, producing spoken audio for videos or presentations, and providing voice responses in apps.
Pricing: The service is priced starting at $0.015 per 1,000 input characters, not tokens, making it accessible for developers to integrate into their services.
Integration with Whisper: OpenAI’s Whisper, an open-source speech recognition system, can be used alongside the TTS API to transcribe spoken words into text, which can further enhance the capabilities of applications utilizing the API.
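Calling the new endpoint is a one-liner with the same openai Python package. A minimal sketch, assuming the tts-1 model and the “alloy” preset voice (the input text and output filename are arbitrary):

```python
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

speech = client.audio.speech.create(
    model="tts-1",          # or "tts-1-hd" for the higher-quality variant
    voice="alloy",          # one of the six preset voices
    input="Welcome to the Dev Day recap. Here is what was announced.",
)
speech.stream_to_file("recap.mp3")   # save the generated audio locally
```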
DALL-E 3 API
Dall-E is OpenAI’s text-to-image tool/model/library, and it was one of the first products to squarely put OpenAI on everyone’s radar as a groundbreaking AI company. Dall-E 3 is the latest version, and in the few tests that I’ve done with it, it seems like a big qualitative step forward for automated AI image generation. After initial integrations with other services like ChatGPT and Bing Chat, DALL-E 3 has been made available as a standalone API. This allows developers to directly integrate this powerful model into their own applications and products, broadening its use cases significantly. To ensure responsible use, OpenAI has implemented built-in moderation within the API to prevent misuse, making it safer to incorporate into various platforms. Renowned companies like Snap, Coca-Cola, and Shutterstock have already started using DALL-E 3 to generate images and designs programmatically for their customers and marketing campaigns, demonstrating its practical utility and the value it adds to commercial endeavors.
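Using the standalone endpoint looks roughly like this with the openai Python package; the prompt and size below are purely illustrative:

```python
from openai import OpenAI

client = OpenAI()

image = client.images.generate(
    model="dall-e-3",
    prompt="A repurposed industrial warehouse hosting an AI developer "
           "conference, warm light, watercolor style",
    size="1024x1024",
    n=1,
)
print(image.data[0].url)   # temporary URL of the generated image
```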
Whisper 3
Whisper has been one of OpenAI’s truly open-source projects, and I am excited to learn that they have now launched a new version of it. It is one of the best speech-to-text systems that I’ve tried, and I have even been able to install and run it on Jetson IoT computers. Really looking forward to playing with this new version soon.
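Because it is open source, you can run it entirely locally. A quick sketch with the openai-whisper Python package, assuming the new checkpoint is exposed under the “large-v3” name (smaller checkpoints like “base” are friendlier to devices such as a Jetson), and using a hypothetical audio file:

```python
# pip install openai-whisper   (also requires ffmpeg on the system)
import whisper

model = whisper.load_model("large-v3")           # the new Whisper 3 checkpoint
result = model.transcribe("keynote_clip.mp3")    # hypothetical audio file
print(result["text"])
```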
Copyright Shield
OpenAI's Copyright Shield is a protection service that has been announced for users of ChatGPT Enterprise and the ChatGPT API. Here's what it entails:
Legal Defense and Cost Coverage: OpenAI has committed to stepping in and defending its customers if they face legal claims for copyright infringement stemming from the use of OpenAI's ChatGPT. This includes paying the incurred legal costs associated with such claims.
Enterprise and API Customer Focus: The coverage is specifically provided for enterprise-level users and those using the ChatGPT API, not for users of the free version of ChatGPT or of ChatGPT Plus.
Support Against Copyright Issues: The aim of Copyright Shield is to provide financial support and legal defense for enterprise-level users against copyright issues, which might arise from the AI-generated content.
The introduction of Copyright Shield is significant because it offers more expansive intellectual property indemnification for AI-created content than was previously available. This move by OpenAI could encourage more businesses to integrate AI into their operations, knowing they have legal and financial backing against potential copyright disputes. It also distinguishes OpenAI from competitors who might simply remove copyrighted content rather than offering a defense to their users.
Custom Fine Tuned GPT4s
This is an enterprise-level feature that requires far more resources to pull off. You can now custom-train GPT4 on your own dataset, including your organization’s data. Your data is still safeguarded and sandboxed from the rest of the OpenAI models and services, ensuring your confidentiality and safety. This is not a cheap service - it costs a few million dollars to fine-tune the model, and takes many weeks to months of training. It seems like this will (at least initially) be a very bespoke, concierge-like service, with just a few users at a time.
Things Not Mentioned
One big thing that was not mentioned, or if mentioned was downplayed, is, of course, GPT5. GPT4 was announced only a few short months after the launch of ChatGPT with GPT3.5, and at that point more or less everyone (myself included) was under the impression that a GPT5-level model was just around the corner, and at the implied breakneck speed of development could even be here by the end of this year. That has obviously not happened, and Sam and others from OpenAI have repeatedly stated that GPT5 is not being trained (or at least was not being trained a few months ago).
It is easy to understand why training GPT5 in any kind of reasonable timeframe would be prohibitively challenging. By a simple extrapolation of parameter counts for the largest LLMs, we see that successive generations are close to a factor of 10 apart. So a GPT5-level model would have to have north of 10 trillion parameters, and in order to be trained in a reasonable time frame (within a year, all other things being equal) it would require 10X more compute than the current biggest GPU cluster, or on the order of 100,000 GPUs. So yeah, extremely hard to pull off right now, but probably not completely out of the question. OpenAI has also been very clear that they believe the best current approach is to build on top of the current LLMs, and that we are still only scratching the surface of what we can accomplish with them. This more gradual approach is also justified from the perspective of AI safety and adoption, as it will hopefully prevent any overly explosive developments, both from the standpoint of safety and that of the inevitable and ongoing disruption in the professional world.
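Written out as literal arithmetic, the extrapolation a few sentences back looks roughly like this. Every number here is an assumption: parameter counts beyond GPT-3 are unconfirmed rumor, and the current-cluster size is just the figure implied by the “10X, on the order of 100,000 GPUs” estimate above:

```python
# Back-of-the-envelope GPT-5 extrapolation; all figures are assumptions.
GPT3_PARAMS = 175e9
GPT4_PARAMS = 1.7e12                                  # rumored, not official
generation_factor = GPT4_PARAMS / GPT3_PARAMS         # roughly 10x per generation

gpt5_params = GPT4_PARAMS * generation_factor         # north of 10 trillion

current_cluster_gpus = 10_000                         # assumed largest cluster today
gpt5_cluster_gpus = current_cluster_gpus * generation_factor
print(f"~{gpt5_params:.1e} parameters, ~{gpt5_cluster_gpus:,.0f} GPUs for a one-year run")
```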
Other Impressions
OpenAI really pulled out all the stops in terms of hospitality for this conference. The food was excellent, the staff were present and ready to chat and answer any question, and the evening reception was very enjoyable and classy. The afterparty, which I did not attend, was by all accounts also a riot, and featured Grimes as a DJ. (I did chat with her for a bit, and she came across as both knowledgeable and curious about AI, and the chip industry in particular.)
Most of the top guys at OpenAI fit the stereotype of uber-geeks, and one gets the sense that they are all much more comfortable coding and building stuff than schmoozing with random anons like myself. At one point I overheard Karpathy telling someone that he was going back to the office to work on stuff. :) That’s some real dedication to the work and the mission.
We were all also left with the impression that what we are being shown right now is just the tip of the iceberg of what is coming. Sam explicitly stated that the next Dev Day will make this conference feel quaint and archaic. The other peeps from OpenAI also implied that the next 12-14 months will be truly groundbreaking in terms of the technologies that are being developed, as well as the adoption of the current ones.
Those of us who are deeply immersed in the world of tech and AI are by now somewhat inured to all of its promises, current and near-future. But just stepping a bit outside of our bubbles shows how little of it has so far permeated most people’s lives. Sure, ChatGPT has over 100 million weekly users, but most of that usage is still restricted to marginal improvements in executing everyday tasks. This is all bound to change when all manner of organizations and businesses pop up that have been built with AI tools and use AI as the core of their workflows. Those changes are coming. 2024 is bound to be a crazy year.