This post is part of the GenAI Series.
So far in the series, we have created AI apps that generate text. In this post, I will introduce OpenAI’s image generation APIs, which can convert a textual prompt into an image and perform many other tasks related to images.
I am going to make an app called AvatarAI. It will generate an avatar based on different parameters. Avatars are very common in the digital world. People use them mainly as profile pictures on forums, where they represent the user's digital self. In case you are just interested in watching the demo, you can see it below:
Prompt Engineering
Like before, we will first create a prompt that generates avatars. This process is called prompt engineering.
The prompt consists of two parts: the data in JSON format, which users provide to define the look and feel of the avatar, and the instructions, which tell the OpenAI model how to use that data. Below is the prompt:
Generate an avatar based on the following specifications given in JSON format, ensuring all details are accurately reflected and styled as described. You must NOT write any text on the generated image
{"demographics": {"gender": "male", "age": "teen"}, "physicalCharacteristics": {"skinTone": "Tan", "faceShape": "Oval", "eyes": {"accessories": "Glowing"}, "hairDetails": "Wavy"}, "accessoriesAndClothing": {"clothingType": "Black Jacket", "clothingColor": "Metallic Tone"}, "environmentAndBackground": {"theme": "City Skyline", "backgroundDetails": "neon lights"}, "customization": {"specialEffects": "glow"}, "personalization": {"interestsTheme": "doctor"}}
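Since the prompt is just fixed instructions followed by a JSON blob, it can be assembled in a few lines of Python. The sketch below is one way to do it, not necessarily how the final app does; the names INSTRUCTIONS and build_avatar_prompt are my own and are only for illustration.

```python
import json

# The fixed instruction part of the prompt, taken from the text above.
INSTRUCTIONS = (
    "Generate an avatar based on the following specifications given in JSON format, "
    "ensuring all details are accurately reflected and styled as described. "
    "You must NOT write any text on the generated image."
)


def build_avatar_prompt(spec: dict) -> str:
    """Join the instructions with the user-supplied avatar spec serialized as JSON."""
    return f"{INSTRUCTIONS}\n{json.dumps(spec, ensure_ascii=False)}"


# The same specification shown above, as a Python dict.
spec = {
    "demographics": {"gender": "male", "age": "teen"},
    "physicalCharacteristics": {
        "skinTone": "Tan",
        "faceShape": "Oval",
        "eyes": {"accessories": "Glowing"},
        "hairDetails": "Wavy",
    },
    "accessoriesAndClothing": {"clothingType": "Black Jacket", "clothingColor": "Metallic Tone"},
    "environmentAndBackground": {"theme": "City Skyline", "backgroundDetails": "neon lights"},
    "customization": {"specialEffects": "glow"},
    "personalization": {"interestsTheme": "doctor"},
}

prompt = build_avatar_prompt(spec)
print(prompt)
```

Keeping the instructions separate from the data means the UI only has to produce the dict; the wording of the instructions can be tuned without touching the form.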
Let’s try it on ChatGPT first:
Not bad! Now it’s time to automate it.
Development
You will need the OpenAI and Flask libraries, plus python-dotenv to load environment variables. If you do not have them installed, run the following commands:
pip install openai
pip install flask
pip install python-dotenv
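For context, here is a rough sketch of what python-dotenv is for: keeping the API key out of the source code. It assumes the key is stored under the name OPENAI_API_KEY (the variable the OpenAI client reads by default) in a .env file or in the shell environment. The helper get_api_key is hypothetical, and the try/except around the import is only there so the snippet still runs if python-dotenv is missing.

```python
import os

try:
    # Optional: without python-dotenv we fall back to the real environment only.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def get_api_key(env=None) -> str:
    """Return the OpenAI API key from the environment, or raise a clear error."""
    if load_dotenv is not None:
        load_dotenv()  # looks for a .env file and loads it if one is found
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it")
    return key


# Example with an explicit mapping, so it runs without a real key.
print(get_api_key({"OPENAI_API_KEY": "sk-test"}))
```

With the key in the environment, the client can be created as OpenAI(api_key=get_api_key()) instead of pasting the key into the script.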
The very first thing I am going to do is build the UI. Based on the above input, I asked GPT to come up with an interface like the one shown below:
Looks cool, no?
We are going to use OpenAI’s Image APIs for this purpose.
from openai import OpenAI

if __name__ == '__main__':
    # Placeholder key; substitute your own.
    client = OpenAI(api_key="sk-trS-xxxxxx")

    # Note: this longer prompt is defined but not sent in this first test;
    # the call below passes a short literal prompt instead.
    prompt = "An ethereal and dramatic depiction inspired by Wordsworth's poem 'The World Is Too Much With Us.' A moonlit sea with waves reflecting silver light, the winds swirling like ethereal shapes. In the foreground, a serene meadow ('lea') with a figure gazing at the sea, Proteus rising from the waves, and Triton in the background blowing a majestic, spiraled horn."

    response = client.images.generate(
        model="dall-e-3",
        prompt="A stylized avatar of a tech-savvy individual with glowing neon glasses.",
        size="512x512",
        quality="standard",
        n=1,
    )
    image_url = response.data[0].url
    print(response)
Oops! It gave an error:
openai.BadRequestError: Error code: 400 - {'error': {'code': 'invalid_size', 'message': 'The size is not supported by this model.', 'param': None, 'type': 'invalid_request_error'}}
It seems I can’t generate images of certain sizes. Upon reading I found this:
Hmm, so for DALL-E 3 the minimum resolution I can produce is 1024×1024.
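To avoid hitting that 400 error at request time, the size can be checked before calling the API. The sketch below is only an illustration: check_size and SUPPORTED_SIZES are names I made up, and the size lists reflect what OpenAI documented for these two models when this was written, so treat them as an assumption and verify against the current docs.

```python
# Sizes per model as documented by OpenAI at the time of writing (may change).
SUPPORTED_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}


def check_size(model: str, size: str) -> str:
    """Return size unchanged if the model supports it, otherwise raise ValueError."""
    allowed = SUPPORTED_SIZES.get(model)
    if allowed is None:
        raise ValueError(f"unknown model: {model!r}")
    if size not in allowed:
        raise ValueError(
            f"{size!r} is not supported by {model}; choose one of {sorted(allowed)}"
        )
    return size


print(check_size("dall-e-3", "1024x1024"))
```

Calling check_size("dall-e-3", "512x512") raises a ValueError locally, which is cheaper and easier to show in a UI than a failed API request.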
After changing size to 1024x1024, it produces the following output:
ImagesResponse(created=1733810749, data=[Image(b64_json=None, revised_prompt='Create an illustration of a tech-savvy individual who has a futuristic look. The individual should be represented with glowing neon glasses that suggest their affinity for technology. The aura should be filled with lines of binary code cascading around them. The avatar should have an androgynous appearance to represent gender neutrality and can have any descent. The image should be highly stylized to encapsulate the modern, digital, and innovative essence of the tech world.', url='https://oaidalleapiprodscus.blob.core.windows.net/private/org-hiprYm3IyTNvRn2pAPp9cAPR/user-MCgDqt1tM3GQg7BictRVYOgv/img-OSV5bL9eikrP0uv5FonnDFGP.png?st=2024-12-10T05%3A05%3A49Z&se=2024-12-10T07%3A05%3A49Z&sp=r&sv=2024-08-04&sr=b&rscd=inline&rsct=image/png&skoid=d505667d-d6c1-4a0a-bac7-5c84a87759f8&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-12-10T02%3A25%3A19Z&ske=2024-12-11T02%3A25%3A19Z&sks=b&skv=2024-08-04&sig=DwSbgi/omtXoHiXNdbymcO2Wj3mRt0PxmFgrPUXZ6uQ%3D')])
The ImagesResponse object contains several fields. The one we care about here is url, which holds the URL of the generated image. You may also notice the revised_prompt field: DALL-E 3 rewrites the submitted prompt before generating, and this field holds the rewritten prompt that was actually used.
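Here is a small sketch of pulling those two fields out of the response and saving the image locally. The helper names summarize_image_response and download_image are hypothetical, and the response here is a stand-in object shaped like the output above so the snippet runs without calling the API. Saving a copy is worthwhile because the returned URL is a signed link with an expiry time (the st and se query parameters in the output above).

```python
import urllib.request
from types import SimpleNamespace


def summarize_image_response(response) -> dict:
    """Pull out the two fields discussed above from an ImagesResponse-like object."""
    first = response.data[0]
    return {"url": first.url, "revised_prompt": first.revised_prompt}


def download_image(url: str, dest: str = "avatar.png") -> str:
    """Save the generated image to a local file and return the file path."""
    urllib.request.urlretrieve(url, dest)
    return dest


# Stand-in for a real response, so this sketch runs offline.
fake_response = SimpleNamespace(
    data=[
        SimpleNamespace(
            b64_json=None,
            revised_prompt="Create an illustration of a tech-savvy individual.",
            url="https://example.com/img.png",
        )
    ]
)

info = summarize_image_response(fake_response)
print(info["url"])
print(info["revised_prompt"])
```

With a real response from client.images.generate, download_image(info["url"]) would write the PNG to avatar.png in the current directory.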
Now, I am not writing every bit of the code here, as I will share it anyway. Upon running the code, it generates something like the output shown below:
Conclusion
You see how easy it is to create an AI art generator app in Python using OpenAI APIs. The only thing that can hold you back is your creativity rather than the tools. You can customize this app as much as you want. By simply changing the prompt, you can transform this app into an AI logo generator app. The sky is the limit. As always, the code is available on GitHub.
Looking to create something similar or even more exciting? Schedule a meeting or email me at kadnan @ gmail.com.