For many years we have worked on text-to-speech applications and scenarios. In the following video you can see how (well) this is working right now. In this video by Marques Brownlee we see examples of something even more astonishing: natural-language text-to-image applications.

Do have a look yourself:
If you want to try it out yourself, you should look for DALL-E mini on GitHub or go directly to
Here are the results of our own first trial sessions:
"wolf siegert":
"wolf playing jew’s harp" [1]:
no text entry at all:
Here are the first two lines of the latest poem - in German:
Any comments?