IDEA: More than Internet Movie Database
From transcribing and indexing popular podcasts to analyzing videos and movies for deep semantic search
Hello and welcome to Behind the Mutex.
Oleksandr here 👋
I am starting a new series of entertaining posts about various product ideas. Everybody has ideas. Given the overwhelming amount of information flowing around, I bet you too come up with interesting product ideas every once in a while. Some of them might seem too radical or complex. Some might seem too precious to share with anyone. Other times you might even be embarrassed to share them with your acquaintances.
But in reality a truly unique idea is rare. Throughout history, even breakthrough ideas have been generated independently by multiple individuals who had no connection except some exposure to the relevant information. Sometimes those individuals even worked in different time periods, hence the occasional difficulty of attributing credit to the genuine forerunners.
Now, with 8 billion people in a globally connected world, competition is fierce. It is quite pragmatic to assume that your idea is never unique. What can still be unique, though, is how you bring your idea to life. Execution is king.
I am going to occasionally post some of my product ideas. I hope that for someone these posts will serve as a source of inspiration or even a starting point for something exciting.
When it comes to gaining insights and knowledge about health and well-being, I could not recommend the Huberman Lab Podcast more. It is a gold mine of information. Andrew Huberman does a tremendous job of clearly conveying complex topics while keeping the overall tone relatively entertaining. The episodes are quite lengthy, usually spanning a few hours, and packed with valuable details that you would very much like to retain for as long as possible. I would sometimes take notes of key points so as not to forget them. Obviously I am not alone here, as the podcast is super popular.
Then I found something unusually helpful on Twitter. Aleksa Gordić made a website which indexed the transcripts of the entire collection of episodes of our beloved podcast. I presume the indexing was done using embeddings produced by a large language model, which enabled relatively sophisticated semantic search. Not only can you search by keywords, you can also express advanced queries. That was instant added value: the idea was quite obvious, yet rather useful.
Now, what does this tell us? That given the combination of powerful new ML models out there, you can potentially generalize Aleksa's solution and even take it to the next level.
Podcasts Niche
This is speculation of course, but there might be other popular and respected podcasts out there that would benefit from semantic search over their episodes. And it does not necessarily have to be limited to transcripts. It could be the entire body of information that certain individuals and groups produce, whether it is their podcast episodes, their posts in publications and on social media, etc. Projected onto Huberman Lab, it could be beneficial to be able to query over Andrew’s transcripts as well as his tweets.
The beauty of it is that podcasts typically carry little visual information and many people just listen to them, which simplifies things a lot. Let’s write a short draft of what needs to be done, with no technical details:
You will need to transcribe the audio of the episodes. There are existing solutions to do that, including Whisper, a speech-to-text model recently released by OpenAI.
You will need to generate embeddings of the transcripts and other information such as posts.
Then store the embeddings with the corresponding metadata for further retrieval.
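For the technically curious, a minimal sketch of these three steps in Python might look as follows, assuming the openai-whisper and sentence-transformers packages; the file name, the model choices, and the in-memory NumPy "index" are placeholders, and a real deployment would likely use a proper vector database.

```python
# Minimal sketch of the three steps above: transcribe -> embed -> store for retrieval.
# Assumes the openai-whisper and sentence-transformers packages; the file name and
# the in-memory NumPy "index" are placeholders for a real ingest job and vector store.
import numpy as np
import whisper
from sentence_transformers import SentenceTransformer

asr = whisper.load_model("base")                     # speech-to-text
embedder = SentenceTransformer("all-MiniLM-L6-v2")   # text embeddings

# 1. Transcribe an episode; Whisper returns timestamped segments.
result = asr.transcribe("episode_001.mp3")
segments = [
    {"episode": "episode_001", "start": s["start"], "end": s["end"], "text": s["text"]}
    for s in result["segments"]
]

# 2. Embed every segment (normalized, so a dot product equals cosine similarity).
vectors = embedder.encode([s["text"] for s in segments], normalize_embeddings=True)

# 3. Keep embeddings next to their metadata and query them.
def search(query: str, top_k: int = 5):
    q = embedder.encode(query, normalize_embeddings=True)
    scores = vectors @ q
    best = np.argsort(-scores)[:top_k]
    return [(segments[i], float(scores[i])) for i in best]

for seg, score in search("how does caffeine affect sleep?"):
    print(f"{seg['episode']} {seg['start']:.0f}s-{seg['end']:.0f}s ({score:.2f}): {seg['text']}")
```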
And that’s pretty much it. But when it comes to visual information, things get drastically more complicated.
New Internet Movie Database
Many have heard of and probably used IMDb, the Internet Movie Database website owned by Amazon. It is there and it is free. The website has limited functionality in terms of search and discovery. You can look up specific titles and cast, explore some personal or curated lists, see some rudimentary recommendations, but that’s pretty much it.
You may think that the approach we applied to podcasts would also work here. To some extent, yes, but unfortunately it would be limited, as most of the information is visual and would be lost. Even if detailed plots were available, it would still not be enough to offer next-level semantic search capabilities. We could speculate about what the queries might be:
Finding movies with scenes that have certain objects in them
Looking up titles where certain actions happen over time, and more
Such advanced capabilities would potentially bring value not only to consumers, but also to those who work in production, as it would become possible to quickly validate certain ideas and approaches and have almost instant access to the vast amount of content out there.
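To make the speculation a bit more concrete, here is a purely hypothetical sketch of what querying a scene-level index could look like; the SceneAnnotation schema and the sample data are invented for illustration, and the labels are assumed to come from vision models like the ones mentioned next.

```python
# Purely hypothetical sketch: the SceneAnnotation schema and sample data are
# invented for illustration; the labels are assumed to be produced by vision models.
from dataclasses import dataclass

@dataclass
class SceneAnnotation:
    title: str        # movie title
    start_s: float    # scene start, in seconds
    end_s: float      # scene end, in seconds
    labels: set[str]  # objects/actions detected in the scene

index = [
    SceneAnnotation("Some Heist Movie", 120.0, 135.0, {"car", "bridge", "chase"}),
    SceneAnnotation("Some Heist Movie", 140.0, 150.0, {"helicopter", "rooftop"}),
    SceneAnnotation("Another Title", 300.0, 340.0, {"dog", "kitchen", "cooking"}),
]

def scenes_with(objects: set[str]):
    """Query type 1: scenes that contain certain objects."""
    return [s for s in index if objects <= s.labels]

def followed_by(first: str, then: str, within_s: float):
    """Query type 2: titles where one thing happens and another follows shortly after."""
    hits = []
    for a in index:
        if first not in a.labels:
            continue
        for b in index:
            if b.title == a.title and then in b.labels and 0 <= b.start_s - a.end_s <= within_s:
                hits.append((a.title, a.start_s, b.end_s))
    return hits

print(scenes_with({"car", "bridge"}))
print(followed_by("chase", "helicopter", within_s=30.0))
```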
Recent developments in computer vision bring us closer to such products. Google Cloud offers the Video Intelligence API with capabilities to detect shot changes, faces, and people, track objects, recognize text in videos, and more. Another interesting solution is Meta AI’s Segment Anything Model, which offers generalized image segmentation capabilities.
The recently released Track Anything implements interactive video object tracking and segmentation on top of Segment Anything.
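As a rough illustration of where such scene-level annotations could come from, here is a sketch that follows the documented usage pattern of the Video Intelligence API for label and shot-change detection; the bucket path is a placeholder, and Google Cloud credentials are assumed to be configured.

```python
# Sketch following the documented pattern of the Video Intelligence API client:
# request shot-change and label detection for a video stored in Cloud Storage.
# The bucket path is a placeholder; Google Cloud credentials are assumed to be set up.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "features": [
            videointelligence.Feature.LABEL_DETECTION,
            videointelligence.Feature.SHOT_CHANGE_DETECTION,
        ],
        "input_uri": "gs://your-bucket/some-title.mp4",  # placeholder
    }
)
result = operation.result(timeout=600).annotation_results[0]

# Each detected label comes with the video segments where it appears.
for label in result.segment_label_annotations:
    for segment in label.segments:
        print(
            label.entity.description,
            segment.segment.start_time_offset,
            segment.segment.end_time_offset,
            f"confidence={segment.confidence:.2f}",
        )
```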
Of course, in such a case the processing could be quite massive and costly, depending on how one approaches it. The good thing is that the analysis would only need to be re-run per title every once in a while, as the underlying models and methods improve.
Exciting times!
If you have any feedback or would like to discuss things further or share your thoughts, please feel free to comment, send an email or DM the author on Twitter @dalazx.