Alpha’s Eyes ~ 1. the problem of perception
March 24, 2014
This is the second article in a series in which Alpha, the ani-blogging robot, is brought into being. Here we consider how to convince Alpha to watch anime. For background, please read the first article in the series.
- Perception (theory)
- Programming Frameworks (practice)
- Next Steps
A critical part of many Artificial Intelligence (AI) projects is how perception will be handled. Perception is the general term for how the AI will “sense” the environment, or at least the portion relevant to the problem domain. Alpha will need to be able to watch anime, which is our concern here. She will also eventually need to perceive when new episodes are available for her to watch, but we will set that requirement aside for now.
Percepts and the Percept Store
A conventional approach to perception in the AI literature uses the notions of “percept” and the “percept sequence”.
We use the term percept to refer to the agent’s perceptual inputs at any given instant. An agent’s percept sequence is the complete history of everything the agent has ever perceived.
A percept sequence can be useful in answering questions like, “Is the temperature dropping?” However, it doesn’t suit our purpose as well. As stated in the first article, a design goal is that Alpha be capable of providing analysis free from the human tendencies toward adjacency and subjectivity. The “sequence” aspect of perception is undesirable here, but Alpha will still need a memory into which percepts can be incorporated, so that her conclusions are drawn from her entire history of watching anime without being subject to Recency Bias. We are not interested in comparing one frame to the next, but rather in the episode as a whole, the series as a whole, perhaps in contrast to the season as a whole.
Because of this, as the project continues I will refer to a “Percept Store” instead of a “Percept Sequence”.
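As a rough illustration of the idea (the class and method names here are hypothetical, invented for this sketch, not taken from any framework), a percept store could be as simple as an unordered collection of per-frame records grouped by episode, so that analysis ranges over the whole history rather than over consecutive frames:

```objc
#import <Foundation/Foundation.h>

// Hypothetical sketch of a "percept store": an unordered bag of per-frame
// records, grouped by episode, so that queries run over an entire episode
// or series instead of comparing one frame to the next.
@interface PerceptStore : NSObject
- (void)addPercept:(NSDictionary *)percept forEpisode:(NSString *)episodeId;
- (NSArray *)perceptsForEpisode:(NSString *)episodeId;
@end

@implementation PerceptStore {
    NSMutableDictionary *_perceptsByEpisode; // episodeId -> NSMutableArray of percepts
}

- (instancetype)init {
    if ((self = [super init])) {
        _perceptsByEpisode = [NSMutableDictionary dictionary];
    }
    return self;
}

- (void)addPercept:(NSDictionary *)percept forEpisode:(NSString *)episodeId {
    NSMutableArray *percepts = _perceptsByEpisode[episodeId];
    if (!percepts) {
        percepts = [NSMutableArray array];
        _perceptsByEpisode[episodeId] = percepts;
    }
    [percepts addObject:percept];
}

- (NSArray *)perceptsForEpisode:(NSString *)episodeId {
    return [_perceptsByEpisode[episodeId] copy] ?: @[];
}
@end
```

Each percept here could be a small dictionary (average color, timestamp, etc.); the point is only that insertion order carries no analytical weight, unlike a percept sequence.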
So how do we do this? How do we coax Alpha into opening her digital eyes?
I started by looking for a programming framework that could open files of any video format, was platform-independent, and had convenient language bindings, such as Python, to make development faster. Unfortunately I was not able to find a suitable framework that met these criteria. The two best candidates that I rejected were VLC and ImageMagick.
VLC has a Python binding, is multi-platform, and can open just about anything, but seems to be designed exclusively for playback. I could not figure out how to extract data about an individual frame for analysis.
ImageMagick is also multi-platform. While it has all sorts of image manipulation abilities, I found it very difficult to work out from the documentation how to do what I wanted, and I could not confirm which video formats are supported.
In the end I decided to try developing the first component on the Cocoa framework, which means that Alpha (at least in this original implementation) will be platform-specific: she will be built on Mac OS X, using Objective-C as the language.
PROs: Professional-grade media processing frameworks (e.g. Core Video, Core Image), and I already have a Mac I can do development on.
CONs: Not multi-platform, and Objective-C is not designed for rapid application development the way a language like Python is.
If anyone has an alternative recommendation for video / image processing frameworks, please leave a comment below.
I’m still researching exactly which framework primitives are best to use, but I have four main use cases for my immediate purpose.
- Load video file for processing.
- Select a frame from the stack.
- Characterize color info for a single frame.
- Add info about current frame to the percept store.
So my task now is to refresh my memory of Objective-C, to come up to speed with the latest version of the Cocoa framework (especially media processing), and to construct a first pass at Alpha’s eyes, through which she will watch anime. Hopefully I’ll have some good news to report soon.
Update ~ 2014/03/29
I spent some time this week digging into the Cocoa framework with an eye towards its media-related sub-frameworks. It turned out my initial confusion arose because there are a variety of options depending upon which platform you are targeting (Mac OS X or iOS), how deep into the stack you want or need to go, the exact functionality you are searching for, and some historical material that is no longer relevant for new development but comes up a lot in web searches. It is now clear that I will be focusing on the AV Foundation framework, which is a much smaller part of Cocoa, and not quite as deep as the Core Video layer. There might still be some utility in using services from higher up the stack, in the AppKit portion of Cocoa, because it has convenient tools for working with a single frame at the pixel level, e.g. NSColor.
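To make that pixel-level idea concrete, here is an untested sketch (assuming one frame has already been decoded to a CGImageRef; averaging is just one possible way to characterize color, chosen for illustration):

```objc
#import <Cocoa/Cocoa.h>

// Sketch: given a CGImageRef for one decoded frame, compute its average
// color using AppKit's pixel-level conveniences. Error handling omitted.
static NSColor *AverageColorOfFrame(CGImageRef frame) {
    NSBitmapImageRep *rep = [[NSBitmapImageRep alloc] initWithCGImage:frame];
    CGFloat r = 0, g = 0, b = 0;
    NSInteger w = rep.pixelsWide, h = rep.pixelsHigh;
    for (NSInteger y = 0; y < h; y++) {
        for (NSInteger x = 0; x < w; x++) {
            NSColor *c = [rep colorAtX:x y:y]; // one pixel as an NSColor
            r += c.redComponent;
            g += c.greenComponent;
            b += c.blueComponent;
        }
    }
    CGFloat n = (CGFloat)(w * h);
    return [NSColor colorWithCalibratedRed:r / n green:g / n blue:b / n alpha:1.0];
}
```

A caveat on this sketch: creating an NSColor object per pixel is convenient but slow, so a real implementation might instead read the raw bitmap data, trading AppKit’s convenience for speed.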
With respect to my four use cases listed above:
| # | Use case | Candidate |
|---|----------|-----------|
| 1 | Load video file for processing. | AVURLAsset |
| 2 | Select a frame from the stack. | Possibly AVAssetImageGenerator. Another option is AVAssetReader, but I might not need that much control. |
| 3 | Characterize color info for a single frame. | TBD. This is where NSBitmapImageRep and NSColor might come in handy. |
| 4 | Add info about current frame to the percept store. | TBD |
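Under those assumptions, the first two use cases might look roughly like this in practice (an untested sketch; a real version would need asynchronous loading and proper error handling):

```objc
#import <AVFoundation/AVFoundation.h>

// Sketch of use cases 1 and 2: open a video file and grab the frame
// nearest a given time. Synchronous and unpolished on purpose.
static CGImageRef CopyFrameAtSeconds(NSURL *videoURL, Float64 seconds) {
    AVURLAsset *asset = [AVURLAsset URLAssetWithURL:videoURL options:nil];  // use case 1
    AVAssetImageGenerator *gen =
        [AVAssetImageGenerator assetImageGeneratorWithAsset:asset];         // use case 2
    gen.appliesPreferredTrackTransform = YES;  // respect rotation metadata

    CMTime time = CMTimeMakeWithSeconds(seconds, 600);  // 600 is a common timescale
    NSError *error = nil;
    CMTime actualTime;
    CGImageRef frame = [gen copyCGImageAtTime:time
                                   actualTime:&actualTime
                                        error:&error];
    if (!frame) {
        NSLog(@"Frame extraction failed: %@", error);
    }
    return frame;  // caller owns the image and must CGImageRelease() it
}
```

Note that AVAssetImageGenerator seeks by time rather than by frame number, so the `actualTime` it hands back may differ from the requested time; whether that precision is acceptable is part of the AVAssetImageGenerator-versus-AVAssetReader question in the table above.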