Sunday, March 18, 2012

Views hits counter for videos in Plone

Some years ago we developed a Plone site that mainly served video and multimedia content. One of the requirements was a views counter for videos in classic YouTube style, but in the end this feature was cut.
However, the idea of building this in Plone for collective.flowplayer stayed in my mind (one more product in the set of "products that I want to develop someday and probably never will because I don't have time").

Recently I traveled from Ferrara to Rome and back by train, having forgotten the book I was reading at home, so... why not take some time to investigate this old feature again?

Literature
First of all: what are the known ways of performing this task?
I spent some time searching the Web; unluckily I didn't find any general rules or patterns. I don't think of this as a "visitors counter" feature: if a user visits the page where the video player is displayed, it doesn't mean that the user will watch the movie.

Now the problem: even if I don't want a perfect system and I don't care if after 100 views the counter shows 90 or 110 (the error will be the same for all videos in the site, so the numbers are still statistically useful), I'd like to create a system where:
  • an evil visitor can't spend time manually raising the counter of his video
  • an evil bot can't raise my counter to 10,000 in a few seconds!
Let's go through what we can do, step by step.

Monitoring the Play of my clip
The Flowplayer JavaScript APIs are well done and already integrated in collective.flowplayer, and so in Plone. The simplest thing we can do is to rely on the clip's onStart event. After this event has been captured by our callback we can send a call to the server, meaning "the user is watching the video".

The question: is this enough? If I provide a 5-minute clip and my lazy visitor only watches a few seconds before moving away, can I count this as a new video view? Even if my super-fast Internet connection has already downloaded and buffered it all?

My answer is "no" (however this can't be taken as "the right answer"). Let's try to think about a system that counts a view only if the video has been watched in full!

Monitor the end of my clip
The next step is to also use the onFinish event. We can monitor when the user starts watching the clip, but we can also check whether the clip has been completed. In this way we send our message to the server only when the video has finished.

This is a step forward, but we still can't be sure that the user really watched the video. He could start watching the clip and then drag the slider to a few seconds before the end.

Monitor cuepoints
I already knew the two JavaScript events above; I used them while developing collective.flowplayer_toolbar.
What I learned from the documentation is that the Flowplayer APIs also support cuepoints. Cuepoints are event callbacks that are fired automatically when playback reaches given points in the clip, for example every n seconds (let me use 5 as n for our example).

Now we send the final message to the server only if the onStart event has fired and all the cuepoints have fired too. To do this we simply use a counter that is increased at every cuepoint.
We still rely on the onFinish event, but we can use JavaScript to make sure that the message is sent only if the counter has reached a certain amount. This amount depends on the clip duration, but again: the Flowplayer APIs contain a method to obtain the video duration.
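
The counting logic above can be sketched as follows. In the real product this would live in the Flowplayer JavaScript callbacks; it is written here in Python only to show the arithmetic. The class name `PlaybackTracker`, the helper `expected_cuepoints`, and the choice to place cuepoints strictly before the end of the clip are all assumptions of this sketch, not part of collective.flowplayerclipviews.

```python
import math

CUEPOINT_INTERVAL = 5  # seconds between cuepoints (the "n" of the example)


def expected_cuepoints(duration, interval=CUEPOINT_INTERVAL):
    """Count the cuepoints placed every `interval` seconds, strictly
    before the end of a clip lasting `duration` seconds."""
    return max(0, math.ceil(duration / interval) - 1)


class PlaybackTracker:
    """Hypothetical stand-in for the state kept by the player callbacks."""

    def __init__(self, duration, interval=CUEPOINT_INTERVAL):
        self.required = expected_cuepoints(duration, interval)
        self.started = False
        self.fired = set()  # indexes of the cuepoints already reached

    def on_start(self):
        self.started = True
        self.fired.clear()

    def on_cuepoint(self, index):
        # Ignore cuepoints reached without a previous start event.
        if self.started:
            self.fired.add(index)

    def on_finish(self):
        """Return True only when the "video viewed" message may be sent."""
        return self.started and len(self.fired) >= self.required
```

A set of cuepoint indexes is used here instead of a plain counter, so that seeking back and passing the same cuepoint twice does not count twice; a simple integer counter, as described above, works the same way when the user never seeks.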

Starting to think about the Evil Guy
In a perfect world, where everyone is a Good Guy and no one ever tries to break the rules, the general description of the code given above could be enough. But we have the Evil Guy.
Who is the Evil Guy? He is a low-skilled visitor who doesn't know how to write code or hack JavaScript, but simply tries to raise the views counter of a clip. He is a cheater.
How can he do it? First of all he can spend a lot of time clicking the "Play" button again and again, every time the clip reaches the end (the Evil Guy always has a lot of free time).

To protect our site from this type of attack we simply need to record that this clip has already been viewed by that user. When the video starts we send a call to the server and get back a randomly generated token, which we keep secret in the JavaScript environment and on the server itself.
We will not use a cookie for this, because the Evil Guy can quickly learn how to manipulate cookies, so we choose to keep this information in the Zope session (this can lead to problems with multiple Zope instances; in that case we probably need some more complex shared RAM cache). We also store the video path, because we want users to still be able to watch the site's other clips (and raise their counters normally).

When the video reaches the end we still send our message to the server, but this time we also send the secret token. Only if the token matches do we raise the counter.
Another click on the Play button will call the server again, but this time we can see that there is already a token for that clip stored in the session. This means that the visitor has already seen that clip, so we immediately stop any further operation.
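
The token exchange just described could look roughly like this on the server. This is a minimal sketch, not the code of the product: a plain dict stands in for the Zope session, another dict stands in for the stored counters, and the function names `start_view` and `finish_view`, the `clip_tokens` key and the `counted` flag are all hypothetical. The flag is an addition of this sketch, needed so that the same token cannot be sent twice.

```python
import hmac
import secrets


def start_view(session, clip_path):
    """Called on "video started". Return a new token, or None when this
    session already holds a token for the clip (already seen)."""
    tokens = session.setdefault("clip_tokens", {})
    if clip_path in tokens:
        return None
    token = secrets.token_hex(16)
    tokens[clip_path] = {"token": token, "counted": False}
    return token


def finish_view(session, counters, clip_path, token):
    """Called on "video finished". Raise the counter only when the token
    matches the stored one and has not been used yet."""
    entry = session.get("clip_tokens", {}).get(clip_path)
    if entry is None or entry["counted"] or not isinstance(token, str):
        return False
    # Constant-time comparison of the stored and received tokens.
    if not hmac.compare_digest(entry["token"].encode(), token.encode()):
        return False
    entry["counted"] = True
    counters[clip_path] = counters.get(clip_path, 0) + 1
    return True
```

Because tokens are stored per clip path, a visitor who has already watched one clip can still raise the counter of another one in the same session.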

Even if the Evil Guy reloads the browser page, he can't do anything else until the session expires.

The collective.flowplayerclipviews product
The trip from Ferrara to Rome is long, but not long enough! All I have described so far is more or less what you will find in the first version of collective.flowplayerclipviews. As you can imagine, I'm still really far from the target!

Evil Guy gets smarter
The Evil Guy can also start learning to program, and so learn that JavaScript is a client-side language and that he can cheat, skipping the boring wait before he can press the Play button again.
First of all: destroying the browser session is as simple as closing the browser or erasing cookies.
After that he can simply get the secret token from the server and call the server again a few milliseconds later, pretending to have watched the clip!

So now we can start to think about the video duration server side. As we said, Flowplayer can give you the video duration inside the JavaScript environment, but what about server side? There, collective.flowplayer relies on the hachoir suite: on every multimedia content item Plone stores some annotations with video information (mainly width and height).
Unluckily for us, right now collective.flowplayer does not also store the video duration (which hachoir supports, but Plone doesn't need), so for now let's simply assume that collective.flowplayer will do this in the future... and that we are using this future version in the rest of this article.

Having the video duration server side lets us stop the Evil Guy from raising the clip counters very quickly. Even if he writes a program that takes the token (simulating the Play button) and immediately calls the server (simulating the clip end), we can check that the latter request, the one meaning "video finished", arrives a certain amount of time after the "video started" one: this amount of time is the video duration!
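
A minimal sketch of this timing check, assuming the clip duration in seconds is already available server side. The function names, the `clip_starts` session key and the `tolerance` factor (which forgives small timing differences between client and server) are assumptions of this example; the `now` parameter exists only to make the functions easy to test.

```python
import time


def record_start(session, clip_path, now=None):
    """Remember when the "video started" request arrived for this clip."""
    starts = session.setdefault("clip_starts", {})
    starts[clip_path] = time.time() if now is None else now


def watched_long_enough(session, clip_path, duration, now=None, tolerance=0.9):
    """Return True when the "video finished" request arrives at least
    `duration * tolerance` seconds after the recorded start."""
    started = session.get("clip_starts", {}).get(clip_path)
    if started is None:
        # No start was recorded: the finish message can't be trusted.
        return False
    now = time.time() if now is None else now
    return (now - started) >= duration * tolerance
```

With `tolerance=1.0` the check requires the full duration, exactly as described above.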

This will be a short-lived victory. The Evil Guy can then start hundreds of cheating Play operations simultaneously, wait for the video duration, then send the finish messages. He still has to wait for the video duration, but he can raise the counter by hundreds or thousands anyway.

Can this be avoided? The only way is to record the address of the Evil Guy and keep it in memory for some time. However, the ZODB is not the place for this kind of temporary data: we can think about storing this information outside, again in a RAM cache or in an external database, but we need to keep write operations on the ZODB to a minimum.
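
As an illustration of the idea only, here is a tiny in-process store that remembers which address has recently viewed which clip. The class `RecentViewers`, its `ttl` parameter and the one-hour default are hypothetical; a real deployment with several Zope instances would need a shared cache or an external database, as said above.

```python
import time


class RecentViewers:
    """Remember (address, clip) pairs for `ttl` seconds, in RAM only."""

    def __init__(self, ttl=3600.0):
        self.ttl = ttl
        self._seen = {}  # (address, clip_path) -> expiry timestamp

    def allow(self, address, clip_path, now=None):
        """Return True and record the pair if this address has not viewed
        the clip recently; return False otherwise."""
        now = time.time() if now is None else now
        # Drop expired entries so the store does not grow forever.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        key = (address, clip_path)
        if key in self._seen:
            return False
        self._seen[key] = now + self.ttl
        return True
```

The counter would be raised only when `allow()` returns True, so hundreds of parallel requests from the same address count as a single view.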

Conclusions
After all those changes we can say that we have a good system... but we need to keep in mind that we can't be sure that the visitor really watches our video! He can press the Play button, then go take a shower! The world is not a perfect place.

collective.flowplayerclipviews is right now only a proof of concept, not to be used in production. If you look at the source, you can see that the _getClipDuration method is not implemented. Once that is done, all the Plone-side features will be there (we still need the structure for storing external IP addresses).

During the time I spent writing this article (it usually takes me a few sessions), I searched again for general documentation on the subject. This time I added "plone" to the set of keywords... and then I discovered that we already have a Plone product that does this task (my fault: I didn't google well)!
I'm talking about collective.piwik.flowplayer! This product is part of the Plumi suite. I had already checked Plumi before starting this article, but I missed that feature. Also note that this product is now deprecated in favor of collective.piwik.mediaelement.

Like many other Plumi internal submodules, those products are usable outside a full Plumi site. Neither module implements the counter feature in Plone; instead they smartly rely on external software: Piwik.

Piwik is analytics software (we can say it's a competitor of Google Analytics) based on a JavaScript snippet that you must put in the page.
Apart from all the other analytics features, looking at the source it seems that it simply checks for the press of the Play button... (let me say that I tend to complicate my own life, and this is probably the right way). If this is enough for you, I strongly suggest relying on this service.

Conclusion (this time, really)
I hope I have shown you that having this feature in Plone is possible.