KarperScore Redux

By KarPer

In my earlier post, I provided a quick-n-dirty solution to the problem of devising a superior system of scoring songs played in Amarok. After giving the problem some thought, I have come up with a different approach.

The score of a song represents the computer’s best guess of how much you, the user, like a certain song. Unlike the rating, the score requires little user input. Of course, if the rating is provided, the computer can predict the score a lot more reliably. In any case, the parameters that the computer can track and use in determining the score of a song are as follows:

  • the previous/current score of the song (prevscore: 0-100)
  • the rating of the song (rating: 0-10)
  • the play count of the song (playcount: 0-inf)
  • the percentage of the song listened to now (percentage: 0-100)

How important is each of these parameters in judging the score of the song? The prevscore is our current best guess and needs to be refined. The rating is the absolute indication of how much you like the song, but it is likely to become obsolete: the user may grow to like or dislike the song, and the rating may no longer be relevant. The playcount indicates how obsolete the user’s rating is likely to be. If the playcount is high, it is possible that despite a low rating, the user does indeed like the song quite a bit. Finally, the percentage of the song that was just played is an important, though limited, parameter in our decision – it’s altogether likely that the user had to stop the song because of an important call, for example.

In what follows, the code is provided in the Ruby programming language.

Let’s start with the prevscore. It’s our initial best guess of how much the user likes the song. If the song was never played before, the score would be returned as a default value, 0. Just to make sure it is so, the first if statement sets this:

if ( playcount <= 0 )
  prevscore = 0
end

We need to next consider the rating of the song. Again, it’s possible that the rating is not present for the song. In that case, the computer must make a decision. Conservatively, I start it off at a rating of 5 – that’s two and a half stars.

if ( rating <= 0 )
  rating = 5
end

The choice of the ‘<=’ logical operator is inspired by the default Amarok scoring script. Now, we are ready to make our first guess at the new score.

guess1 = ( 5 * rating ) + ( prevscore / 2 )

In other words, the rating and the prevscore are averaged. Let’s examine the stability of this guess. It’s immediately clear that this algorithm will always approach the condition prevscore = 10*rating. This was the basis of my first solution.
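The convergence claim can be checked numerically. Here is a minimal sketch, assuming a constant rating; the helper name `iterate_guess1` is made up for this illustration and is not part of the script:

```ruby
# Repeatedly apply the guess1 formula with a fixed rating and watch the
# score settle. iterate_guess1 is a hypothetical helper for illustration.
def iterate_guess1(rating, prevscore, plays)
  plays.times do
    prevscore = (5.0 * rating) + (prevscore / 2.0)
  end
  prevscore
end

[1, 5, 10].each do |rating|
  final = iterate_guess1(rating, 0.0, 20)
  puts format("rating %2d -> score after 20 plays: %.3f", rating, final)
end
```

Each pass halves the gap between the score and 10*rating, so the starting value stops mattering after a handful of plays.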

To improve our guess, the playcount must be considered. If the playcount is high, it should reasonably imply a higher score. This guess must be bounded by the user’s rating and the maximum, 100. The best function to execute this gradual move from the user’s rating to 100 is the exponential decay function.

guess2 = guess1 + ( 100 - guess1 ) * ( 1 - Math.exp( -playcount / 100 ) )

The playcount is divided by 100 to make the typically fast exponential decay a hundred times slower. Over hundreds of plays, the song’s score will now drift away from the user’s rating towards 100.
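To get a feel for how slowly this factor grows, the sketch below prints the fraction of the remaining gap to 100 that is closed at a few playcounts. The function name `playcount_boost` is an assumption made for this example only:

```ruby
# Fraction of the gap between guess1 and 100 that is closed at a given
# playcount. playcount_boost is a hypothetical name for illustration.
def playcount_boost(playcount)
  1.0 - Math.exp(-playcount / 100.0)
end

[1, 10, 50, 100, 300].each do |plays|
  puts format("%3d plays -> %.3f of the gap closed", plays, playcount_boost(plays))
end
```

Roughly 63% of the gap is closed at 100 plays and 95% at 300, so the playcount only dominates for songs that are played a lot.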

Finally, the percentage variable is brought in. This variable has the potential to wreak havoc on any algorithm – indeed it was the primary reason why I embarked on this endeavor in the first place. On the other hand, we must recognize the potential to introduce a little ambiguity in the scoring process. It does become boring to stare at a score forever pinned at 99. A little disturbance in the Force might be a good thing from time to time. :) In order to contain the damage this variable can do, I let it have control of 10% of the current best guess – guess2. If the song is immediately skipped, the score will drop to 90% of its current value.

guess3 = guess2 * ( 0.9 + 0.1 * percentage / 100 )

A final look at the stability of this algorithm is warranted. Initially, the prevscore will tend towards the rating, barring the effect of sub-100 percentage values. As the playcount increases into the hundreds, the exponential function will invariably take over and take the score towards 100. The percentage can disrupt this from time to time, but the score will always return towards 100.
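Putting the steps above together, the whole calculation fits in one small function. This is a sketch assembled from the snippets in this post, not the released script; the name `karper_score` and its keyword arguments are assumptions, and floats are used throughout:

```ruby
# Sketch of the three-step calculation described in this post.
# karper_score and its keyword arguments are hypothetical names.
def karper_score(prevscore:, rating:, playcount:, percentage:)
  prevscore = 0.0 if playcount <= 0   # never played: no previous score
  rating = 5.0 if rating <= 0         # unrated: assume two and a half stars

  # Step 1: average the previous score with ten times the rating.
  guess1 = (5.0 * rating) + (prevscore / 2.0)
  # Step 2: drift towards 100 as the playcount grows.
  guess2 = guess1 + (100.0 - guess1) * (1.0 - Math.exp(-playcount / 100.0))
  # Step 3: let the percentage played control 10% of the result.
  guess2 * (0.9 + 0.1 * percentage / 100.0)
end

puts karper_score(prevscore: 0, rating: 0, playcount: 0, percentage: 100)
puts karper_score(prevscore: 80, rating: 8, playcount: 50, percentage: 0)
```

The first call is a brand-new, unrated song played in full; the second is a well-liked song that gets skipped immediately, which costs it 10% of guess2.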

This, then, is the thought process that went into the design of KarperScore 2.0. It’s currently in testing. I’ll release it as soon as I’m sure it’s working as I expect it to. Future improvements to this code should let the user decide parameters like the default rating, or the relative importance of the percentage variable to the score, etc.

Update: What a difference the class of the variable makes! Many of the variables above, such as playcount, were initialized as integers, and the exponential function likes floats! So, the score was jumping all over the place. Now, all variables have been initialized as floats and that solved all the bugs I noticed with the script during testing. The script is ready for general release as far as I can tell, but a couple of days more of testing the code is always a good idea.
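The jumps are what Ruby's integer division would produce, since it rounds towards negative infinity. Here is a small sketch of the effect on the exponent used in guess2 (the variable names are for illustration only):

```ruby
playcount = 1

int_exponent   = -playcount / 100      # integer division floors: -1
float_exponent = -playcount / 100.0    # float division: -0.01

puts 1 - Math.exp(int_exponent)     # about 0.632, as if played 100 times
puts 1 - Math.exp(float_exponent)   # about 0.00995, the intended value
```

With integers, a single play closes the gap to 100 as much as a hundred plays should. The `prevscore / 2` term has a milder version of the same problem, because integer division drops the remainder.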

Update 2: KarperScore has been working so well for me over the past couple of weeks that I wish I could reset all my playcounts and scores and start fresh. I need to ask someone on the Amarok IRC channel (#amarok on irc.freenode.net) if that’s possible at all…

I did a simulation of how the score of a song varies over one hundred plays, assuming that the user sets a (constant) rating when the song is listened to for the first time. That’s why the scores always start at 25 for each of the three cases – no rating (blue), full rating (red) and minimum rating (green). I also assumed that the user listened to the song in its entirety every time it’s played (percentage = 100).

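[Figure: KarperScore over one hundred plays for an unrated song (blue), a full rating (red) and the minimum rating (green)]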

Clearly, for an unrated song, the score approaches 100 leisurely, making it to about 88 after the user has listened to the song for the hundredth time. It’s a good guess of how much the user might like the song after so many plays. If the rating is specified, the score rises faster or slower than this. The fastest rise is for five-star songs – the score passes 90 after the song is played just four times. The lowest possible rating is a half-star. Note how, for such a song, the score approaches the rating over the first few plays and then slowly climbs as the song collects more plays. When played for the hundredth time, the score reaches about 80.
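A simulation along these lines can be sketched in a few lines of Ruby. This is a re-creation under stated assumptions, not the code used for the plot: the first play happens before the rating is set, the playcount passed in is the number of earlier plays, and `score_once` and `simulate` are hypothetical names.

```ruby
# score_once mirrors the formulas in this post; it is not the released script.
def score_once(prevscore, rating, playcount, percentage = 100.0)
  prevscore = 0.0 if playcount <= 0
  rating = 5.0 if rating <= 0
  guess1 = (5.0 * rating) + (prevscore / 2.0)
  guess2 = guess1 + (100.0 - guess1) * (1.0 - Math.exp(-playcount / 100.0))
  guess2 * (0.9 + 0.1 * percentage / 100.0)
end

# Assumption: the first play is unrated, then the rating stays fixed.
# Returns the score after each play, always with percentage = 100.
def simulate(rating, plays)
  score = 0.0
  (0...plays).map do |previous_plays|
    current_rating = previous_plays.zero? ? 0 : rating
    score = score_once(score, current_rating, previous_plays)
  end
end

{ "no rating" => 0, "five stars" => 10, "half a star" => 1 }.each do |label, rating|
  scores = simulate(rating, 100)
  puts format("%-11s play 1: %5.1f  play 4: %5.1f  play 100: %5.1f",
              label, scores[0], scores[3], scores[99])
end
```

Under these assumptions, the printed values can be compared against the figures quoted above for the first, fourth and hundredth plays.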

18 Responses to “KarperScore Redux”

  1. En-Cu-Kou Says:

    I’ve just gotten your script, so I can’t tell yet how it’ll turn out for me, but here is a little idea for you:
    Maybe you could use the information about when the song was added to the database. If I was listening to my favorite band over and over for a year, the playcount would go up and so would eventually go the rating for each of the songs. Now suppose they’ve released a new album. Even though I’d give the same ratings, the scores would be lower not because I’d listen to it less, but because I just didn’t have the songs that long.
    So, maybe it would be good to replace the 100 you divide by with something like playcount/months_in_collection*some_factor+some_constant…
    I don’t know if Amarok can tell the script the time the song was added, though.

  2. KarPer Says:

    Thanks for trying out my script! You’re one of about 60 who did so far.

    I’ve been using it for the past couple of weeks now and it works exactly as I imagined it would when I started on this project.

    I don’t think that the score of a song should be affected by how long I’ve known the band. I might love a band to bits and not feel their new song is up to their usual standard. The ultimate test should be how often I play their song and how highly I rate it. Please refer to the second update above for a plot of how the scoring script handles the score over a hundred plays of a song. Hope it explains it better!

    Also, as far as I could tell from the documentation, there’s no way to get Amarok to report the “date added” information of the currently playing song. In any case, I think it should be immaterial as far as the score of the song goes…

  3. Joakim Says:

    Just downloaded the script and it seems to be much better than the default one. Too much of Amarok, any media player really, is aimed at people who listen to singles. In that case I think the default rating is perfect.

    But I don’t. I listen to albums. And I’ve got all my songs in the playlist, so if I do have shuffle on, there’s loads of (good) music I just shuffle through as I’m not in the mood for it.

    I’ll try this for a few weeks and see how it is!

  4. KarPer Says:

    Hope it works well for you.

    As for your other point, actually, I felt that Amarok was very nicely balanced in dealing with users with album-centric collections like Joakim and me while not pissing off users with single-song centered collections. Up until the time you get to the playlist, you’ll notice that songs from the same album are often grouped together, like in the collection pane or the context browser. I’d definitely like to see this extended to the playlist view as well. Maybe in Amarok 2…:)

  5. Markus Says:

    Hi, there.
    I really like your little script but I still have a question regarding its behavior if one skips a song after 3 or 4 seconds. In your interpretation, “if the song is immediately skipped, the score will drop to 90% of its current value.” (I particularly like the way you managed to get the PERCENTAGE variable under control)
    Is this a meaningful thing to do? Does it really mean that you don’t like the song or that you are not in the mood for it? Should these mood swings influence the score of the song? I, for example, use my infra-red remote control to randomly skip through a rather long playlist while reading on the settee. I skip because I don’t know what the next song is, since I’m not directly in front of the monitor.
    Your interpretation works only for short predefined playlists, where one knows what to expect.

    So, my suggestion would be to keep the old score if the PERCENTAGE variable doesn’t overstep 2% of the song (For my collection, this corresponds to 4-5 seconds on average)

    My second question is whether you’ve managed to find out how to reset the existing scores for your collection. I’d really like to start from scratch.

    Ta.

  6. KarPer Says:

    Glad you liked my script.

    Regarding how the percentage of the song listened to affects the score of the song, I think it’s just my solution to the problem. In other words, my opinion is that mood swings should affect the score of the song. The score is really an indication of how much you like the song right now. If you hate it, the score should reflect that. If you really like it, but you’re just not in the mood for it right now, the next time you listen to the song, the function will more than correct for your hasty dismissal of the song the first time around.

    In other words, over time, the script just works!

    If you have a different solution that you’d like to implement, go right ahead. (Here comes the beauty-of-open-source speech…). Just head to ~/.kde/share/apps/amarok/scripts/Codes/KarperScore/karperscore/ or equivalent and edit the file called karperscore.rb. You can change the formulae any way you like. For your case, it seems like you want to put the third guess within an if-then statement which will then execute only if, say, the percentage is more than some value (if you skip quickly, the score remains unchanged). Then, your final guess is guess2, not guess3.

    About your second question, the answer is sadly no. Short of hosing your library file completely, I don’t have a way to reset your scores. It should be possible if you know how to play with databases (SQL and the like), but I lack the necessary know-how…

    Luckily in my case, the gutsy update completely hosed my feisty install and I was forced to reinstall. So, I got to start fresh, but something like that is just too drastic to recommend. :) If you want to start from scratch, delete the .kde/share/apps/amarok directory and that will make your player forget everything about your library. If you want to hose your preferences too, delete the amarokrc file in .kde/share/config directory as well. (Conversely, these files are the same ones to back up if you want to restore Amarok completely on a fresh install :) ) Amarok will happily start fresh the next time you run it.

    Hope this helps! Rock on!

  7. Markus Says:

    I was fully aware of the fact that I can easily modify the script to fit my needs since your inbuilt notification contains all the necessary information. I just wanted your opinion on how a listener’s mood swings should influence song scoring. I guess I was hoping you’d prove me wrong or something.

    Now that it’s become clear that we’re dealing with purely personal interpretations of the problem, I have changed your script to fit my needs and it appears to work just fine.

    I have also managed to reset my scores and playcounts while keeping my ratings, so that I can start fresh with the new version of your script: migrated from SQLite to MySQL, then issued an UPDATE command for the table statistics from phpmyadmin. This also resulted in an unexpected performance boost for Amarok.

    So… thanks again for your script and keep up the good work!

  8. KarPer Says:

    Cool. Enjoy.

    The performance boost from using MySQL over the default SQLite shouldn’t come as a surprise. I recall reading an article on FOSSwire that actually encouraged users to start using MySQL…

  9. Markus Says:

    Oh, I’ve made another slight alteration to your script. (Seems like I can’t stop fiddling with it.) You might even find it useful (or at least some of your readers).

    Quite a lot of tracks have a silent part at the end. Being rather impatient, I tend to just skip to the next song. Practically, the song has been played in its entirety but Amarok doesn’t know this and will assign a lower score to it on account of me having skipped those last silent 5, 7 or 10 seconds.

    So I did the same thing I’ve done for the incipient section of the song, but “in reverse”, so to speak. If one has played 97% of the song, the percentage variable is artificially upped to 100 and the rest of the calculations are done with this value.

    Just thought I’d mention it, so that this minuscule tweak doesn’t rot in the bowels of my PC forever.
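The two tweaks described in this thread (leave the score alone after a very early skip, and count a nearly finished song as fully played) could look something like the sketch below. It is not the commenter's actual code: the name `adjust_percentage`, the constants, and the exact comparison operators are assumptions based on the 2% and 97% figures mentioned above.

```ruby
# Sketch of the two percentage tweaks; names and thresholds are assumptions.
SKIP_FLOOR = 2.0      # at or below this, treat the play as an accidental skip
FULL_CEILING = 97.0   # at or above this, treat the song as fully played

# Returns nil when the score should be left unchanged, otherwise the
# percentage to feed into the guess3 step.
def adjust_percentage(percentage)
  return nil if percentage <= SKIP_FLOOR
  return 100.0 if percentage >= FULL_CEILING
  percentage.to_f
end

p adjust_percentage(1)    # very early skip: keep the old score
p adjust_percentage(50)   # ordinary partial play: used as-is
p adjust_percentage(98)   # skipped trailing silence: counted as a full play
```

The caller would keep prevscore when the result is nil and otherwise continue with the usual three guesses.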

  10. KarPer Says:

    Playing only 97% of a song forfeits 3% of the 10% of the score that the percentage can affect, which is at most 0.3 points. It is so small, it’s probably not going to make that much of a difference! :) Remember, the percentage variable is on a leash now. It can’t do too much damage. So, feel free to skip or stop songs whenever you feel like it.

    Rediscover your music, as Amarok says, and let the script worry about the score of the songs!

  11. Markus Says:

    You are absolutely right. 0.003 of the score is probably lost during typecasting. I was totally obsessed with the score. Screw it. I’ll just listen to the songs and check out the scores later. That way they will be objective.

    I’ll just listen to whatever I’m in the mood for!

  12. Linux: Tweak Amarok score algorithm >> Scott Klarr Says:

    [...] was about to come up with my own algorithm when I came across This KarperWorld Post where he came up with a very nice scoring algorithm. I modified the scoring script with his [...]

  13. Texoft Says:

    FYI: How I reset score for all songs from within Amarok

    1. insert your songs into playlist
    2. right click “edit track information”
    3. go to statistics tabs, set score (don’t change anything else or you’ll break your tags!). I had to use score 1, strangely zero didn’t work for me.
    4. click on save
    5. wait a lot

  14. KarPer Says:

    Oh, but it’s not an elegant solution!

    Setting the score to 1 isn’t what I want (I know, I’m being finicky, but indulge me!). Also, with more than 4000 songs in my collection presently, that’s a lot of waiting!

    So, since all my tag info is saved with the file itself, the best thing for me to do was to nuke the db file that Amarok uses to store its data. Then, rescanning the collection effectively reinitialized the library.

    But, yes, for those with smaller collections who don’t mind a small, but non-zero score, this method is perfectly okay.

    One final concern, though. Messing with the track information applet often writes an ID3v1 tag in addition to an ID3v2 tag. Since I obsessively delete my v1 tags, I am scared to change anything, lest it writes in new v1 tags with their limited support for long song titles etc. Yeah, I’m that finicky about my music collection. :)

  15. Fabian Says:

    HowTo reset amarok’s scores (backup your collection.db first):

    1) close amarok
    2) sqlite3 ~/.kde/share/apps/amarok/collection.db
    3) type “UPDATE statistics SET percentage=0;”
    4) quit and launch amaroK

    This worked for me.

  16. Fabian Says:

    Another comment: The database field is called percentage, but it’s apparently the score, since my command worked and I think the last play-percentage of the song isn’t saved in the database.

  17. elyk Says:

    One suggestion for a future version (if you’re planning on making one) would be to factor in the playcount as a percentage of the total number of songs played, rather than as an absolute value – 100 plays out of 1000 total has a different meaning than 100 plays out of a much larger total. Ideally, I’d also like to see a song’s score gradually fade after not being listened to for a while (which would likely indicate that the listener doesn’t enjoy it as much as they used to), but I’m not sure how that would work as score processing occurs when a song is played, and a periodic update (loop through all songs and update their scores based on their playcount percentage) would be impractical with a large collection.

  18. Michael Mortimore Says:

    Hi,

    I tried your method and it seemed to be working well. I made some changes which I thought you or other readers may find useful.

    I often leave my music playing overnight or walk away from the computer forgetting to pause it, so I found that with the default script, most of my songs had high scores, whether I liked them or not. While your method would have resilience to this effect, it’d still happen over time. To counter it, I added the following:

    sleepmax = 10.to_f
    # count how many tracks haven't been skipped
    sleepcount = sleepcount + 1

    # see if track was skipped early
    if( percentage < 99 )
      sleepcount = 0
    end

    # average of previous score and new score, weighted according to the
    # sleepcount. The longer it is since I skipped a track, the more likely
    # I'm asleep, so the less we'll change the score.
    guess4 = (guess3 * (sleepmax - sleepcount) + prevscore * sleepcount) / sleepmax

    # only change the score if sleepcount is within limits
    if( sleepcount <= sleepmax )
      system( "dcop", "amarok", "player", "setScoreByPath", URI::decode( url ), guess4.to_s )
    end

    Of course it doesn’t stop the playcount incrementing, but i’m hoping it’s close enough.
