02 July 2010

Players speak out on ranking algorithm and concerns about cheating

We have received a number of emails from players who have expressed concern about how we rank and also about how quickly some players play the quiz. In the interests of transparency, I offer a sampling of such comments below, and then respond, sharing with you some of the challenges we face in coming up with a fair and accurate algorithm for ranking.

"The scoring algorithm is supposedly geared towards answering questions correctly, however in the rankings, there are a goodly number of people answering wrong but quickly above people who answer slower but correctly. This seems counter intuitive to me; should I just guess quickly the answers in order to improve my ranking? Seems like that would work better than spending time making sure I've got the right answer, yet that's supposedly the opposite of what you're trying to achieve with the quiz!"

"See the ranking for the quiz taken on 01-Jul-2010: some users have answered within 6 - 20 seconds. Can this be possible? As I told you before, a user needs some minimum time to read the question and then its choices before answering, and it is obvious that the 01-Jul-2010 quiz takes longer than that to read. We must find some way to avoid this."

"I still suspect that competitors may use two ids: from one they see the question, and then they submit the right answer to that question using the other id, and the time they take is less than the time needed even to read the question. So I request that, rather than giving more points for taking less time, you set a minimum time for each question. Then if a user submits below that time limit, the algorithm can consider them an invalid competitor, or they can be treated separately."

So to sum up:

1. It would be nice to not reward people for "guessing" (answer quickly, worrying more about elapsed time than correctness)

2. Very fast times to answer a question must indicate that cheating is going on.

3. Set a minimum time for the answer to a question, and ignore answers that are submitted in an amount of time that seems to surely indicate cheating.

4. We need to find a way to stop cheating through multiple accounts.

These are all very worthwhile ideas, and I must admit that we have considered all of these and more in the last few months. It is very hard, we have found, to run a quiz like this over the Internet and completely avoid the possibility of some form of cheating. Beyond that, we also find it difficult to come up with rigid formulas, such as a minimum time to answer, that treat all players fairly and aren't easily circumvented.

Consider the idea of a minimum time. This was actually in my original plan as a way to avoid cheating. We would establish, say, 10 seconds as the minimum answer time. The score for anyone who answered in less time than that would be set to 0. And we wouldn't publicize this fact, because then anyone cheating would simply wait until 10 seconds had elapsed and then press Submit.
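To make the rule concrete, here is a minimal sketch in Python. The function name, the parameter names, and the raw score of 500 are hypothetical and chosen only for illustration; the 10-second threshold is the example figure from the paragraph above, not a rule we have adopted.

```python
# Example threshold taken from the text; an illustration, not a published rule.
MIN_ANSWER_SECONDS = 10


def apply_minimum_time(raw_score, elapsed_seconds, minimum=MIN_ANSWER_SECONDS):
    """Return 0 for an answer submitted faster than the minimum, else the raw score."""
    if elapsed_seconds < minimum:
        return 0
    return raw_score


# A 6-second answer is zeroed out...
print(apply_minimum_time(500, 6))
# ...but a player who knows the rule and waits until 10 seconds keeps the full score.
print(apply_minimum_time(500, 10))
```

The second call is the weakness: once the threshold is known, waiting it out costs a cheater almost nothing.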

But how effective or fair would a minimum time be? Consider:

* We feel strongly that we should be open and honest with you about our scoring algorithm. If we hide key aspects of our rules, then you will not understand the ranking and you will feel manipulated by us. Yet if players know there is a minimum time, it can be circumvented easily by ensuring that one always answers in 10 seconds or a little more. We could perform further statistical analysis to identify patterns that might imply cheating (example: player ABC always answers in 10 seconds. How likely is that?), but I have found myself very reluctant to accuse someone of cheating (and take unilateral action, throwing out all the time they spent on the Challenge) based on patterns and inferences from those patterns.

* Sometimes a question can generally be answered very quickly, or a person happens to know that topic very well and can answer quickly. Do we then penalize them for this? In general, I am less certain now than when the PL/SQL Challenge started that a very fast answer time must mean that the player is cheating. It's a big world out there filled with lots of PL/SQL developers with many different kinds of brains. :-)

* The minimum time really should be determined based on the difficulty level of the question and even the specific question (length of question text and each multiple choice, etc.). That then becomes very complicated for us to manage and measure.
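One way to avoid a single fixed threshold is to compare each answer time with how long everyone else took on the same question. The sketch below is only an illustration of that idea, not something we run: the function name, the player names, the sample times, and the 25% cutoff are all made up. It flags answers for a follow-up conversation with the player rather than penalizing anyone automatically.

```python
from statistics import median


def flag_fast_answers(times_by_player, fraction=0.25):
    """Return players whose time is below `fraction` of the question's median time.

    `times_by_player` maps a player name to elapsed seconds on ONE question,
    so the cutoff adapts to how long that particular question takes to read.
    """
    cutoff = median(times_by_player.values()) * fraction
    return sorted(p for p, t in times_by_player.items() if t < cutoff)


# Hypothetical answer times (in seconds) for a single long question.
times = {"player_a": 6, "player_b": 95, "player_c": 120,
         "player_d": 80, "player_e": 150}

# The median is 95 seconds, so the cutoff is 23.75 seconds.
print(flag_fast_answers(times))  # -> ['player_a']
```

Even here the flag is only a prompt to ask questions. A player who simply knows the topic cold would show up in the same list.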

Regarding the concern about the impact of fast answer times overwhelming correctness: this makes me uncomfortable, too. But what to do about it? If we reduce the weighting of score by time, then those who double visit (first to get the question, next to submit the answer after taking their time studying, discussing, whatever) can cheat their way to a high score, not concerned about how much time they are taking to get to the answer. Having said that, we can certainly take a look at the impact of changing how much the timing impacts the score, to perhaps find a better balance. The bottom line, however, is (I believe, in any case) that there will always be the possibility of a higher rank with lower correctness.
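The trade-off is easier to see with numbers. The formula below is NOT the PL/SQL Challenge scoring algorithm; it is a deliberately simple, hypothetical blend of correctness and speed, with an assumed 600-second time limit, used only to show how the time weighting changes who ranks higher.

```python
def weighted_score(correct, total, elapsed_seconds, time_weight, max_seconds=600):
    """Hypothetical score: a blend of fraction correct and fraction of time left."""
    correctness = correct / total
    speed = max(0.0, 1 - elapsed_seconds / max_seconds)
    return (1 - time_weight) * correctness + time_weight * speed


# A fast guesser: 2 of 4 options right in 10 seconds.
# A careful player: 4 of 4 options right in 300 seconds.
for time_weight in (0.7, 0.2):
    guesser = weighted_score(2, 4, 10, time_weight)
    careful = weighted_score(4, 4, 300, time_weight)
    leader = "guesser" if guesser > careful else "careful player"
    print(f"time_weight={time_weight}: {leader} ranks higher")
```

With the heavy time weighting the guesser comes out ahead; with the light one the careful player does. But the light weighting also means a player who fetches the question with one account and answers at leisure with another gives up very little.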

So what are we doing about all this? How can we ensure that everyone playing has a reasonably high level of confidence that they are being treated fairly and that the PL/SQL Challenge is worth their time to play?

1. We continue to evaluate and apply changes to the scoring and ranking algorithm based on our analysis of player activity and your suggestions. So please do keep submitting your ideas.

2. The playoff (in which everyone plays simultaneously and you really won't have the time to leverage multiple accounts) should help distinguish those who have played fairly and know PL/SQL well, from those who have played quickly but do not have a deep, solid knowledge of the language.

3. Consequently, we have decided not to give out prizes or recognition simply for ranking in the top 10 or 25 at the end of the quarter. We will instead award prizes for top ranking once the results of the championship playoff are in.

4. We communicate with any players about whose patterns of play we have concerns. Some of these players have confessed to cheating and we have wiped their answer history clean. Others have explained at great length the strategies they use to play the Challenge, educating us on the wide diversity of ways that human brains can apply themselves to tasks.

5. We have broadened the rules for participation in the playoff to ensure that a player who has a high level of correctness still has a chance to play, even if their overall time has reduced their ranking. We also offer a wild card pathway into the playoff to avoid discouraging those players who cannot play every day. See the Rules page for details.

Is it still possible that someone who is cheating will push out of the playoff a player who deserves to be there? Yes. Is it still possible for a person who is cheating to win a weekly or monthly prize? Yes.

Will we get better at minimizing the chances of any of this happening in the future? I hope and believe so - and I am convinced that you can help us in this matter, so please do not hesitate to reply to this blog entry or provide feedback through the PL/SQL Challenge website with your own ideas.

9 comments:

  1. One player had the following suggestion, but could not post it to the blog for some reason:

    For the scoring to be fairer, I think that the number of correct answers ought to have some influence on the time element in the weighted score.

    (T/3) should instead be:

    (T (N-C+1)/3)
    or
    (T/(C+1))

    or something like that.

  2. "* The minimum time really should be determined based on the difficulty level of the question and even the specific question (length of question text and each multiple choice, etc.). That then becomes very complicated for us to manage and measure."

    I think you could calculate the time factor by using the time to answer vs the statistical distribution of the time response. No one can guess, and this should be a good measure to find acceptable time interval for the answer.

  3. Do you think the first rank holder of the current week was really able to read, understand, review the multiple options in merely 6 seconds?

    Lightning speed, isn't it?

  4. Yes, that is very fast. And it was a long question. You have every reason to wonder. And I have every reason to check in with the player and ask how that was possible. So I did that. :-)

    Having said that, there is no doubt in my mind that sometimes people can answer questions very, very quickly. I have seen timings of people I know, and know they would never cheat, that are well under 10 seconds.

    That's what makes it so hard to draw any sort of hard and fast lines on this issue.

    Regards, SF

  5. 6 seconds is a long time. Especially when you're waiting for the microwave to go "ding" :)

  6. Steven, Don't you think all these cheatings are happening just to get those $'s and prizes? How many of these players still play 'daily' if the rewards are removed? The only reason why we are playing to is to get knowledge out of this and the only reason why "they" are playing is to get those $'s.[Some really would be exceptional]

    And I do agree that it is possible to answer a question within 10 seconds. One example is the question on Dynamic PL/SQL where the declaration is made local when it should be global. There, just by seeing that declaration we can say it's going to throw an exception.

    As you said, "However excellently we code and test, there will be at least one bug, and the user will somehow find the path to that single bug we missed in testing." The same applies to cheating :-( Hope you got the point.

  7. For me the prizes aren't really important. It's about knowing how good I am. I just don't 'get' gaming the system.
    I see the point of the 'wildcard', but it is very much focussed on the top players. I would like to see something added that supports players lower down the rankings when it comes to 'breaks in play'. Maybe something like: if you don't play, you get a default score of 75% of the lowest scoring player. Or the score that someone would get for marking all options as incorrect and taking 30 minutes to do so.

    Something that might drop them down the ranks, but keeps them sufficiently in the hunt that they stay in the game.

  8. Gary, very interesting idea. I, too, worry about developers losing interest because they are not and will not be anywhere near the top.

    One other suggestion made by a player that I find intriguing is to ignore the ten worst scores in a given quarter (a no-play score of 0 would count as a worst score).

    What do you think of this?

  9. What I have been concerned about is this: I could very easily select 'Take the Quiz' and then press 'Submit' without even reading the question.

    The quiz takes this as me believing that none of the given answers should be ticked.

    If that particular quiz had four options, only one of which should be ticked, I would be marked as having three of the four answers as correct.

    This would give me a higher ranking than if I had taken the time to read the question and select likely answers.

    How to stop or at least reduce the incidence of this? Add "None of the above" or some such as a final option to each quiz, and add some validation to the 'Submit' process to ensure that at least one option is selected.

    On the issue of speed of submission, I answered the quiz on Friday 2 July very quickly as it was something I just knew. Today's quiz took somewhat longer!

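The blank-submission concern in the last comment can be worked through in a few lines. This sketch assumes, as that comment describes, that each option is marked separately as correctly ticked or correctly left unticked; the function names are hypothetical, and the validation check is the commenter's suggestion rather than an existing feature.

```python
def options_marked_correctly(should_be_ticked, was_ticked):
    """Count options where the player's tick (or lack of one) matches the answer key."""
    return sum(key == tick for key, tick in zip(should_be_ticked, was_ticked))


def has_selection(was_ticked):
    """Suggested Submit validation: at least one option must be ticked."""
    return any(was_ticked)


answer_key = [True, False, False, False]   # four options, only the first is right
blank = [False, False, False, False]       # pressed Submit without reading
wrong_guess = [False, True, False, False]  # read the question, picked the wrong option

print(options_marked_correctly(answer_key, blank))        # -> 3
print(options_marked_correctly(answer_key, wrong_guess))  # -> 2
print(has_selection(blank))                               # -> False
```

Under this assumption the blank submission gets three of four options right, one more than an honest wrong guess, and it does so in almost no time. Requiring at least one ticked option would reject it at Submit.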