Friday 14 January 2011

The One Eyed Referee

All sports officials are meant to be objective and neutral in their decision making. In theory, therefore, two officials of the same standard should make more or less the same decisions in the same situations. In practice, however, decisions are often made in the context of personal style, the match situation, previous experiences, and (mostly) subconscious biases.

Some sports actually incorporate subjectivity into their laws. In rugby, for instance, the referee is the sole judge of whether or not advantage has been gained. In cricket, the umpire has to decide whether or not short-pitched bowling is unfairly intimidating, taking into account the batsman’s ability. Nevill et al (2002) showed 47 football situations on video to a set of referees. None of them received a unanimous verdict from the refs.

Personal style is undoubtedly a major factor in many decisions. For instance, in rugby, some referees may want to establish their authority early, whilst others may wish to allow the game to flow without being seen to be overly fussy. The referee might be influenced by previous incidents in the game, or even by a previous match between the same opponents. There may also be some benefit given to a side that’s losing heavily.

Trudel, Dionne and Bernard (1999), as well as Gilbert, Trudel & Bloom (1995), in assessing ice hockey penalties, concluded that referees, compared with players and coaches, attached greater importance to the context in which the infraction had occurred. Gilbert et al revealed that both the score and the time remaining in the game influenced decisions about whether or not to award penalties.

Diane Ste-Marie of the University of Ottawa has done a lot of research, especially in gymnastics, into how judgement can be affected by previous experience. She found that if judges were shown a performance on video with an error, they were more likely to see non-existent errors in a second tape of the same performance. The effect lasted even if the second tape wasn’t seen until a week after the first one, so it could affect an athlete performing in several rounds, or even one who has simply been seen in a warm-up. A similar effect has been found in figure skating.

Other assumptions made by officials can be just as erroneous. Damisch et al (2006) showed experienced judges two routines on the vault. If they were told that the two athletes belonged to the same national team, the two routines received similar scores. But when they were believed to be from different teams, the scores were less similar.

There’s also been concern that some judges might be influenced by the opinions of other judges, in so-called “conformity bias”. Boen et al (2008) found that gymnastics judges’ scores were more even when they had feedback about each other’s scores. The effect continued even after feedback was no longer being provided, suggesting that once the judges felt reassured that they had given a “correct” response, they were no longer worried about standing out from the crowd.

The decisions of referees and umpires can sometimes be influenced by prior knowledge of the teams or players (Plessner & Haar 2006). It has been shown that the previous reputation of players can influence the decisions of referees in basketball (Lehman & Reifman 1987) and in baseball (Rainey et al 1989). In football, Jones et al (2002) found that players with an aggressive reputation were penalised more severely than players with no such reputation.

A major benefit for a football team playing at home comes from crowd support. It seems that part of this benefit is from the effect that it has on the referee. Dawson et al (2007) found that the home side was more likely to get favourable decisions in terms of fouls and red cards. The amount of extra time allowed is higher when the home team is behind than when it is in front (Dohmen 2008, Garicano et al 2005, Sutter & Kocher 2004).

Studies suggest that this home team bias is because of the influence of the home crowd (Boyko et al 2007, Page & Page 2009), although it seems to have more of an effect on some referees than on others. With some referees, the amount of home bias is directly proportional to the size of the crowd, whereas for others the effect is constant, regardless of crowd size (Page & Page 2009). If there’s a running track around the pitch, the influence of the crowd on referees’ decisions seems to diminish (Buraimo et al 2008).

It could be that referees are simply alerted to the presence of a foul by crowd noise. Nevill et al (2002) found that referees looking at video footage (Liverpool v Leicester City) were much less likely to give advantage to the home team when the sound was turned off compared with when it was on.

Favouring teams or individuals of the same nationality as the official has been found in numerous sports, especially those that are judged subjectively. Clear national favouritism has been found in figure skating (Seltzer & Glass 1991, Whissell et al 1993, Campbell & Galbraith 1996), gymnastics (Ansorge & Scheer 1998, Ste-Marie 1996), ski jumping (Zitzewitz 2006), rhythmic gymnastics (Popovic 2000), Thai kick-boxing (Myers et al 2006) and synchronised diving (Emerson et al 2009). In all these sports, officials have been shown systematically to be giving advantage to competitors from their own countries, and the effect is significant, because the result is often determined by the official’s decision.

Even in sports where the result is not so closely tied to the decision of the referee, there has still been evidence of a national or local bias. Mohr & Larsen (1998) found that referees in Australian football were more likely to favour teams from their own states in matches against teams from another state.

At a national level, football, rugby and cricket usually have neutral officials. Page & Page (2010) looked at two cases where referees were allowed to officiate in inter-club matches in which a team from their own nation played a team from another nation. These were the European Super League 2006–9 in rugby league (mostly British teams with one French team), and the Super 14 2009 in rugby union (5 South African, 5 New Zealand and 4 Australian teams).

In Super 14, a referee with the same nationality as a team increased the score of that team by on average 5 points relative to when there was a neutral referee. The home team won 71% of its matches when the referee was of its own nationality, compared with 50% when the referee was of the nationality of the away team.

In Super League, the French team received on average 9 points more in a match when the referee was not English. They won 67% of their matches when the referee was Australian or French, compared with only 41% when the referee was English. The effect varied according to whether or not the match was televised. When the referee was English, the French team was much more successful when the match was on TV (59%) than when it wasn’t (30%).

The effect of favouritism was most pronounced when the decisions were critical. When there was an English referee, the French team received twice as many cards as the English team, but when the referee wasn’t English they received roughly the same number.

Page & Page also studied the rugby league Championship, which had one French team and mostly English referees. They looked at decisions involving whether or not the ball had been grounded in scoring a try. The French team had a lower proportion of positive decisions (79% against 93%).

Favouritism was also particularly strong when the score was close, with English and French referees favouring the side of their own nationality when it mattered most. In the Championship, when the difference between the sides was 4 points or less, the French team had only a 59% chance of getting a try validated, compared with 82% for the English team.

Because so many of the decisions made by officials are a matter of opinion, such as LBW decisions in cricket, the individual can justify to themselves that they made the correct decision according to the laws of the game, and will genuinely not be aware of their own bias. There are many factors that go into making a split-second decision, including the attitude of the players, the crowd or spectators, the personality of the official, and even their mind-set on the day.

It’s important therefore that during their training, officials are made aware of these factors, and are given as much practice (e.g. videos with crowd noise) and feedback as possible to enable them to improve their consistency. Because one thing seems clear: in many sports, officials’ decisions have a large effect on the result of the game.

David Donner

Are You Blind Ref?

Quite a bit of research has been done on the visual requirements of different sports, but very little on the requirements for sports officials.

There are, of course, several different kinds of officials. For simplicity, I’ll put them in four main groups.

Recorders include cricket scorers, and those who measure distances and times in athletics. Line judges would include line umpires in tennis, but would also include some who can occasionally intervene in play, such as touch judges in rugby. Referees and umpires are an integral part of the game, often making an enormous number of decisions, as in cricket, rugby and football. Finally, judges give a rating to a performance, but are external to it, for instance in gymnastics, diving and figure skating.

Some officials will cut across these boundaries, such as tennis umpires. There are also other officials who are further removed from the performance, such as tournament referees, 3rd and 4th officials.

I'm going to be concentrating on those officials who have to make decisions, but even recorders rely on vision. Cricket scorers in particular have quite high visual demands as they are a long way from the action. Recognising batsmen under helmets from outside the boundary can be difficult, and may require other features to be found, such as the markings on the bats. Scorers also need to be careful that they are ready to acknowledge any signal from the umpire, and aren’t looking down at the scorebook at that moment.

All officials, therefore, should have regular eye examinations to ensure that their judgments aren’t being distorted through any defects in their vision. This should usually include an assessment of their field of vision, as significant defects could result in some of the action being missed completely.

Assuming that the official has a good standard of vision, if necessary corrected with either spectacles or contact lenses as appropriate, we can look at how vision is an integral part of the decision making process for sports officials.

We've already seen how perception can affect line decisions in tennis (see “You cannot be serious” blog). This same effect could also cause assistant referees to flag a player offside who isn’t, and an umpire to give a narrow run out decision in favour of the batsman.

Tennis line umpires are in a more or less fixed position, but the positioning of officials can be a crucial factor in their decisions. Oudejans et al (2000) proposed that many incorrect offside decisions in football were as a result of the referee’s assistant being behind, rather than level with, the last defender. The Dutch researchers correctly predicted that this would lead to more wrong calls of offside than missed calls of actual offside. But presumably, this would depend on whether the attacker was on the far side of the defender or near side, as viewed by the official. So it could be that they were actually confirming Whitney’s theory of perceptual delay (described in “You cannot be serious”). Nevertheless, it must be the case that poor positioning would cause errors due to simple parallax, even if they don’t necessarily favour the defender.
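As a rough illustration of the parallax point, here is a minimal Python sketch. It is not taken from Oudejans et al; the coordinate system, the function names and the example positions are all assumptions made purely for illustration. The assistant stands on the touchline, and a player "appears" offside if the line of sight to them points further towards the goal line than the line of sight to the defender.

```python
import math

def appears_offside(assistant_x, defender, attacker):
    """True if the attacker LOOKS nearer the goal line than the defender.

    Hypothetical model: x = metres from the goal line, y = metres in from
    the assistant's touchline (y > 0). The assistant stands at (assistant_x, 0).
    """
    def bearing(player):
        x, y = player
        # Angle of the line of sight, positive when it points towards the goal line.
        return math.atan2(assistant_x - x, y)
    return bearing(attacker) > bearing(defender)

def actually_offside(defender, attacker):
    """True if the attacker really is nearer the goal line than the defender."""
    return attacker[0] < defender[0]

if __name__ == "__main__":
    defender = (20.0, 30.0)          # last defender, 30 m in from the touchline
    far_side_onside = (20.5, 50.0)   # attacker beyond the defender, half a metre onside
    near_side_offside = (19.5, 10.0) # attacker nearer the assistant, half a metre offside

    for assistant_x in (20.0, 19.0):  # level with the defender, then 1 m out of line
        for attacker in (far_side_onside, near_side_offside):
            print(assistant_x, attacker,
                  "looks offside:", appears_offside(assistant_x, defender, attacker),
                  "is offside:", actually_offside(defender, attacker))
```

In this toy set-up, an assistant level with the defender gets both calls right. An assistant one metre out of line towards the goal line sees the far-side attacker as offside when he isn't, and the near-side attacker as onside when he isn't, which is the sense in which the direction of the error depends on which side of the defender the attacker is on.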

One of the things that can help with positioning is anticipation. For instance, if a rugby referee can anticipate a likely drop goal attempt, they’re more likely to be able to get in position to be able to tell whether or not the kick is successful.

Ste-Marie (1998) found that expert gymnastics judges showed better anticipation than novice judges. All judges were more accurate when they had seen the same performance earlier. If there was some minor difference in the two performances (e.g. bent knees versus straight knees) they were less accurate, and if the second performance had a completely different element to the first, this produced the least accurate judgements.

So whilst correct anticipation can lead to improved judging, incorrect anticipation can lead to inaccurate (though rapid) judgements. Judges therefore need to be able to recognise when a new element has occurred, and take the necessary time to re-evaluate it. They also need to keep an open mind at all times. Ste-Marie found that if judges were shown a move with an error, they were more likely to see non-existent errors when the move was repeated a second time.

A difference that is often seen between experts and novices in sport is how they make better use of visual information in decision making. Expert tennis players, for instance, will focus on certain parts of their opponent’s body when determining the direction of their shot. Is there any evidence for officials doing this?

Well, there is some. In gymnastics, for instance, Bard et al (1980) found that expert judges watched different areas of the body than did novice judges. Sed (2008) found a similar result in basketball. When asked about their thoughts when watching a DVD of a match, expert basketball officials focused on specific areas of the court, whereas novice officials tended to watch the ball. This enabled the experts to anticipate better the movements of the players.

Smith & Millslagle (2008) used head-mounted cameras to assess the gaze fixation of elite and novice umpires in softball. They found that when the pitcher released the ball, the elite umpires’ gaze location was 100% on the ball, whereas the novices’ gaze location was only 55% on the ball.

Considering officials that are an integral part of the game, such as rugby and football referees, there are a large number of decisions that need to be made during a game. A study of football referees during the EURO 2000 championships found that they made, on average, 137 observable decisions. That didn’t include all the unobservable decisions (approximately 60), such as deciding not to blow the whistle. Most of these decisions have to be made within time constraints. Australian Football League referees were found to have less than a second in which to make their decisions (McLennan & Omodei, 1996).

Cricket umpires generally have more time to make their decisions, but one exception is in determining whether the bowler has no-balled or not. The umpire has to decide very quickly whether or not the bowler’s feet are legally placed before switching gaze to pick up the flight of the ball in anticipation of a possible LBW decision.

The relevant Law begins “In the delivery stride, the front foot must land……” When I was training to become a qualified umpire, we had to learn this off by heart, and if our answers didn’t have the words “in the delivery stride” we would lose marks. But when you’re umpiring you don’t have time to think about the words; you have to recognise instantly whether the feet are in an allowed position or not. The same goes for a rugby referee when judging whether a tackle is legal or not.

The training of umpires has now improved, and much time is spent demonstrating feet positions, for example. The idea is that these images are stored in the visual memory, providing an instant comparison to be used on the field of play.

If we are to improve at something, we need some kind of feedback to judge whether or not we are actually improving. One of the few studies on the use of feedback for sports officials was by Jendrusch et al (2002) for tennis line judges. Electronic devices were used to assess where the ball landed. Those line judges who received accurate feedback about their decisions showed marked improvements compared with those who didn’t. They showed no general improvement in perceptual skills, but simply learned what to look at when making decisions. A similar system could be used to improve the accuracy of cricket umpires’ LBW decisions using Hawk-Eye.

If we want to improve the decision making of referees and umpires, it would be helpful to know whether this can only be done on the field of play, or whether video training can also be used. One attempt to do this has been by Brand et al (2009) with the SET - Schiedsrichter-Entscheidungs-Training (Referees’ Decision-making Training). Videos from a variety of competitions are shown, including the German Bundesliga and the UEFA Champions League. Each video is stopped after a crucial incident, usually a contact between two players, and the participant clicks with a mouse to indicate their decision. This system has been cleverly designed to enable several parameters to be changed, such as how much time is allowed to make decisions, and whether feedback about the correctness of the decision is given immediately or delayed. Extra information can also be inserted, such as shirt colour and crowd noise, to explore the extent to which irrelevant information is used (see “Is the ref biased?”).

They've found that it’s better to give immediate, rather than delayed, feedback on the correctness of decisions. This resulted in the most effective increases in decision accuracy.

One limitation with the SET is that the video clips were taken from external pitch side cameras, so do not give the referee’s actual view. This was also a limitation of a Canadian study in which video clips were shown to rugby referees (MacMahon 1999). The experienced referees wanted to know more information than was provided, and often felt the videos gave them a poor view. One answer would be to fit a referee with a head-mounted camera. Another specific problem was that the referees wanted to see whether there was going to be advantage or not, despite the fact that they were told to ignore the question of advantage.

Another use of video has been in point sparring, which is a scoring system used in some martial arts. The determination of whether or not a point has been scored can be a subjective decision by the referee. Krueger (2008) found, unsurprisingly, that first-time referees were significantly less accurate in their scoring compared with more experienced ones. More surprising, perhaps, was the finding that accuracy was generally less when the standards were higher and the two fighters were more evenly matched. This meant that in the finals, when, at least in theory, the two best fighters met, the accuracy of decisions was lowest.

They also found that a referee with 20 years’ experience was generally no more accurate than a referee with only 2 years’ experience. This suggests that experience without feedback can lead to a plateau effect. Virtually all the referees showed systematic errors that could be easily corrected. One referee occasionally awarded a point to obviously the wrong person, apparently confusing which fighter was which. Another was generally accurate, but would make several errors in a short period, suggesting lapses in concentration.

Just as ProBatter has been used to give England’s batsmen a realistic idea of what it’s like facing Test bowlers, virtual technology should in time be able to give officials experience in an environment where mistakes aren’t too costly. Important factors such as crowd noise could easily be incorporated.

In the meantime, I’m always surprised how little officials are used in players’ practice sessions. Not only would it give novices a chance to develop their experience, but established officials could use it to “get their eye in” before the start of the season.

This would also benefit the players. How many bowlers overstep in the nets? Would there be so many offences at the break down if there were referees at practice games?

A novel way of giving trainee officials relevant experience and feedback has been tried in the Australian Football League. First year umpires are given green shirts, and are monitored by an experienced referee during a game. They may be asked to concentrate on just one call, and may stay on the field for a limited amount of time, giving them a chance to reflect. In rugby union, for instance, a similar system might have trainees concentrating on the straightness of the throw or the spacing between players at the line out.

I’m sure a lot could be learned from time on the pitch with an experienced referee, learning how to look in the right places for likely offences.

David Donner

Tuesday 4 January 2011

Look Black In Anger!

In my blog about wearing different colours in sport, I talked about the advantage of wearing red in tae kwon do, where it appeared to give the perception to judges that the wearer was more aggressive. The suggestion that wearing red was advantageous in football was much less conclusive.

One colour I didn’t mention was black. Frank & Gilovich (1988) found that wearing black in American Football and ice hockey led to a perception of increased aggression, but in this case it was a disadvantage because it led to more penalties against those teams.

They analysed five NFL teams with 50% or more black in the uniforms, and five NHL teams. All of the dark-colour teams were near the top of the group in terms of penalties awarded against them for aggression. Two ice hockey teams that switched to black uniforms during the season saw an increase in penalties.

When two identical American Football games were shown with the uniform colours reversed in the different versions, those watching felt that the black-uniformed team was more likely to be penalised. Turning off the colour on the video eliminated the effect.

There may even be an effect from just wearing black. College students chose more aggressive sports from a list for further competition when they wore black.

It has to be said that the New Zealand All Blacks don’t seem to have suffered too much from this effect. On the other hand, might it be part of the reason why football referees get so much stick?

David Donner

Stoop to Conquer!

It’s been remarked by football commentators that Peter Crouch seems to be penalised a lot, especially by Continental referees. It seems that he may not be alone, because there would indeed appear to be a bias against tall footballers.

Van Quaquebeke and Giessner (2010) analysed more than 100,000 fouls from the UEFA Champions League, German Bundesliga and FIFA World Cups. They found that taller players were more often held accountable for fouls than shorter ones, even when no fouls were committed.

It’s not clear why this should be, but it seems that in close decisions, the player’s height is used as an additional piece of information by the referee. It could be that very tall players are seen as a bit uncoordinated, or as overly aggressive. Either way, it seems to be another of those assumptions that are made about people which, even if they may have a grain of truth in them, can also be grossly unfair to the individual concerned.

Much more on referee bias later.

David Donner