Each year or so, the new golf course rankings are announced and I inevitably scramble to see if anything has happened. Almost every time, I walk away from reading them grumbling about how bullshit they are. Oh, look, Pine Valley is on top, maybe I should ready my private yacht and sail to New Jersey. Looks like this year I should phone up my monocled business associate in Monterey and have him pencil me in for a round or two at Monterey CC and Cypress? I’d love to kvetch about a compendium of courses I’ll never be invited to, but to be fair, it has been fun to see Team Keiser creep up the lists in the last decade. But again, I don’t need anyone to tell me I should play at Bandon.
That’s at least the surface reason I love to hate golf course rankings. But all that aside, there is a more fundamental issue with rankings at play here that makes them not just annoying to me, but largely unhelpful to the masses. So let’s talk about why the act of ranking courses in the first place makes the lists mostly useless, and how the ranking system could evolve over time to produce lists that are actually meaningful to most people.
Golf Courses Are Ranked (for Some Reason)
Golf is unique in its ranked lists,1 but you might not even notice it until someone points it out. While restaurants, films, and even National Parks are reviewed with star-rating systems,2 golf course rankings are much more reminiscent of awards shows. Golf rankings parallel awards in other ways, too: they are generally presented once a year, the voting is anonymous, and the results are presented with an air of authoritativeness. This framing means there is often no individual to disagree with; it is a consensus opinion. However, there are too many courses for any one individual to review each year, so the results are typically a hodgepodge at best.
The reasons for this quirk of golf are likely historical, but it is arguable that shifting golf reviews to a score-based system would improve their usefulness to the people who actually read these lists. There are some benefits and drawbacks to ranking vs scoring systems,3 and the de facto purpose of the review probably has a lot to do with which system gets used.4 And while there are some organizations that are changing things,5 if you’re looking at a golf course review, it’s most likely attached to a ranking.
Part of the fun of rankings (or a drawback, depending on your stance) is that lots of people don't agree with them when they come out. Often, they can't even agree on which qualities are worth caring about.6 Values vary from stunning views, to playability, to artistic merit. This gets complicated when those virtues turn out to be contradictory.7
What If Everyone’s Taste is Just Different?
The entire framework for authoritative rankings assumes that there actually is a best golf course. Despite all the disagreement, we read these rankings and assume that a group's opinion that something is the best makes sense. There is obviously a case for deferring to experts, but what if the experts still disagree? What if all this disagreement can be explained by the fact that people just like different things? To illustrate this argument, let's look at soda pop, something so immediately accessible that even most children have strongly held, informed opinions on the subject.
Coca-Cola vs Cherry Vanilla Dr Pepper
So, what is the “best” soda? I think most people would say Coca-Cola, but I’m sure there is a strong contingent for Pepsi, Dr Pepper, or even Sprite. If by “best” soda, we mean “best selling” or “soda that the most people vote for” then the answer would definitely be Coca-Cola. However, to illustrate the shortcomings of a system like this, we need to understand why Cherry Vanilla Dr Pepper was created. You read that right: this isn’t just Dr Pepper. I’m referring to a very obscure alternative: Cherry Vanilla Dr Pepper. The work of Howard Moskowitz, who created the obscure soda, can show us why something being the most popular doesn’t mean it’s the best.
Moskowitz is credited with popularizing the concept of inter-market variability in the food industry.8 He argued that we should reject the idea that people prefer a consensus "best" version of something. Instead, he suggested that people just have inherently different tastes. His work shows that products can be micro-targeted at clusters of people by creating lots of variations, rather than offering everyone the single most popular version.
Moskowitz argues that there isn't one "best soda." It comes down to taste, and everyone's preferences are equally valid. So instead of trying to find "the best soda" to enjoy, we should be trying to find "our favorite soda." This isn't a total rejection of expertise (I'll come back to that later), but it means that, generally, there are clusters of people who share similar taste preferences. While it might not be practical to make a soda for every individual, we can get very close to creating ideal sodas by targeting these clusters of preferences. Following this reasoning, ranking things by general popularity should produce winners that most people merely like, while allowing for niche variation should produce multiple winners that smaller groups truly love.
If put to a popular vote, Cherry Vanilla Dr Pepper wouldn’t rank in the top 100 of best sodas. It probably wouldn’t rank in the top 1000. However, Moskowitz’s idea is that there are people out there who do love Cherry Vanilla Dr Pepper, and think it’s the best soda, period. Even if only one person in 10,000 does, that still ends up being a large number of people. Thus, if rating systems are simply popularity contests, they won’t create the most value for the most people.
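To make the idea of taste clusters concrete, here is a minimal sketch of how Moskowitz-style preference clustering might be done in practice. Everything here is hypothetical: the preference scores and flavor attributes are invented, and standard k-means does the grouping.

```python
# Hypothetical sketch: grouping soda drinkers by taste rather than crowning one winner.
# The scores below are invented; in Moskowitz-style work they would come from taste tests.
import numpy as np
from sklearn.cluster import KMeans

# Each row is one person's 1-10 preference for:
# [sweetness, carbonation, cherry flavor, vanilla flavor]
preferences = np.array([
    [8, 7, 2, 1],   # classic cola fans
    [9, 6, 1, 2],
    [7, 8, 2, 1],
    [6, 5, 9, 8],   # the Cherry Vanilla Dr Pepper contingent
    [7, 4, 8, 9],
    [5, 5, 9, 9],
    [3, 9, 1, 1],   # crisp, dry, low-sweetness drinkers
    [2, 8, 2, 1],
])

# Ask for three clusters of shared taste instead of a single "best" answer.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(preferences)

for label, center in enumerate(kmeans.cluster_centers_):
    members = int(np.sum(kmeans.labels_ == label))
    print(f"Cluster {label}: {members} people, ideal flavor profile ~ {center.round(1)}")
```

The point is less the algorithm than the shape of the output: several "ideal" profiles, one per cluster, rather than a single winner.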
The Tomatometer and the Critic’s Page
Now, Moskowitz’s results do not mean that there is no value in listening to critics, only that their opinions exist on a plane alongside other, equally valid opinions. Professional critics have the time, resources, and education to appreciate the nuances of things. So even though critics don’t have better opinions, per se, they do have highly informed opinions. Thus, ratings and reviews by critics are still extremely useful for discovering things we love. There are two ways we generally encounter critical reviews: by aggregating them or by looking at them individually.
Rotten Tomatoes is a film site that aggregates critics’ reviews of films. Its Tomatometer is a good way to see if most professionals have a favorable opinion of a film.9 However, this only means most professional critics think the film is okay. It’s a good way to avoid films you will probably hate, but it’s not a great way to find films you will love.
The site, however, has critic pages as well. A critic page collects all of an individual critic’s reviews.10 If film-lovers can find a critic they usually agree with, they can use that critic’s opinions as a proxy for the taste clustering in Moskowitz’s research. When that happens, it’s very easy to find films they will love and avoid films they will hate. Nothing is perfect, but a high probability of finding something you will love, something made for your particular quirks, is extremely valuable to most people.
Moskowitz’s Cherry Vanilla Dr Pepper only really appeals to the people who love it. In the same way, the film critic who really loves bizarre, niche films can point eccentric film lovers in the direction of their favorite cherry vanilla versions of film.
Collaborative Filtering: the Holy Grail of Dynamic Rating Systems
There aren’t nearly as many professional critics in golf as there are in film, so finding one who matches an individual’s views would be challenging, if not impossible. Still, there is a way to stretch the available information by using collaborative filtering. Collaborative filtering is a type of machine learning algorithm best known as the logic behind Netflix’s recommendation engine.11 It takes people’s existing ratings and carefully compares them to other people’s ratings to make surprisingly accurate predictions about things they’ve never tried. While it’s fair to say these algorithms can be crude and are certainly not a replacement for professional critics’ opinions, they can be incredibly helpful. Unfortunately, for collaborative filtering to work, the reviews that get processed need to be scored rather than ranked.
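For the curious, here is a heavily simplified, hypothetical sketch of what user-based collaborative filtering looks like under the hood. The courses and ratings are invented, and real engines (Netflix’s included) are far more sophisticated, but the core move, comparing one player’s scores against everyone else’s to predict an unrated course, is roughly this:

```python
# Hypothetical sketch of user-based collaborative filtering on golf course scores.
# Ratings are invented 1-10 scores; np.nan marks courses a player hasn't rated.
import numpy as np

courses = ["Course A", "Course B", "Course C", "Course D"]
ratings = np.array([
    [9.0, 3.0, np.nan, 8.0],   # player 0
    [8.0, 4.0, 7.0,    9.0],   # player 1 (similar tastes to player 0)
    [2.0, 9.0, 8.0,    3.0],   # player 2 (very different tastes)
])

def similarity(a, b):
    """Cosine similarity over the courses both players have rated."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(player, course_idx):
    """Predict a rating as a similarity-weighted average of other players' ratings."""
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        score = ratings[other, course_idx]
        if other == player or np.isnan(score):
            continue
        weight = similarity(ratings[player], ratings[other])
        num += weight * score
        den += abs(weight)
    return num / den if den else np.nan

print(f"Predicted rating of {courses[2]} for player 0: {predict(0, 2):.1f}")
```

Real implementations usually mean-center each person’s scores (so tough graders and generous graders become comparable) or learn latent factors instead, but either way the math needs scores; a bare ordering doesn’t give the algorithm enough to work with.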
What Does This Mean For Golf Courses?
I cannot overstate how much I think a clustering-based recommendation engine would benefit the golf world. The parameters that affect the enjoyment of a course vary dramatically from player to player.12 However, establishing such a system would be difficult. As it stands, the historical ranked-rating systems are not much use in building anything like this, nor is there an easy way to seek out a reviewer who shares your particular tastes.
The golf world would immediately benefit if all of the ranking publications switched to score-based ratings and published the scores of each rater, even anonymously. While I don’t think this will ever happen, I do hope the newer publications that already use score-based systems make that the norm over time.
Eliminating authoritativeness from our golf course rankings would be the easiest change. Siskel and Ebert demonstrated how to do this perfectly with their television show At the Movies. Framing reviews with at least two opinions allows disagreement to exist, and that disagreement creates healthy differentiation the audience can use. Even if it’s a hassle for multiple people to write in-depth reviews, I would hope publishers at least ask every critic for an opinion, even if all they can manage is a number or a quick blurb. Finding a critic you match with is a huge bonus.
—
All of this has created a bit of a quandary for me over at GolfCourse.wiki. I have thought pretty hard about whether or not to include a rating system for golf courses there, and so far, I’ve decided against it. However, I do think building a collaborative filtering system to help people find golf courses they might enjoy could really benefit the community. As it stands, it would be challenging to implement and would take a lot of time. It would also be difficult to verify user ratings to prevent people from gaming (ruining) the system. If I do institute something like that, I don’t think any rating will be posted on course pages; instead, there would only be a recommendation engine built on collaborative filtering, so trying to game the system would make little sense. However, I’m still not sure what the best approach would be.
There are many golf course reviewers out there, but there isn’t a central database for reviews like Rotten Tomatoes. Getting one started would be difficult, especially starting one that doesn’t already have ties to vested interests in the industry. The world of golf has a pretty significant blind spot when it comes to reviewing the more humble but quality courses people have in their communities. People often still need word-of-mouth recommendations. I think this blind spot can be alleviated somewhat with the wiki, but creating a robust database of professional reviews would do a lot of good and move us away from the mostly unhelpful award ceremonies we see published each year.
Top 10 “Top 100” lists:
Golf Magazine: the classic.
Golf Digest: the response to the classic.
Golfweek: they have a top 50 list for any topic.
Golf Monthly: the regional list.
Planet Golf: the no-nonsense list, and lists of lists.
Golf World: the Today’s Golfer list.
Golf Course Architecture: the scholarly list.
Top 100 Golf Courses: the website list.
Links Magazine: the top 10 list lists.
NBC’s Golf Pass: a user aggregated list.
These scoring systems are generally “star rating” systems, typically in a four- or five-star range. Sometimes they use a score out of 10, or occasionally a score out of 100.
The National Park reviews are from Yelp, which made headlines when people noticed the cognitive dissonance of one-star reviews of National Parks: https://www.washingtonpost.com/travel/2024/02/21/national-parks-one-star-reviews/
These scoring systems have become so ubiquitous in society that it seems extremely odd that golf is unique in its tradition of ranking. My favorite reflection on the star rating’s ubiquity in our world is the quasi-satirical book The Anthropocene Reviewed, which pokes fun at rating systems while presenting beautiful, personal stories alongside a cliché five-star rating.
Benefits of Ranking:
One benefit of ranking is that it removes the need to address imperfection. Rankers need not justify why points were deducted. Ranking allows the celebration of excellence without the need to address weaknesses, and here rankers can effectively equivocate. If Course A is “flawless” and Course B is also “flawless,” ranking one higher effectively implies that it is somehow more flawless. Even though this makes little sense logically, there is little need to justify the details of the preference, since a ranked preference isn’t held to some platonic standard of perfection.
Drawbacks of Ranking:
The biggest drawback of ranking is how little information it conveys: you learn the order of the courses, but nothing about how far apart they are.
Benefits of Scoring:
Individual scores allow more information to be conveyed to the audience. Scores still permit hierarchical ranking, but they also show the distance between those ranks. Importantly, scored data facilitates more sophisticated uses of the data: machine learning models, suggestion algorithms, etc.
Drawbacks of Scoring:
Different types of scoring systems have different qualities. Effectively discrete scoring will not communicate nuance, while effectively continuous scoring can leave confusion about what a zero and a maximum score mean. Often the maximum score is set to a platonic ideal of whatever is being scored, which can limit its usefulness. Aggregating multiple scores to reach a consensus, however, can also be challenging.
Ratings as celebrations of excellence (primary audience is typically industry insiders/experts):
When we look at institutional awards (the Academy Awards, the Grammys, the Tonys, the Emmys, etc.), we see institutions celebrating the most successful artists of the year. These awards typically take the form of rankings, with one winner and unranked non-winning nominees.
These awards are generally not given to help consumers decide which films they would like to see; the assumption is that we should probably see all of them, since every nominee is considered excellent in its field. In addition, the works are often in contrast with one another, with different genres or subgenres vying for prestige.
Ratings for informing the consumer (primary audience is typically the end consumer, data capture):
When we look at product reviews, we generally see ratings presented as scores; think of film reviews with their star ratings or their thumbs up or down. These types of ratings generally cover a broad swath of the available products.
Ratings as Marketing (primary audience is the end consumer, or possible owners in the case of private clubs):
Now the inner cynic in me comes out. At the end of the day, golf is a business, and the business of golf is buoyed by the new course being the one that everyone wants to play. Even private clubs, which seem like they would have no need for marketing, open up their doors to the right people at the right time to get on the cover of the right magazines, seemingly under the guise of celebrations of excellence.
While I noted earlier that awards shows are typically for industry insiders, I call marketing a de facto purpose because it is well documented that studios campaign hard for their films to win awards precisely as a form of marketing. There have been some fairly serious allegations in the last year that this is happening in the golf world as well.
Professionals that use scoring systems rather than rankings:
Tom Doak: https://www.doakgolf.com/confidential-guide-to-golf-courses/
The Fried Egg: https://thefriedegg.com/fried-egg-golf-course-ranking-system/
There are quite a few sites that aggregate scored user reviews, including:
UK Golf Guide: https://ukgolfguide.com
Greenskeeper.org: https://greenskeeper.org
NBC’s Golf Pass (but they still typically present the ratings in ranking form): https://www.golfpass.com/travel-advisor/articles/golf-advisors-top-100-courses-thru-the-first-five-years-of-ratings
Some of the ranking publications provide aggregated scores, but these scores cannot be tied to individual critics (as far as I know).
Note here that, historically, Golf Digest has emphasized the challenging nature of a course; Golf Magazine, by contrast, has a much looser way of choosing its top courses.
A prime example of contradictory values, one which has led to many arguments, is overseeding bermuda with rye. Dormant bermuda is an excellent firm-and-fast playing surface, which is extremely desirable in golf; however, it is yellow, not green. This has led many, many courses to plant and grow ryegrass during the dormant periods because they care much more about the verdant appearance of their course, even though the result is a softer, slower playing surface that is generally regarded as less interesting. This type of contradiction affects even the highest-ranked courses in the world.
Howard Moskowitz rose to prominence in the food industry when he used this method to develop thick and chunky tomato sauce. Malcolm Gladwell has profiled his career and why it was so influential, both in the New Yorker and at TED.
The breakdown for Tomatometer scores is as follows (a toy sketch of how the tiers combine appears after the list):
Certified Fresh:
At least 75% of professional film reviews are positive.
Five or more professional reviews from top critics.
At least 80 reviews for wide release, and 40 reviews for limited releases.
It must maintain these stats consistently over time.
Fresh: At least 60% of professional film reviews are positive.
Rotten: Less than 60% of professional film reviews are positive.
Tomatometer explainer: https://www.rottentomatoes.com/about#whatisthetomatometer
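As a toy sketch of how the thresholds above combine (the function name, arguments, and the simplified consistency flag are mine, not Rotten Tomatoes’):

```python
# Toy sketch of the Tomatometer tiers as summarized in the list above.
def tomatometer_label(percent_positive: float, top_critic_reviews: int,
                      total_reviews: int, wide_release: bool,
                      consistent_over_time: bool) -> str:
    review_floor = 80 if wide_release else 40   # wide vs limited release minimums
    if (percent_positive >= 75
            and top_critic_reviews >= 5
            and total_reviews >= review_floor
            and consistent_over_time):
        return "Certified Fresh"
    if percent_positive >= 60:
        return "Fresh"
    return "Rotten"

print(tomatometer_label(82.0, 7, 120, wide_release=True, consistent_over_time=True))
```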
For reference, the Rotten Tomatoes Critic’s page for The Atlantic film critic and Blank Check host, David Sims: https://www.rottentomatoes.com/critics/david-sims/movies
Netflix used to host a contest for data scientists to improve their collaborative filtering algorithms called the Netflix Prize. While most major tech companies utilize recommendation engines, Netflix and Spotify are the two most prominent companies with engines dedicated solely to aesthetics. It’s worth noting here that collaborative filtering is an implementation of cluster analysis, which is directly related to Moskowitz's bliss point clustering research.
Dozens of parameters exist regarding players’ preferences that can affect enjoyment of a course. To list a few briefly:
Conditioning:
General presentation
Green speeds
Turf firmness
Turf coverage
Turf imperfections
Grass types
Architecture:
Dominant school of architecture:
Penal, Strategic, Heroic, etc.
Land movement:
Walkability
Routing
Lie angles
Forced carries
Surface contours:
Relationship between contour and green speeds
Hazards:
Types of hazards
“Fairness” of hazards
Number of hazards
Options for avoiding hazards
Course style:
Links vs parkland vs heathland, etc.
Testing courses vs fun courses vs risk-reward match play courses, etc.
Open or narrow corridors
Stewardship:
Historical accuracy/preservation
Environmental concerns
Native flora/fauna
Culture:
Pace-of-play concerns
Play style preferences
Casual vs competitive
Stroke play vs match play vs foursomes, etc.
Ethical concerns
Access concerns
How the course fits in with the local area
Cost:
Is the course a good value
Is the course exceptional regardless of being expensive
All of these could be parameters considered by a collaborative filtering program. In fact, clustering algorithms can surface parameters that no one had previously thought of.
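As a hypothetical illustration of that last point, here is a small matrix-factorization sketch (a common building block of recommendation engines, closely related to clustering). The ratings and course names are invented; the point is that the two latent “factors” are learned from the scores themselves rather than picked from the list above.

```python
# Hypothetical sketch: discovering latent preference "parameters" from scores alone.
# The ratings are invented; a real system would use many more players and courses.
import numpy as np
from sklearn.decomposition import NMF

courses = ["Links A", "Links B", "Parkland C", "Parkland D"]
# Rows are players, columns are their 1-10 ratings of each course.
ratings = np.array([
    [9, 8, 2, 3],
    [8, 9, 3, 2],
    [2, 3, 9, 8],
    [3, 2, 8, 9],
    [7, 6, 6, 7],
])

# Factor the matrix into player tastes (W) and course traits (H) over 2 latent factors.
model = NMF(n_components=2, init="random", random_state=0, max_iter=500)
W = model.fit_transform(ratings)   # how strongly each player weights each factor
H = model.components_              # how strongly each course expresses each factor

for factor, row in enumerate(H):
    profile = ", ".join(f"{c}={v:.1f}" for c, v in zip(courses, row))
    print(f"Latent factor {factor}: {profile}")
```

In this toy data the factors happen to line up with the links/parkland split baked into the fake ratings, but nothing from the list above was handed to the model; with real scored reviews, the learned dimensions could correspond to preferences nobody thought to name.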