Tournament Format Comparisons

Written by Luke Morgan, Apr 19, 2025

This resource is intended for use by tournament organizers to evaluate strengths, weaknesses, and utility of different tournament formats.

Tournament Metrics:

there are five qualities inherent in every tournament format:

- comeback-ability

- goal similarity

- universal game count

- player encounters

- farmability

Extreme examples that show what each of these means (exaggerated for ease of explanation):

- comeback: 100% of people make top cut, topcut is 1 hanchan scramble, highest score wins (all players have as much chance as possible regardless of previous rounds to win overall)

- goal similarity: in every hanchan, at each table, 2nd through 4th are eliminated. 1st place score is reset (all players have exact same goal: get first, every table)

- universal game count: X round scramble, score is based on number of games played with random tiebreaker (every player plays every round, and knows that playing matters and affects final standings)

- player encounter: scramble such that every game starts at south 4 with a ton of rounds, such that every player sees every other player at their table at least once

- farmability: everyone plays everyone in all possible combinations, no uma, highest overall points total wins (your score against any other player heads up does not determine your ranking of skill against that player, and you can beat people without having to beat them directly)

Each of these things has a different effect on “player outlook” (player perspective on their tournament result) depending on how that player is doing.

Player Outlooks on Qualities:

Comeback

- high comeback, high rank player: “my previous results matter very little, i must continue to perform well to excel”

- high cb low rank: “doesnt matter how i did before, i just need to do well now”

- low comeback high rank: “i am doing very well, and do not have to worry as much about future performance towards my final ranking”

- low cb low rank: “i have very little chance to improve my poor ranking in the remaining games”

Goal

- high goal similarity, regardless of player level: “everyone here has similar goals and in good faith can be counted on to attempt to achieve them”, no “spoiling” can occur

- low gs: “each player here has different objectives, and that will affect my outcome”

Game Count

- high universal game count, high rank: yay i am playing mahjong, the player skill variance possibilities are high

- high ugc low rank: same as high rank

- low ugc, high rank: yay i am playing mahjong, the skill of the remaining players is relatively universally high

- low ugc low rank: i am not playing mahjong

Player Encounter

- high player encounter: yay i played against everyone here

- low pe: i didnt

Farmability

- high farm: i hope i play against players that will give me a lot of points and that others do not receive or cannot take advantage of the same chance

- low farm: it only matters how well i do against my direct opponent

Different people value these five things very differently (exaggerations for ease of explanation):

Example player types:

“an open is a way to play people outside my mahjong group, i do not care about my final placement” low everything, but max encounter and universal game count

“i am strong, am confident i will win, and want to play against strong players heads up” low universal game count, low comeback, low player encounter

Issues re: tourney types:

- going to a tournament is expensive in terms of resources (time, money) for some, so making tournaments that cater to those players will get them to come to events

- unique hurdle to mahjong: games are zero sum and include 4 people per game, so chess/golf/fighting game solutions are inadequate

-- not 1v1, so people can, from some players' perspective, “spoil” games / overall results

-- you take points from others so outcomes are not measured independently

-- players of similar skill have very different outcomes based on pairings (3 high skill players vs 1 low skill player, the 3 high skill players will win -- but be close to each other, and the low will be very negative. 1 high vs 3 low: high will be very positive, 3 low will be close together)

- some players will naturally be at odds in regards to outcomes (“fun times” vs “serious and matters”), so you can't please everybody. resultant endpoints can be: cater to a chosen player type, solve for “everyone has at least x percent of a good time”, “y percent of people have a good time”, “knowing top z players regardless of player happiness is most important”, etc.

possible survey questions to gauge playerbase desires:

- “for a 2 day tourney x hours away by car, what is the minimum amount you must be certain to play in the main event to consider going” (lunch day 1, day 1, lunch day 2 which is the historical standard, all of it)

- “what's more important to you: metric 1 vs metric 2” / “given one hundred total points what score does each metric get”

----

tourney settings by metric:

pairings (pairings dont take into account game count)

scramble:

goal similarity low

player encounters high

comeback ability high

farmability high

scramble notes:

scramble randomly pairs people against each other and then adds up the results. it gives you the largest possible number of different opponents, and every player gets the largest possible number of meaningful games.
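as a rough illustration of the pairing described above, here is a minimal sketch of a scramble that also counts how many different opponents each player has met. the function names, the 16-player field, and the 6-round length are assumptions for the example, and it assumes the player count is a multiple of four.

```python
import random

def scramble_round(players, rng):
    """Shuffle the field and seat it at tables of four (assumes len(players) % 4 == 0)."""
    order = players[:]
    rng.shuffle(order)
    return [order[i:i + 4] for i in range(0, len(order), 4)]

def distinct_opponents(players, rounds, seed=0):
    """Play `rounds` scramble rounds and count the different opponents each player met."""
    rng = random.Random(seed)
    met = {p: set() for p in players}
    for _ in range(rounds):
        for table in scramble_round(players, rng):
            for p in table:
                met[p].update(q for q in table if q != p)
    return {p: len(opponents) for p, opponents in met.items()}

# Hypothetical field: 16 players, 6 rounds.
players = [f"P{i}" for i in range(16)]
counts = distinct_opponents(players, rounds=6)
print(min(counts.values()), max(counts.values()))
```

a pure random shuffle does not guarantee that everyone meets everyone. it only makes repeat matchups less likely than in formats that group players by standing.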

scramble is often paired with a top cut to offer some sort of competition at the finals and avoid game irrelevance (if a player has more than an approximate 30+uma lead over the next placement, no reasonable game can upset them).

variance in players is as high as possible. however, this does not account for relative strength pairing, and later variance matters much more than earlier variance, as goal point levels become clearer. both of these increase farming variance: for example, being near the upper middle and paired against three players at the bottom in the final round (those players, to make it to the top, have to play for gigantic, slower hands) has a very different outcome than being in the same place or higher and paired against players in the same situation.

in addition, players far enough away from the topcut score line cannot themselves make the cut, but can affect others attempting to do so if paired against them (“kingmaking”).

if maximum universal game count and player variance are emphasized, this is an unsolvable issue: if a player can possibly top from any position (comeback), then previous player action is devalued. if that is not possible but the player is still in the mix, they are relegated to kingmaking. This leads those players to make riskier decisions, most often ending in an even lower placement in the final rankings. a player far enough away from topcut with 1 game to go that needs a 60k first will play for that, but that often ends them even farther from topcut than when they started, which may not be an accurate judge of that player's skill.

solutions to that on a macro ranking level are to “score” players equally from top to bottom, with no additional bonus score to top players, so that every placement move matters equally. however, should systems count the “Top X games”, this will put players in varying playing conditions: a player with X placements at rank Y will see anything below Y as equally bad, and play as such.

set:

goal similarity high

player encounters low

comeback ability low

farmability low

set play promotes heads-up competition and shared goals, so that “spoiling” games occurs at a minimum. it very specifically disincentivizes player variance and universal game count - players can lose early and be removed from the pool of players with the possibility of getting first place, and players naturally will play far fewer different players over the course of the event, as they are grouped in tighter and tighter groups. depending on “elim count” (double elim, triple elim - the number of losses before players are eliminated from a chance at first) top players may in fact only see the same 3-11 players throughout the event.

apart from the downsides of low player variance, without a detailed ranking system and “losers bracket”, you can underrank players that are unfortunately initially paired against higher strength players.

example, using numbers for strength rankings: 3 is paired early against 1 and 2 in a format that advances the top 2 players. 3 is eliminated from a chance at top table in the first round.

while the chance of this is small (statistically, 3 makes final table in a top-2-advance format in every case except being seated at that specific table), it does not accurately reflect their position.
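to put a rough number on "small", here is a sketch that computes the chance that player 3 is drawn at the same first-round table as both 1 and 2. it assumes the first round is seated fully at random, which the text above does not specify, and the function name and field sizes are made up for the example.

```python
from fractions import Fraction
from math import comb

def chance_third_meets_top_two(num_players, table_size=4):
    """Chance that player 3's randomly drawn first table also holds players 1 and 2."""
    others = num_players - 1   # everyone except player 3
    seats = table_size - 1     # seats at player 3's table besides their own
    # Favorable draws: both top players take two seats, anyone else fills the rest.
    favorable = comb(others - 2, seats - 2)
    total = comb(others, seats)
    return Fraction(favorable, total)

# Hypothetical field sizes.
for n in (16, 32, 64):
    print(n, chance_third_meets_top_two(n))
```

under that assumption a 16-player field gives 1/35 and a 32-player field gives 1/155, so the bad draw is rare but not negligible across many events.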

if your playerbase includes players that quit as soon as they can’t get first, it does not allow them to be adequately ranked. While this can be accounted for with a comeback mechanic (losers bracket, mlg fighting game style), that is game count heavy.

swiss:

goal similarity medium

player encounters medium

comeback ability dependent on cuts

farmability medium

swiss with permanent delineation between a player and first place is merely scramble with multiple cuts and can be analyzed as such.

swiss without permanent delineation adds thresholds that may be considered unfair. for example, being at the bottom of a level gives you less upward mobility than being at the top of a lower level: 8th playing 5/6/7th has a much worse expected outcome than 9th playing 10/11/12th, which becomes more apparent the higher uma is. lowering that uma, however, allows for runaway games disrupting the ranking and increased farmability: there is no reason to get points off of difficult players if points from easy players count just as much.

tourney settings that affect the above

uma: higher uma lowers farmability, comeback. the more uma, the less any "single large game" affects the overall score in comparison: it is clearer to look at it as "what size hand are the placement differences equivalent to". for example, 12/4 is a non-dealer mangan amount of uma, 15/5 is more than a non-dealer mangan but less than a dealer one, 30/10 is over a dealer haneman, etc.
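the "what size hand" comparison can be sketched as below. the hand values are the standard ron values for those limits. the function names are made up for the example, and the uma is assumed to be symmetric (top/second/-second/-top) and quoted in thousands.

```python
# Standard ron values in points for the limit hands named in the text.
HAND_VALUES = {
    "non-dealer mangan": 8000,
    "dealer mangan": 12000,
    "dealer haneman": 18000,
}

def placement_gaps(uma_top, uma_second):
    """Point gaps between adjacent placements for a symmetric uma given in thousands."""
    payouts = [uma_top, uma_second, -uma_second, -uma_top]
    return [(payouts[i] - payouts[i + 1]) * 1000 for i in range(3)]

def hands_covered(gap):
    """Reference hands whose value does not exceed one placement gap."""
    return [name for name, value in HAND_VALUES.items() if value <= gap]

for uma in [(12, 4), (15, 5), (30, 10)]:
    gap = placement_gaps(*uma)[0]
    print(uma, gap, hands_covered(gap))
```

for these three umas every adjacent gap is the same size (8k, 10k, 20k), which is why a single hand value is enough to describe each one.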

cuts: lowers ugc, player encounter. while taking "kingmaking" players out of the pool, it also takes players in general out, which means more duplicate matchups

Oka (an additional first-place-only uma): This adds incentive for players to "coinflip" (taking a 50/50 chance to move up vs down in placement), which increases score variance. For example, a 30/10 uma is 30/10/-10/-30, which are 20/20/20 differences between placements. If you are, say, second, and have a 50/50 chance to move up or down, the payoff is equal.

If you add an oka, however (15/-5/-5/-5), it comes to 45/5/-15/-35, which is 40/20/20 between placements, so any coinflip that involves first place is worth taking.
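The coinflip arithmetic can be checked with a short sketch. The helper names are made up for the example, and it assumes a coinflip moves a player exactly one placement up or down with equal chance, ignoring the raw points that change hands.

```python
def payouts(uma, oka=(0, 0, 0, 0)):
    """Placement payouts in thousands: uma plus oka, first place to fourth."""
    return [u + o for u, o in zip(uma, oka)]

def coinflip_ev(pay, place):
    """Expected payout change of a 50/50 move up or down one place (place is 2 or 3)."""
    if place not in (2, 3):
        raise ValueError("only 2nd and 3rd can move both up and down")
    up = pay[place - 2] - pay[place - 1]    # gain from moving up one place
    down = pay[place - 1] - pay[place]      # loss from moving down one place
    return 0.5 * up - 0.5 * down

UMA = (30, 10, -10, -30)
OKA = (15, -5, -5, -5)

print(payouts(UMA, OKA))                    # [45, 5, -15, -35]
print(coinflip_ev(payouts(UMA), 2))         # 0.0
print(coinflip_ev(payouts(UMA, OKA), 2))    # 10.0
print(coinflip_ev(payouts(UMA, OKA), 3))    # 0.0
```

Only the flip that can reach first place gains value from the oka. A flip between second and fourth stays even.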

Multiron: Multiron most affects players pushing for hands, which is more often done by players in lower placements. This leads to players who are low in points losing more points faster, which increases score variance, particularly in kingmaking matchups.

Reds: Given roughly 2/3 of all tiles being in play per hand, each red 5 adds 1/6 of a han per hand (2/3 shown, 1/4 chance to be in the winner's hand). A set of 3 red fives therefore adds 1/2 of a han on average. If we give an average of 2-3 han per hand, that adds very roughly 20k points to the total variance. In addition, it means open / tanyao hands are worth more, which changes offensive/defensive judgements.

Regardless of playstyle changes, assuming players play with similar goals overall, what this does is lead to bigger margins of victory, which in effect lowers the relative importance of uma.
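The red-five estimate above can be reproduced with exact fractions. This is only the back-of-envelope model from the text (2/3 of tiles in play, 1/4 chance a visible red sits in the winner's hand). The function and parameter names are made up for the example.

```python
from fractions import Fraction

def expected_red_han(num_reds, share_in_play=Fraction(2, 3), share_to_winner=Fraction(1, 4)):
    """Average extra han per hand from red fives under the rough model in the text."""
    per_red = share_in_play * share_to_winner   # 2/3 * 1/4 = 1/6 han per red five
    return num_reds * per_red

print(expected_red_han(1))   # 1/6
print(expected_red_han(3))   # 1/2
```

Changing the number of reds or either share shows how sensitive the half-han figure is to those assumptions.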

Details Regarding Cuts:

Cuts emphasize / de-emphasize individual games, while removing "strength of player" variance from the remaining playerbase.

If a cut resets the points of players, it de-emphasizes the previous games to 0%. This makes the games about to be played "matter" more.

If you are attempting to avoid "runaway" situations (situations where an individual game has a very low chance of changing effective placement) you can calculate the "effective previous game" quotient.

For example:

- in a two game series, the first game is "one game's worth" of game, so the second game can "comeback", on average, from the results of that game.

- in a three game series, the first two games are "two games' worth" of game, so the third game "comeback" would have to be twice the size of an average single game's swing to stop a "runaway".

So, if you have a six game series into a two game finals, and after a cut do not erase the points, then there is six games of momentum, with two games to overcome it, which could lead to a runaway finals, especially if the difference between first and second going in is more than two games of range.

In contrast, if you let players only bring forward 1/3 of those points, it is only worth two games of momentum, against the two games of the final - no amount of runaway is possible since the same number of games are available.
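One way to read the "effective previous game" quotient is as games' worth of carried-over score divided by the games left to play. The sketch below uses that reading, which is an interpretation and not a formula stated above, and the function name is made up for the example.

```python
from fractions import Fraction

def momentum_ratio(games_before_cut, games_after_cut, carry_fraction=Fraction(1)):
    """Games' worth of carried-over score per remaining game (above 1 risks a runaway)."""
    carried = Fraction(games_before_cut) * carry_fraction
    return carried / games_after_cut

print(momentum_ratio(6, 2))                   # 3: six games of momentum against two
print(momentum_ratio(6, 2, Fraction(1, 3)))   # 1: two games of momentum against two
print(momentum_ratio(6, 2, 0))                # 0: full reset
```

Under this reading, picking the carry-over fraction as games after the cut divided by games before it brings the ratio to 1.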

----

Plus Zero vs Plus One end states

The choice between ending at time and ending after time is a choice between event stability and simplifying player goals.

When a player knows they are in their final, or possibly final, hand of the game, they can calculate exactly what outcomes can lead to their varying final placements. ("I need an X Ron off of this or that player, or a Y tsumo, for Z place" / "I can only lose so much Ron from this player, that player, or to a tsumo, to maintain my place" / etc).

While at the higher levels, players should cultivate the ability to do these calculations at every appropriate point, players will often only do this when the gamestate "tells" them to - with All Last sheets, or a time call. The more chance a player has to do this, the more that they can determine these clearly and avoid game-spiking "from fourth to fourth" situations.

On the other hand, having an additional game more than doubles the variance of end times of your event. Instead of being up to 1 game worth of delay, now it is up to 2, due to games that end just before / just after the time call. Note that if accounted for, this is not necessarily a bad thing - the wider range of game end times means there is much less chance of pressure on staff checking and collecting scores from tables.