I see my score is 4.18

I really thought "It's an average" would be about the beginning and end of this discussion. You guys are impressive.
 
This article explains one process for analyzing scores. It involves some math.

I'm not that mathematically inclined so I can't attest to its accuracy.

How to Analyze Your Scores
It's good for estimating, but can't nail down exact votes.

The more often you can collect data, the more you can narrow down new votes. I do it once a week, and that's good enough to see trends.
 
Plus, early votes make more difference to the average than later ones, so trying to track beyond 25 or 30 is pointless.
 
More precisely, early votes impact the early ratings more than any individual votes (early or late) impact the later ratings.
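
For anyone who wants to see the arithmetic, here is a rough sketch of the snapshot approach described above. The function names are made up for illustration, and since displayed averages are rounded, the results are only estimates.

```python
def new_votes_between(old_avg, old_count, new_avg, new_count):
    """Estimate how many votes arrived between two snapshots and their mean.

    Assumes the displayed score is a plain arithmetic mean of all votes.
    """
    added = new_count - old_count
    if added <= 0:
        raise ValueError("no new votes between snapshots")
    # Total points before and after; the difference came from the new votes.
    total_new = new_avg * new_count - old_avg * old_count
    return added, total_new / added


def shift_from_one_vote(avg, count, vote):
    """How far one additional vote moves the average."""
    # new_avg - avg = (avg * count + vote) / (count + 1) - avg
    return (vote - avg) / (count + 1)


if __name__ == "__main__":
    # Hypothetical snapshots: 4.20 on 20 votes, then 4.18 on 22 votes.
    print(new_votes_between(4.20, 20, 4.18, 22))
    # The same 1-star vote hurts far more at 9 votes than at 99.
    print(shift_from_one_vote(4.5, 9, 1), shift_from_one_vote(4.5, 99, 1))
```

The second function is the reason tracking gets less informative as the count grows: the shift is divided by the vote count plus one.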
 
Not really. Someone did a score data crunch a few months back, and the median score (50th percentile) was, I think, 4.42 if I recall. My personal average score of 3.8ish ranked around the 12th percentile, so a 4 would be somewhere around a dismal 20th percentile. No school that I know hands out a B grade for a score like that.

LW aside, anything below roughly 4.2 is a pretty bad score.
The thread is here, and I've pasted the key graph below. It only compares Loving Wives with 'The Rest', but of course the categories are not uniform. The main point was that you can't really compare scores across categories, or even between now (when your story might have been 1-bombed) and the all time scores (where bombs have been cleaned up and voter behaviour may have changed). The median (50%) 'all stories, all time' score was 4.41, but the 12-month LW median was 3.75.

It's all relative. Your 4.18 story may have changed somebody's life for the better, which is sufficient in itself to justify the work.

[Graph: score percentiles, Loving Wives vs. 'The Rest']
PS - note that there have been some 'sweeps' since this graph was done, so the 2024 story percentiles will have changed, particularly in LW.
 
I really thought "It's an average" would be about the beginning and end of this discussion. You guys are impressive.
I think OP might not have known about being able to see the vote count, awareness of which answers the question "average of WHAT exactly".
 
The main point was that you can't really compare scores across categories

Agree. One cannot compare raw scores across categories. That is (just) one reason why the arbitrary 4.5 Red H threshold is folly. However, if we convert the scores of each separate category into percentiles, we actually can begin to compare stories across categories with some decent accuracy. We could then say that a 3.75 in LW is roughly the equivalent of a 4.41 in category X, etc.

Also, the Red H bar could be set at a meaningful level, such as the 75th or 80th percentile. As it is now, in most categories it's somewhere around the 55th. That's preposterously low to indicate some sort of cut above. A second level could also be implemented, say a Red H at the 90th percentile and a Pink H at the 70th.

All or most of this was mentioned in the other thread but it's still very valid discussion worth continuing.

I also think that only the highest scoring chapter of a series should count for Red H (and 2nd highest for Pink say).
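
A minimal sketch of what per-category percentiles could look like, for illustration only. The score lists are invented, the function names are hypothetical, and the nearest-rank percentile method is my assumption, not anything the site is known to use.

```python
import math
from bisect import bisect_right


def percentile_rank(score, category_scores):
    """Percent of stories in the category scoring at or below `score`."""
    ordered = sorted(category_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def score_at_percentile(pct, category_scores):
    """Nearest-rank method: the score sitting at the given percentile."""
    ordered = sorted(category_scores)
    k = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[k - 1]


def equivalent_score(score, from_scores, to_scores):
    """Map a score to the score at the same percentile in another category."""
    return score_at_percentile(percentile_rank(score, from_scores), to_scores)


if __name__ == "__main__":
    # Invented data, chosen only so the numbers echo the thread.
    lw = [2.9, 3.2, 3.5, 3.75, 4.0, 4.3, 4.5, 4.6]
    other = [3.9, 4.1, 4.3, 4.41, 4.5, 4.6, 4.7, 4.8]
    print(percentile_rank(3.75, lw))          # rank of a 3.75 within LW
    print(equivalent_score(3.75, lw, other))  # same rank in the other category
    print(score_at_percentile(80, lw))        # a per-category 80th-percentile bar
```

With real data, the same `score_at_percentile` call would give the per-category bar for a Red H at whatever percentile is chosen.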
 
I'm in full agreement with you.

(pause for a moment's reverent silence)
 
Agree. One cannot compare raw scores across categories. That is (just) one reason why the arbitrary 4.5 Red H threshold is folly. However, if we convert the scores of each separate category into percentiles, we actually can begin to compare stories across categories with some decent accuracy. We could then say that a 3.75 in LW is roughly the equivalent of a 4.41 in category X, etc.

As a mathematical moron, this makes sense to me. Thanks.
 
You (PSG and others) are still making the error of fetishizing the category distinctions, granting them undue power and meaning over the percentage bands that they create.

Considering how much pixelated ink has been spilled about the uncountable deficiencies of the category system, I find this very baffling.
 
Ah...4.20 a step in the right direction, albeit a small one. I/T and Femdom....not sure how well that does compared to other genres.
 
You (PSG and others) are still making the error of fetishizing the category distinctions, granting them undue power and meaning over the percentage bands that they create.

Considering how much pixelated ink has been spilled about the uncountable deficiencies of the category system, I find this very baffling.

Actually, we're doing the opposite. Just to make sure that I am clear, the proposition is to calculate percentiles only within each category. If the 80th percentile in, say, LW is 4.62 and the 80th percentile in Romance is, say, 4.89, we can't compare the raw scores as an indication of equality. However, we can compare the percentiles. Now in the case of LW, the category is quite polarized, so no comparison will be truly accurate, but if we compare that 80th percentile 4.89 in Romance to an 80th percentile 4.72 in First Time, we will have something that is a fairly accurate equating of ratings. As the system currently stands, there is no equating whatsoever.
 
I don't understand why you use words like "fetishizing." It just trivializes the discussion.

I don't fetishize categories, or defend them as conceptually ideal. But they've been around for a long time, and they influence reader behavior, choices, and voting. It makes sense to take that into account. It is indisputably true that a 4.5 in Loving Wives means something very different from a 4.5 in romance, because of observable reader behavior.

I can't speak to your behavior concerning scores, but I can speak to mine, and I think mine is similar to that of many other readers. I don't look JUST at scores in determining what to read, but I pay attention to them. After 20 years of reading hundreds and hundreds of stories, I have a pretty good sense of what I'm going to like. I typically like stories in what I regard as the top 10 to 20 percent, so it's useful for me to have a rough estimate of how other readers rank stories too. A percentile ranking (by category) is far more useful to me than a raw score ranking, or a ranking that doesn't take into account the differences between categories. None of this involves "fetishizing," as you put it.

I think my approach is common. This is the way consumers respond to ratings for online products. They want a system that as accurately as possible ranks similar kinds of things so they can choose.
 
As a pragmatic question for the category specific H levels, does the percentile ever get recalculated?

I am way too new to carefully watching ratings to have a clue, but I would not be surprised if there is a drift over time, and it may be different between different categories. The graph that is posted above (and in a few other threads) might argue that we are seeing rating inflation. Or it might be that the individual scores decline over time. Some of you may have a good handle on that. I certainly don't.

If you recalibrate it over time, do stories keep their calibration from their era or get dragged along? Either seems to have odd behaviors.

If you do not recalibrate and there is a differential shift in the categories, we are introducing a whole new unfairness in the H, speaking of something that is fetishized.

And this is much harder for the site itself than the simple 4.5 rule.

All this said, I would still prefer the per-category percentile based approach.
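
To make the recalibration question concrete, here is a toy sketch with invented numbers and hypothetical function names. It compares a bar frozen at the story's era with a bar recalculated over everything posted since, assuming a nearest-rank 80th percentile.

```python
import math


def threshold(scores, pct=80):
    """Nearest-rank percentile: the score a story must reach to earn the H."""
    ordered = sorted(scores)
    k = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[k - 1]


def frozen_vs_recalculated(story_score, era_scores, all_scores, pct=80):
    """Return (keeps H under its era's bar, keeps H under the current bar)."""
    return (
        story_score >= threshold(era_scores, pct),
        story_score >= threshold(all_scores, pct),
    )


if __name__ == "__main__":
    # Invented scores: an older cohort, then a later cohort that rates higher.
    old_era = [3.8, 4.0, 4.2, 4.4, 4.6]
    later = [4.3, 4.5, 4.6, 4.7, 4.8]
    # A 4.4 story from the old era under each rule.
    print(frozen_vs_recalculated(4.4, old_era, old_era + later))
```

In this made-up case the story clears its own era's bar but loses the H once the bar is recalculated, which is the "dragged along" behavior in question.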
 
I don't know why it would be much more difficult to constantly update percentile rankings than to constantly update other data. It seems possible to me. I don't think we have any data to confirm whether it would result in odd behaviors or not. I think it would probably result in fewer odd behaviors than the current system we have, which "fetishizes" (Lobster's word, but I think it actually DOES apply in this case) the red H for absolutely no good reason. The red H as an institution incentivizes weird behavior.
 
All I know is the Fetish category is easy mode. I can write whatever crap and it will get a red H as long as it’s at least 3k words or so. The people who don’t like your kinks will avoid your story, and those who like them will score you generously.

There are even stories with literally a 5.0 star rating in Fetish because of this. They just don’t have enough feedback for the leaderboard.
 