PennameWombat
Mate, it takes human beings many times longer to copy and paste than it takes a scraper, and humans need breaks for food and sleep.
Developers need all kinds of stories, some erotica, to get the full range of human behaviours, by the gigabyte, not megabyte, to feed into their AIs for training.
This is a different kind of plagiarism to ripping off your story and posting it under their name on a pay site. A GPT is like a huge story factory which shreds up everything you and everyone else has written and reconstitutes it either as a 'new' story, usually a few paragraphs in a story, or as part of an AI's conversation with a human being. But an AI currently needs the raw materials of human writing to do that.
Many things can be true at the same time. Individuals ARE copy-pasting: there have been cases where a small number of stories were incorporated into a collection by someone who then posted it for sale on Amazon. Other cases, like one where a number of my stories and others' turned up on a click-bait site in China, are less clear. Either their script was faulty, or someone had done some level of curation. But this doesn't mean they're copy-pasting manually, just that they're doing it selectively.
And what you claim here, AI companies doing mass scrapes to gather training input, is an entirely different situation. Although... has anyone yet ID'd that Literotica stories have been pulled into one or more training sets? I'm well familiar with WHAT such developers need, but it's one thing to need the data and another thing to ID Lit stories in the data.
I laughed at the suggestion that no one copies and pastes stories from here.
Yes, I know that what some of you are focused on is the metadata. The metadata at Literotica is a pile of shit. This is a porn site. People aren't giving enough accurate information on anything for it to be worth anything. There isn't enough voting and commenting to mean anything statistically. People have multiple accounts, so there's no meaningful data to come from that. Control over views and ratings is so loose there's no meaningful data there. Those of you trying to make something meaningful out of the stats you can pull from here are zooming around in the clouds. You also aren't being scientific. Premises have to be grounded.
As to other stats, I have solid data on, say, the gender of my followers and of the people who comment on or favorite my stories: their gender is, vastly outnumbering any other indicator, "No Answer" (or left blank, whatever). My stories are probably most attractive to cis-male heterosexuals (what I am, and I write stories I enjoy), but I have absolutely no hard basis to state such readers are my primary audience. The data, as Keith says, does not directly support that.
Could I run various correlations across stories, and look for what other stories those same accounts follow and favorite? Sure... that's getting into a level of statistical correlations and applying various heuristics to judge... oh, hell. No way. I'm going to write the stories I enjoy writing and the people who enjoy them are going to enjoy them.