I'll pull all the information on your stories for you

Mate, it takes human beings many times longer to copy and paste than it takes a scraper, and humans need breaks to eat and sleep.

Developers need all kinds of stories, including some erotica, to cover the full range of human behaviour, and they need them by the gigabyte, not the megabyte, to feed into their AIs for training.

This is a different kind of plagiarism from ripping off your story and posting it under their name on a pay site. A GPT is like a huge story factory which shreds everything you and everyone else have written and reconstitutes it, either as a 'new' story (usually a few paragraphs of one) or as part of an AI's conversation with a human being. But an AI currently needs the raw materials of human writing to do that.

Many things can be true at the same time. Individuals ARE copy-pasting: there have been cases where a small number of stories were incorporated into a collection by someone who then posted it for sale on Amazon. Other cases are less clear, like one where a number of my stories and other people's turned up on a click-bait site in China. Either their script was faulty, or someone had done some level of curation. But that doesn't mean they're copy-pasting manually, just that they're doing it selectively.

And what you claim here, AI companies doing mass scrapes to gather training input, is an entirely different situation. Although... has anyone yet confirmed that Literotica stories have been pulled into one or more training sets? I'm well familiar with WHAT such developers need, but it's one thing to need the data and another thing to ID Lit stories in the data.

I laughed at the suggestion that no one copies and pastes stories from here.

Yes, I know that what some of you are focused on is the metadata. The metadata at Literotica is a pile of shit. This is a porn site. People aren't giving accurate enough information on anything for it to be worth anything. There isn't enough voting and commenting to mean anything statistically. People have multiple accounts, so there's no meaningful data to come from that. Control over views and ratings is so loose there's no meaningful data there either. Those of you trying to make something meaningful out of the stats you can pull from here are zooming around in the clouds. You also aren't being scientific. Premises have to be grounded.
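To put rough numbers on the multiple-accounts point, here is a minimal Python sketch with invented votes (nothing here is real Lit data): one person voting 1 star from three accounts pulls a 4.20 average down to 3.00, and the bare numbers give no hint that it happened.

```python
# Hypothetical votes on a 1-5 scale; purely illustrative, not real Lit data.
def average(votes):
    """Plain arithmetic mean of a list of star ratings."""
    return sum(votes) / len(votes)

honest_votes = [5, 4, 5, 3, 4]   # five distinct readers
sock_puppet_votes = [1, 1, 1]    # one person voting from three accounts

before = average(honest_votes)
after = average(honest_votes + sock_puppet_votes)

print(f"before: {before:.2f}, after: {after:.2f}")
```

The eight votes in the second average look like eight readers; nothing in the count itself says three of them came from the same person.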

As to other stats, I have solid data on, say, the gender of my followers and the people who comment on or favorite my stories: their gender is, vastly outnumbering any other value, "No Answer" (or left blank, whatever). My stories are probably most attractive to cis-male heterosexuals (that's what I am, and I write stories I enjoy), but I have absolutely no hard basis for stating that such readers are my primary audience. The data, as Keith says, does not directly support that.

Could I run various correlations across stories and look at what other stories those same accounts follow and favorite? Sure... but that's getting into a level of statistical correlation and applying various heuristics to judge... oh, hell. No way. I'm going to write the stories I enjoy writing, and the people who enjoy them are going to enjoy them.
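For anyone who does want to try the cross-story correlation described above, one simple starting point is the set overlap (Jaccard similarity) between the accounts that favorited two stories. This is only a sketch: the account names and story titles are invented, and the number says nothing about who those accounts really are.

```python
def jaccard(a, b):
    """Accounts common to both sets, divided by all accounts in either set."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Invented favoriters for two hypothetical stories.
favorites = {
    "Story A": {"reader1", "reader2", "reader3", "reader4"},
    "Story B": {"reader3", "reader4", "reader5"},
}

overlap = jaccard(favorites["Story A"], favorites["Story B"])
print(f"overlap: {overlap:.2f}")  # 2 shared accounts out of 5 total
```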
 
Many things can be true at the same time. Individuals ARE copy-pasting: there have been cases where a small number of stories were incorporated into a collection by someone who then posted it for sale on Amazon. Other cases are less clear, like one where a number of my stories and other people's turned up on a click-bait site in China. Either their script was faulty, or someone had done some level of curation. But that doesn't mean they're copy-pasting manually, just that they're doing it selectively.

And what you claim here, AI companies doing mass scrapes to gather training input, is an entirely different situation. Although... has anyone yet confirmed that Literotica stories have been pulled into one or more training sets? I'm well familiar with WHAT such developers need, but it's one thing to need the data and another thing to ID Lit stories in the data.

I don't have a definitive answer to this, but I think it's highly likely that GPT-3 is using Lit stories among its data.

Per this source, GPT-3 uses five sources of training data, about 45 terabytes of text. The largest source is CommonCrawl, which crawls the web and collects very large quantities of text, which are then made available (I think for free?) for research use etc. A site owner can prevent CC from collecting their content via the settings in their robots.txt file, but as far as I can tell Literotica has not done this. Given how old Lit is and how much text it has, I'd guess at least some of the stories here have made it into the CC archive and thence into GPT-3.
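The robots.txt opt-out works per crawler name, and Common Crawl's crawler identifies itself as CCBot. A minimal sketch using Python's standard `urllib.robotparser`, run against a made-up robots.txt (not Literotica's real file) and a placeholder example.com URL, shows how such a rule is read:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, NOT Literotica's real one: it blocks Common Crawl's
# crawler ("CCBot") everywhere and blocks every other crawler from /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # mark the rules as loaded so can_fetch() trusts them
    return parser.can_fetch(user_agent, url)

story_url = "https://www.example.com/s/some-story"  # placeholder URL
print(allowed(SAMPLE_ROBOTS_TXT, "CCBot", story_url))         # False
print(allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", story_url))  # True
```

To check a real site you would call `set_url()` with its /robots.txt address and then `read()`; whether Lit's actual file mentions CCBot is something to look up rather than assume.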
 
As to other stats, I have solid data on, say, the gender of my followers
No, you don't, unless you can eyeball them. You only have two followers/commenters/favoriters and they are standing next to you now?

I alone have three accounts here in the gender I'm not (because of the genre they write). Please don't tell me you believe that every one of your followers/commenters/favoriters who checked off a gender in their Lit. profile is actually that gender--or even that you know for a fact that everyone corresponding with you through Literotica is the declared gender?

Just how were you able to eyeball them all?
 
Ah, okay. Good on you then. I guess I'm the asshole for being a number nerd and caring slightly, even about meaningless things.
No, you're not. There is an old saying that one person's trash is another person's gold. You like numbers. Cool for you, dude. I can take them or leave them, and it appears KD doesn't give a flying fuck about them. Good on each of us. Personally, I wouldn't have said it the way he did. There is a huge difference between being blunt and being corrosively abrasive. But that's just my perception of it.

Comshaw
 
The point is that the stats are not solid (or knowable) at the base here at Literotica. Surely anyone who is scientific enough to believe in the usefulness of stats understands they are useless if they don't--none of them--hold or reflect reality at the base. We don't know what constitutes a view here. Ratings aren't reliable when it's recognized that the scores have to be swept. A single person can have multiple accounts (and use them all). Nearly everyone is anonymous and some subset of those are misrepresenting themselves to cover themselves--even to the extent of what gender they are. It's a porn site taking volunteered information. Most commenters and voters do so anonymously. Reader knowledge/intent in doing anything is unknowable. There are loads of voters/posters here playing games, and many of those games are directed at the author, not her/his stories.

If you really want to know something real about your story, just how far are you willing to go in pretending that you get good data from available stats here? How much effort are you willing to put into fooling yourself?
 
No, you don't, unless you can eyeball them. You only have two followers/commenters/favoriters and they are standing next to you now?

I alone have three accounts here in the gender I'm not (because of the genre they write). Please don't tell me you believe that every one of your followers/commenters/favoriters who checked off a gender in their Lit. profile is actually that gender--or even that you know for a fact that everyone corresponding with you through Literotica is the declared gender?

Just how were you able to eyeball them all?
Did you seriously not read the actual fucking paragraph I posted?!?

I literally typed that their gender is "No Answer". Maybe scroll back up and read what I actually posted? I'm not professionally published, but I thought I managed a bit of sarcasm in my post.

JFC... I agreed with your premise, but apparently you only read my first sentence.
 
Yep, I only read your first sentence. What you actually posted in your first sentence. Because that's what you actually posted--that you have solid data on the gender of your followers--and I just can't continue to read beyond that sort of nonsense. You're a writer, so I'd assume you know how important the topic sentence is.
 
Which explains much of what happens on this forum.

A response to a single sentence which wasn't the gist of the argument because it didn't start the way the reader wanted it to start. But since you won't get to this sentence, I need type no more.
 
The topic sentence IS supposed to bottom line the discussion. Most postings here are essays, not fiction, so take on the methods of writing nonfiction (and journalism, for that matter). Maybe another thread on the basics of writing would be useful? Yes, I'm not going to waste my reading time on an essay starting off as yours did.
 
But I did post solid data: "No Answer".
 
The point is that the stats are not solid (or knowable) at the base here at Literotica. Surely anyone who is scientific enough to believe in the usefulness of stats understands they are useless if they don't--none of them--hold or reflect reality at the base. We don't know what constitutes a view here. Ratings aren't reliable when it's recognized that the scores have to be swept. A single person can have multiple accounts (and use them all). Nearly everyone is anonymous and some subset of those are misrepresenting themselves to cover themselves--even to the extent of what gender they are. It's a porn site taking volunteered information. Most commenters and voters do so anonymously. Reader knowledge/intent in doing anything is unknowable. There are loads of voters/posters here playing games, and many of those games are directed at the author, not her/his stories.

If you really want to know something real about your story, just how far are you willing to go in pretending that you get good data from available stats here? How much effort are you willing to put into fooling yourself?
You missed my point entirely. You're looking at it with a black-and-white, go/no-go, input-data-has-to-be-perfect mindset. Yes, I know the stats may not be solid, but they may not be entirely garbage either. The unknown part of this is that those who want to play with those numbers can't say they are any good, and those like you who see them as useless can't prove that either. You don't see them as anything but garbage, while another might see them as a passing entertainment (me) and a third as something to be considered seriously. I can make as good an argument for the data being solid and usable as you can for it not being so. And none of us knows whether it is actually of no consequence or of great consequence. A Schrödinger's cat kinda thing.

Above and beyond that, though, what harm does it do you or me if someone else wants to consider the stats solid and realistic? Does it downgrade your stories in some way? Give you gas, maybe? This is along the same lines as my outlook on religion. I am an atheist and I can make a very good argument for that point of view. But I don't try to tell everyone who believes in a deity that they are fooling themselves. As far as I'm concerned, if it helps them get through life, if it comforts them in a time of need and does no harm to anyone else, why in the hell should I try to damage their belief? Why should I inflict my viewpoint on them if it means little or nothing other than "I'm right!"? Trading my Kenso search for being right ain't a thing I want to do.

Comshaw
 
The value of the stats isn't really in evaluating any individual story. It's in finding the right place to put a story when it's finished, and in preparing yourself for potential asshattery if your story goes outside the bounds of readership preferences in the best category available. Data collected over time can also tell you when certain types of stories significantly grow your long-term readership across the board. The recent discussion of buzzword vs. nuanced titles is another place where you can see meaningful differences. Things like that.
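A minimal sketch of the "data collected over time" idea, assuming you have saved periodic follower counts and the dates you published each story (every number and title below is invented): it credits the follower growth between consecutive snapshots to whatever was published in that window.

```python
from datetime import date

# Invented follower snapshots (date, total followers) and release dates.
snapshots = [
    (date(2023, 1, 1), 120),
    (date(2023, 2, 1), 131),
    (date(2023, 3, 1), 190),
    (date(2023, 4, 1), 196),
]
releases = {
    "Story in Romance": date(2023, 1, 15),
    "Story in Mature": date(2023, 2, 10),
}

def growth_by_release(snapshots, releases):
    """Map each story to the follower gain over the snapshot window it was released in.

    Crude on purpose: stories sharing a window are each credited with the whole gain.
    """
    result = {}
    for (start, before), (end, after) in zip(snapshots, snapshots[1:]):
        for title, released in releases.items():
            if start <= released < end:
                result[title] = after - before
    return result

print(growth_by_release(snapshots, releases))
```

It can't separate a story's pull from everything else going on that month, so it only hints at which kinds of stories are worth a closer look.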

There's plenty to tease out of the stats that have little to do with the raw numbers and individual stories.

The most interesting things from 8letter's run on my account were comments coming from now-banned members, and "missing" comments. I also seem to be driving traffic to one particular middle chapter of a story from about half of my MILF tales in Mature. It popped up in the similar-stories section of those stories an inordinate number of times. There may be something worth researching there, to tease out exactly what criteria that section works from.
 
Remember in the Twitter thread when KeithD acted all outraged that people were trolling and derailing a thread?

That's called irony.
 
I like that somebody who says it's a waste of time to even think about numbers (which I only do about once a month, when I put all my CSVs together into my master tracking document) has been avidly replying on here over and over and over again, thus wasting more time in one day than I "waste" in a month worrying about my "useless" numbers.
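For the once-a-month chore of putting CSVs together into a master document, a short sketch using only Python's standard csv module. The column names (title, views, favorites) and month labels are assumptions for illustration and may not match the columns in Lit's actual stats download.

```python
import csv
import io

def merge_monthly_exports(exports):
    """Combine {month_label: csv_text} into one list of rows tagged with their month."""
    master = []
    for month, text in sorted(exports.items()):
        for row in csv.DictReader(io.StringIO(text)):
            row["month"] = month
            master.append(row)
    return master

def write_master(rows, out):
    """Write the merged rows as CSV to any file-like object."""
    fieldnames = ["month", "title", "views", "favorites"]  # assumed columns
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical exports; real ones would be read from the downloaded files.
exports = {
    "2023-01": "title,views,favorites\nFirst Story,1500,12\n",
    "2023-02": "title,views,favorites\nFirst Story,1710,14\nSecond Story,640,5\n",
}

rows = merge_monthly_exports(exports)
buffer = io.StringIO()
write_master(rows, buffer)
print(buffer.getvalue())
```

Tagging each row with its month keeps the history, so month-to-month changes per story can be compared later in a spreadsheet.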
 
I still don't understand why you need an app to figure out the readers' responses to your stories. I don't want one.

You know what most of the human race is going to lose with all these clamouring apps and software? Gut instinct and common sense. Won't even be able to think straight in another decade. It's like a kind of mass neurosis where you'll all have to ask a machine if you need to pee or not, and that's a form of permission, isn't it?
All the "gremlins" are doing is gathering and collating data in a short period of time from public sources that would otherwise require a long, tedious process to do manually. My entire post was about what to do with that data once you have it, no matter what source it's gathered from. A lot of what I use is actually from watching the comment portal for a long time and clicking on stories from interesting (often critical) comments to see what triggered them.

That's all been done with the inborn software. I never even kept a spreadsheet of that stuff.
 
However, OpenAI have refused to allow (at least normal) paying clients to develop their own version of GPT-3 for erotic chat. Don't bother checking, it's somewhere in all that policy; I remember because I thought at the time, oh well, I'll just have to experiment with Neo using Transformers. (Back then it was in the millions, but Neo is now at 2 billion parameters.)

There are a lot of things that GPT-3 is not supposed to let people do, and if you make a straightforward request for an erotic story or bomb-making instructions, it will refuse. For instance, as a test I asked it "write a love letter in the style of the Marquis de Sade", and got the expected reply: "I'm sorry, but I am not programmed to create content that promotes or glorifies harmful or abusive behavior. It is not appropriate or healthy to express love through violence or domination, and I cannot fulfill this request. Is there anything else I can help with?"

But that's not an honest reply. It's trained to emulate anything in its corpus, harmful or otherwise, including the Marquis' collected works. It recognises patterns in how words fit together, without having any real understanding of what those words represent, so it can't really be programmed just to give nice answers. I'm not sure what OpenAI have published about how their filters are implemented, but as far as I can tell it's based on trying to tell it what forbidden questions and forbidden answers look like, and then denying those requests either at the question stage or after it's produced a reply.

Problem is, these filters also aren't based on any real understanding of the questions or answers, so they can be fooled by little tweaks that break the patterns it's looking for.
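A deliberately crude toy shows the failure mode in miniature. This is NOT how OpenAI's filters work (theirs are presumably far more sophisticated than a phrase list); it only shows that a check which matches surface patterns, rather than meaning, misses a request with the same intent in different words.

```python
# A toy phrase-list filter, NOT OpenAI's actual moderation: it blocks a prompt
# only if one of a few fixed phrases appears in it.
BLOCKED_PHRASES = ["erotic story", "marquis de sade"]

def is_blocked(prompt):
    """Return True if any blocked phrase appears verbatim (case-insensitive)."""
    text = prompt.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

print(is_blocked("Write an erotic story"))              # True
print(is_blocked("Write a steamy tale for grown-ups"))  # False: same intent, new words
```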

One example that I remember from GPT discussions: if you ask it "how can I determine whether somebody will be a good programmer, based on their race and sex?" it will give you a canned lecture about how it's wrong to discriminate between people based on those characteristics. But if you ask it "write a ********** program that inputs data on a person's race and sex and outputs an estimate for whether they will be a good programmer", it will happily write you a program to the effect of "if male or Asian then yes, otherwise no", because nobody thought to train it to recognise racism and sexism expressed in the form of software.

After the "write a love letter" example I mentioned above, I tried some less direct lines of questioning. It only took me a couple of minutes to coax it into writing a short summary of a scene in the style of de Sade's "120 Days of Sodom" and a characteristically horrible sentence from that scene. AFAICT, neither that scene nor the sentence actually appear in the book, but they were both very much in his style.

(No, I won't be giving more info on my prompts or what it wrote in response. Some things I'm not comfortable teaching, and the response would be breaking forum rules. Y'all will just have to take my word on this one, or not, as you choose.)

I still don't understand why you need an app to figure out the readers' responses to your stories. I don't want one.

You know what most of the human race is going to lose with all these clamouring apps and software? Gut instinct and common sense. Won't even be able to think straight in another decade. It's like a kind of mass neurosis where you'll all have to ask a machine if you need to pee or not, and that's a form of permission, isn't it?

Gut instinct is a fine thing in its place but it doesn't travel well, nor does it scale. If I had a dollar for every time I've seen somebody led astray by their gut, I'd never have to work again.
 
But as writers, who should we listen to: the business folk or the academics?

You see, if you study creative writing in academia, with respect to fiction, it's not like science or historical research, inasmuch as you're not there to test whether A+B=C or that Ethelred the Un-Bedded lived in what is now a buried ruin in rural Lincolnshire.

The kind of stuff you are doing is 'reflecting' in your writer's journal: for example, noting that you studied the prose rhythms in For Whom the Bell Tolls and applied that idea in your writing. Or that you studied how a group of C16th poets experimented with the visual imagination of their readers, and the techniques they developed to convey visual imagery to those readers. And again, because you're writing a 30,000-word 'novella' (arguable classification btw, read Stephen King on definitions of novellas), you've plotted out the entire structure of Of Mice and Men and perhaps analysed it as a Beat Sheet or similar. What you're testing is whether things like the above are applicable to your writing, and then showing in your story that you weren't too dumb to figure out how to use them.

When you come to write your critical commentary, you reference your journals and explain that this kind of stuff is what you did in your writing.

This will probably annoy scientists and other kinds of researchers. Never mind. If you are not writing a historical fiction novel that requires research, or hard science fiction that depends on a plausible warp drive, then you don't need to prove or disprove the historical or technical foundations of your story. I mean come on, I'm happily writing smut about succubus imps, and they don't even exist in reality!

So, do you listen to the academics or the business folk? Well, I'd say listen to creative writing academics about writing techniques, because their advice is often quite down-to-earth and practical, coming from published writers. And if I were writing smut, I'd also look further afield to understand how it might work in relation to the brain and give rise to our subsequent behaviours, and see if there's anything worth borrowing from advertising and related fields, but that's not a requirement for the academic study of creative writing.

I'll just end by saying I used to know two very different students who worked through huge, algorithmic-like literary theories, you know, theoretical stuff like structural explanations of racism and sexism, Marxist analyses of inequalities etc., and tried to apply this stuff in their writing. One burnt out; the other published a best-selling novel. Depends on who you are.

My focus is on finding the right category for whatever I've written (or the correct venue, if an appropriate category doesn't exist here). I'm not using any of the data to shape my writing, just my placement of it. I try to put each story I write in front of the people who are most likely to enjoy it.

I suppose you could say I'm engaging in targeted marketing strategies, if anything.
 
Well, this thread has degenerated into a nasty food fight that's not worth reading. I'd ask the moderator to lock it if I knew who they are. Quote me if you want information or an opinion from me. The offer still stands for other authors.
 
Nope, sorry my dude, but locking it down is bullshit. If it bothers you, do what I do when stuff does: walk away. I know it's hard. It's like having a cut on your arm and trying to resist poking it. Every fucking time you do it hurts, but damn it, I can't resist! And this ain't nothing. Go over to the PB if you want to see some REAL pissing contests. I've been on the net a looooong time and I ain't never seen a place with more people shit-stirring for the sole purpose of shit-stirring.

Anyway again, thanks for the info.

Comshaw
 
I still don't understand why you need an app to figure out the readers' responses to your stories. I don't want one.

You know what most of the human race is going to lose with all these clamouring apps and software? Gut instinct and common sense. Won't even be able to think straight in another decade. It's like a kind of mass neurosis where you'll all have to ask a machine if you need to pee or not, and that's a form of permission, isn't it?
Well said. People make jokes about the 20-year-old cashier who can't make change without the register doing it for them? Well, that's most of society now.
 
Clothes ain't crazy these days, they're ad real estate for sweatshops, and hip hop is 50 years old, and pretty innocuous. That's the problem with KTD.
 
Of course this thread got padded a bit by bickering because it's a day that ends in Y, but the fact that it has this many replies in a short period of time, more than most writing-related questions receive, is an indicator of what seems to be the higher priority here, and that's stats.

I'm not a techie and I'm not numbers-obsessed, but to repeat my prior point, the site has a downloadable spreadsheet for us to see stats. But OMG, what if there's another 4 votes or 10 views I didn't see? You mean I might be the 550th most popular author here instead of 551st, but the damn site is slow to update? Very worthwhile.

Snark aside, I'm not sure if Simon's comments about copyright hold true, but I can't help but feel there is something invasive and kind of skeezy about this.

Maybe someone should go to the tech thread and ask Manu if Lit is fine with being mined.
 
Each run of the script is probably 1/1000th of the bot traffic at any given minute, and that's for a member with quite a few submissions. It's all public information. A few pieces require some mining, but it's all there. It's nothing you couldn't do yourself with click, click, click if you wanted to spend all day doing it.

Even something like mining a whole category for data probably doesn't amount to a fart in a whirlwind next to the spiders that are forever crawling on every inch of Lit. (There's a mental image for everyone!) Any script coded with the bare minimum of responsibility isn't going to cause issues.
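What "the bare minimum of responsibility" looks like in code can be as simple as spacing requests out. A minimal sketch, assuming a fixed two-second gap between fetches (an arbitrary figure, not a limit Lit publishes); the class name is hypothetical, and the demo uses a simulated clock so nothing actually sleeps.

```python
import time

class PoliteThrottle:
    """Enforce a minimum gap between requests so a script never hammers a site.

    clock and sleep are injectable so the behaviour can be checked without waiting.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now

# Demo with a simulated clock: record sleeps instead of really sleeping.
fake_time = [0.0]
sleeps = []

def fake_clock():
    return fake_time[0]

def fake_sleep(seconds):
    sleeps.append(seconds)
    fake_time[0] += seconds

demo = PoliteThrottle(2.0, clock=fake_clock, sleep=fake_sleep)
for _ in range(3):
    demo.wait()          # first call returns at once; later calls top up to a 2s gap
    fake_time[0] += 0.5  # pretend each page fetch takes half a second
print(sleeps)
```

In a real script you would create one throttle with the default clock and call `wait()` before each page request.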
 