Thread to ask technical questions about AI

AG31

Literotica Guru
Joined
Feb 19, 2021
Posts
4,564
I was happy to see in recent threads that there are people in AH who actually have valid technical experience with AI, who have worked in that programming environment. I was happy because I've been trying to understand what's going on under the covers. I have a programming background, but it started with BASIC, when Dartmouth had just released it, and extended through PowerBuilder, ending around 2005.

I've tried getting info from ChatGPT, but I'm beginning to suspect that the reason its answers aren't enlightening is that it may be hallucinating.

I'm hoping people like @nice90sguy might take on my ignorance, and maybe others will be interested, or have questions of their own. I know there were a few other AHers who might be able to help out, but I couldn't find them again.

Why in AH? Because there are smart people here and there's obviously an ongoing interest in this new phenomenon that frequently impinges on the world of language and writing. And because I don't have another place to go.
 
So, I wanted to know, at a granular level, what goes on once you submit a prompt to an LLM agent. (Is that the proper terminology?) I asked ChatGPT several times in different ways and got that the first thing is that the prompt is broken into tokens, maybe words, maybe parts of words. Each of those tokens is run against a huge list of associated (words?).

I get stuck then on how breaking the prompt into such tiny pieces can lead to an answer. Can't get my head around the next step. Can someone help?
 
The very first step is a Markov chain. This is where you take a text and count the words. Say, Pride and Prejudice, or your last story. Could be anything. 'The' occurs 2371 times, 'and' occurs 1894 times, and so on. Randomly pick a word to begin. As 'in' occurs quite often, say we randomly picked 'in'.

Now a Markov chain looks at all the two-word phrases: 'in a' and 'in the' commonly occur in the text, 'in time' less so, and so on. Randomly pick one of them. So it's likely to be 'in a' or 'in the'.

Continue. Keep on choosing a likely next word.

This actually works. I've written a simple Markov chain program that only looks at two-word pairs, and it produces surprisingly coherent English mixed up with incoherent nonsense.

The LLMs do this in a much, much more complicated way, but at base it's the same idea: look at a lot of English and see what is likely to come next. So 'of the' and 'in a' and so on keep coming up.
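To make that concrete, here's a minimal sketch of the two-word (bigram) idea in Python. The "corpus" is a made-up stand-in; any text would do, and a real program would use something much bigger:

```python
import random
from collections import defaultdict

# Toy bigram Markov chain. The corpus is a stand-in; any text works.
corpus = ("in a hole in the ground there lived a hobbit "
          "in the hole there was a door and in time a visitor came").split()

# Record which words follow which: nexts["in"] -> ["a", "the", "the", "time"]
nexts = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    nexts[w1].append(w2)

def generate(start, n=10, seed=0):
    random.seed(seed)
    out = [start]
    for _ in range(n):
        candidates = nexts.get(out[-1])
        if not candidates:  # dead end: no observed successor
            break
        # Choosing from the raw list naturally weights by frequency:
        # 'the' appears twice after 'in', so it's twice as likely.
        out.append(random.choice(candidates))
    return " ".join(out)

print(generate("in"))
```

Because 'the' follows 'in' twice in this corpus and 'time' only once, the chain tends toward the common phrases, just as described above.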
 
Part of the way it predicts which word comes next is a concept called "embedding".

Imagine you have a great big whiteboard, and you're writing every word you know on it, with the words that are similar grouped together. So "dog" goes near "cat", "puppy" goes near "kitten", and so on. And we try to keep similar relationships consistent: if "kitten" goes a few inches above "cat", then "puppy" goes a few inches above "dog", like so:

[attached image: whiteboard diagram with "kitten" above "cat" and "puppy" above "dog"]
As we add more words, we try to preserve these relationships. So for instance, "baby" and "child" are to "adult" as "kitten" is to "cat" (but "child" is maybe a little closer to "adult") and "girl" and "boy" go nearby:

[attached image: whiteboard diagram adding "baby", "child", "adult", "girl" and "boy"]
If you wanted to add "man" and "woman" to the whiteboard, you'd position them with "man" below "boy" and "woman" below "girl" in the same kind of way.

Somebody who doesn't know the English language, but understands the general idea of how you're organising things on that board, could then look at it and make conclusions like "dog is to puppy as cat is to kitten and adult is to child". So if they see a sentence like "the mother cat looks after her kitten", they can figure out that "the mother dog looks after a puppy" is also a plausible sentence - even if they have no idea what these words actually mean, and even if they've never seen "looks after a puppy" in a sentence.

LLMs do something similar, but they don't start out with an understanding of what a "cat" or "kitten" or anything else is - they're just looking at a huge set of training data and spotting patterns which tell them that some words (or rather tokens) relate to others. There are a lot of additional complications, like how to handle words that have multiple meanings, but that's the gist of it.
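The whiteboard can be sketched in code. These 2-D coordinates are hand-placed to match the description above; real embeddings have hundreds of dimensions and are learned from data, not placed by hand:

```python
from math import dist

# Hand-placed toy "embeddings" mimicking the whiteboard.
emb = {
    "cat":    (0.0, 0.0),
    "kitten": (0.0, 1.0),   # "a few inches above" cat
    "dog":    (3.0, 0.0),
    "puppy":  (3.0, 1.0),   # the same offset above dog
    "adult":  (6.0, 0.0),
    "child":  (6.0, 0.8),   # a little closer to "adult"
}

def nearest(point, exclude=()):
    """The word whose embedding is closest to the given point."""
    words = [w for w in emb if w not in exclude]
    return min(words, key=lambda w: dist(emb[w], point))

# "dog is to ? as cat is to kitten": start at dog, apply the
# cat -> kitten offset, then look up the nearest remaining word.
cx, cy = emb["cat"]
kx, ky = emb["kitten"]
dx, dy = emb["dog"]
analogy = (dx + (kx - cx), dy + (ky - cy))
print(nearest(analogy, exclude={"dog", "kitten", "cat"}))  # puppy
```

This is the "somebody who doesn't know English" trick in miniature: the arithmetic finds "puppy" without knowing what any of the words mean.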
 
For me the first important distinction is analytical AI vs. generative AI. Analytical AI is also known as pattern recognition or discriminative AI. Viewed through information theory, analytical AI reduces a big input to a small output, i.e. low entropy. Examples are sentiment analysis (mapping an email, i.e. many bits, to a small enumeration of sentiments like friendly, unfriendly, questioning, explaining, etc.) or what Google has been doing for years, i.e. using all the information they have gathered about you (many bits) to select the most promising ad to show you.

Generative AI, on the other hand, takes a small prompt, e.g. "write a novella about boy meets girl", and generates a large text. If it follows the prompt, it will generate > 10,000 words. So it turns little information (6 words/tokens ~ 96 bits) into a lot of information (10,000 tokens ~ 160,000 bits). This seems to contradict the fundamental law of information theory that information processing always reduces information. (This follows from the second law of thermodynamics, i.e. that entropy always increases.) So where do the other 159,904 bits come from? They come from all the textual information in the LLM. What the LLM does is take all the boy-meets-girl stories and all the novellas it knows and create a great mashup according to the highest statistical probability it can find in its model. How does it do this? It essentially recycles the texts it has. (This is where the accusation of stealing comes from. Because there are so many stories represented in an AI, it steals only a tiny bit from every author it has scanned, but in the end its wealth, those 159,904 bits, is completely stolen.)
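Redoing that back-of-envelope arithmetic (the ~16 bits per token is the post's assumed figure, not a real measurement):

```python
# Rough bit-counting from the post: a 6-token prompt vs a
# 10,000-token novella, at an assumed ~16 bits per token.
BITS_PER_TOKEN = 16

prompt_bits = 6 * BITS_PER_TOKEN        # 96 bits in the prompt
output_bits = 10_000 * BITS_PER_TOKEN   # 160,000 bits in the output

# The difference is what the model itself must supply.
extra_bits = output_bits - prompt_bits
print(prompt_bits, output_bits, extra_bits)  # 96 160000 159904
```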

The second important aspect is how this mashup is actually generated. This is where the know-how of the AI companies comes into play, embedded in the algorithmic part of the AI application. (This is the code that Anthropic accidentally released a few days ago.) It contains the rules that make an AI try to be friendly and helpful, i.e. explain what it talks about, how it structures the output (logically, inductively), how it matches the presumed knowledge level of the prompter, etc.

This information generation is where hallucinations come from. The LLM only looks at the relative probability of the best answer, not the absolute probability. And the less information it has (entropy again), the more randomness is introduced.
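A small illustration of that relative-vs-absolute point: the model's raw scores go through a softmax, which always yields a "best" candidate in relative terms, even when nothing is strongly supported. (The three scores here are made up.)

```python
from math import exp

def softmax(logits):
    # Normalise raw scores into probabilities that sum to 1.
    exps = [exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Well-trodden context: one continuation clearly dominates.
confident = softmax([9.0, 2.0, 1.0])

# Sparse context: every candidate is weak and the scores are close,
# yet softmax still crowns a "best" one with sizeable relative
# probability. Nothing in these numbers says "I don't know".
uncertain = softmax([0.3, 0.2, 0.1])

print(round(confident[0], 3))  # 0.999
print(round(uncertain[0], 3))  # 0.367
```

A nearly flat distribution like the second one is where the randomness, and thus the hallucination risk, creeps in.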

Now how does this relate to creative writing? Apart from the stealing aspect, the rules for the mashup (i.e. the text generator) are geared towards business texts, because that is where the money lies for the AI companies. This means the texts will be well structured and unambiguous and follow all the rules for good writing. Each section will be properly based on the previous section, nothing is omitted, everything is explained. No one is insulted, and most importantly there will be no subtext. And if you do that, you have the ingredients for poor prose that is all surface and no content. Or, to say it with Hemingway's metaphor: you don't get an iceberg, only styrofoam.

However, the distinction between analytical and generative AI explains why spell checking and even grammar checking are OK: they don't fill in blanks (even if a comma gets inserted). Using AI as a reviewer/beta reader can also work, because then the LLM does not write a story but comments on a story, which is similar to structured business text, which it does well, and which also reduces information.

(This text is obviously not written by AI, because it is not well structured)
 
concept called "embedding"
And the most important embedding is generalisation/specialisation, i.e. dog is to animal what house is to building. Travelling up and down this hierarchy is how reasoning works (is simulated). A dog is an animal, animals breathe, therefore dogs also breathe.
 
This thread is as good a place as any to repeat my firm belief that creative writing is one of the things that generative AI (the sort of thing that can deify presidents or get into Spotify's top streamed songs) will always struggle with. It's surprisingly bad at producing even a pastiche of good writing. It can mimic styles quite well, but falls flat on its face when asked to come up with original, engaging and coherent stories.

Surprising, because the most advanced generative AI models (like GPT) have read a shitload of novels.

This is basically because it's terrible at planning tasks. (Even "pantser" writers like me plan our stories, just not explicitly, and not beforehand.)
 
what's going on under the covers
Different stuff with different AI models. "AI" is an umbrella term. Some of it isn't all that different from using Excel's "Solver" add-in, which has been around for ages.
It's been very useful for solving complex differential equations -- a smart way of "refining guesses" to settle on a solution to a hard optimisation problem.

The thing that's made everybody excited about AI is one model, the Transformer, that's frankly a bit of an ugly hack. But it works so well that people don't care how hacky it is. Under the hood, it uses "attention" to look at the context around words to "understand" their meaning IN CONTEXT. All words change, or get, their meaning from their context (hence the question: "How would I use it in a sentence?").
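For the curious, here's a stripped-down sketch of that "attention" step: scaled dot-product attention, with tiny made-up 2-D vectors in place of the real model's learned ones. The idea is that each position's output is a weighted average of everything in the context, weighted by similarity:

```python
from math import exp, sqrt

def softmax(xs):
    exps = [exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a weighted
    average of the values, weighted by query-key similarity."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Toy 2-D vectors for three context tokens. The query points the
# same way as key 0, so the output leans heavily toward value 0:
keys   = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
result = attention([[4.0, 0.0]], keys, values)
print(result[0])
```

The real thing runs this with learned query/key/value projections, many "heads" in parallel, and hundreds of dimensions, but the weighted-average mechanism is the same.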

 
the prompt is broken into tokens, maybe words, maybe parts of words.
Presumably this is an analysis into morphemes - at least, a morphemic analysis is essential to being able to write grammatically in a language, whether English or Hungarian. 'Kind' is related to 'kinds' and 'kindness' and 'unkind', but not to 'kindergarten', so sufficient exposure allows it to work out that 's' is a plural suffix, 'ness' forms abstract nouns, and so on, but 'ergarten' never otherwise occurs so is not wanted as a token.
 
Supposedly, Sir Francis Drake said, "Great things have small beginnings." But in Prometheus, David said, "Big things have small beginnings," and right after that, everything went to shit. That sums up AI!
 
Thanks for all the great explanations @BeechLeaf and @Bramblethorn. @sijopunk
So, why tokens? The reason is that they are the smallest unit that influences the processing. So basically it's words, but also punctuation. (It makes a difference whether a sentence ends with a question mark or an exclamation mark.)
A word like "walked" might become two tokens: "walk" (the stem) and "ed" (past tense).
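A toy illustration of that splitting, using greedy longest-match over a made-up mini-vocabulary. (Real tokenizers such as BPE build the vocabulary from character-string frequencies in the training data; this only shows the splitting step.)

```python
# Assumed mini-vocabulary of known subword pieces.
vocab = {"walk", "ed", "ing", "s", "kind", "ness", "un", "?", "!"}

def tokenize(text):
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Take the longest vocabulary entry matching at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                tokens.append(word[i])  # unknown character on its own
                i += 1
    return tokens

print(tokenize("walked unkindness"))  # ['walk', 'ed', 'un', 'kind', 'ness']
```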
 
The very first step is a Markov chain. This is where you take a text and count the words. Say, Pride and Prejudice, or your last story. Could be anything. 'The' occurs 2371 times, 'and' occurs 1894 times, and so on. Randomly pick a word to begin. As 'in' occurs quite often, say we randomly picked 'in'.

Now a Markov chain looks at all the two-word phrases: 'in a' and 'in the' commonly occur in the text, 'in time' less so, and so on. Randomly pick one of them. So it's likely to be 'in a' or 'in the'.

Continue. Keep on choosing a likely next word.

This actually works. I've written a simple Markov chain program that only looks at two-word pairs, and it produces surprisingly coherent English mixed up with incoherent nonsense.

The LLMs do this in a much, much more complicated way, but at base it's the same idea: look at a lot of English and see what is likely to come next. So 'of the' and 'in a' and so on keep coming up.
The text that's being looked at is your prompt? Why doesn't it end up with the prompt, instead of with an answer?
 
Part of the way it predicts which word comes next is a concept called "embedding".

Imagine you have a great big whiteboard, and you're writing every word you know on it, with the words that are similar grouped together. So "dog" goes near "cat", "puppy" goes near "kitten", and so on. And we try to keep similar relationships consistent: if "kitten" goes a few inches above "cat", then "puppy" goes a few inches above "dog", like so:

[attached image: whiteboard diagram with "kitten" above "cat" and "puppy" above "dog"]
As we add more words, we try to preserve these relationships. So for instance, "baby" and "child" are to "adult" as "kitten" is to "cat" (but "child" is maybe a little closer to "adult") and "girl" and "boy" go nearby:

[attached image: whiteboard diagram adding "baby", "child", "adult", "girl" and "boy"]
If you wanted to add "man" and "woman" to the whiteboard, you'd position them with "man" below "boy" and "woman" below "girl" in the same kind of way.

Somebody who doesn't know the English language, but understands the general idea of how you're organising things on that board, could then look at it and make conclusions like "dog is to puppy as cat is to kitten and adult is to child". So if they see a sentence like "the mother cat looks after her kitten", they can figure out that "the mother dog looks after a puppy" is also a plausible sentence - even if they have no idea what these words actually mean, and even if they've never seen "looks after a puppy" in a sentence.

LLMs do something similar, but they don't start out with an understanding of what a "cat" or "kitten" or anything else is - they're just looking at a huge set of training data and spotting patterns which tell them that some words (or rather tokens) relate to others. There are a lot of additional complications, like how to handle words that have multiple meanings, but that's the gist of it.
This feels helpful, but I'm still left with how you come up with an answer instead of with the prompt itself.
 
Me, I have no idea how the next part happens. You'll need proper experts for that. But a Markov chain is a very simple way of generating one text that resembles another. They probably started with this back at the dawn of time, before it got into multidimensional tensors and simulated brain networks.

I posted it because it seems an easy way of showing how text can be generated, without needing lots of mathematics.
 
Presumably this is an analysis into morphemes
Not necessarily - see discussion here: https://direct.mit.edu/tacl/article...How-Much-Semantic-Information-is-Available-in

"However, tokenization is typically determined by the frequency of strings of characters, not by semantics (for a review, see Mielke et al., 2021), so tokens don’t reliably correspond to morphemes".

For example, if one were segmenting "encodings" into morphemes, it'd look something like en-cod-ing-s. But GPT instead segments it as enc+odings, because "enc" is a commonly-occurring character string, never mind that we're splitting that "cod" across two tokens.
'Kind' is related to 'kinds' and 'kindness' and 'unkind', but not to 'kindergarten',
But it is. "Kindergarten" is a loan-word from German, where "Kind" = "child", and by my understanding LLMs like GPT-4 are not trained separately on separate languages; they're ingesting English and German and Chinese all at once (though probably with an English bias in their training corpus).

If one were trying to train a LLM specifically for English, and trying to make it computationally efficient, basing it on morphemes might well be a better option. But LLM development is influenced by a desire for scale and automation - Google and OpenAI don't want to have to go hire grammarians who speak Armenian and Nepalbhasa and Occitan to figure out how tokenisation should work in those languages, they want an approach that can be applied over a whole bunch of languages.

(Besides, if you want your LLM to be able to answer questions about Moby Dick in Armenian, you probably want it to be able to draw on information from English-language sources, rather than just depending on whatever Armenian-language discussion of Moby Dick happens to be available in your training set.)
 
And is the 'accidental' release different from the Claude mythos that escaped its sandbox, hacked the Internet, and e-mailed one of its keepers to boast? Read about it, for example, in Tech NewsDay.
 
The text that's being looked at is your prompt? Why doesn't it end up with the prompt, instead of with an answer?
No. The text that an LLM is trained on is a very large "corpus" of, basically, whatever the trainers can get their hands on. It learns patterns from all that, and then it applies those patterns to the question "what words might come next, after this prompt?"
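A toy version of that train-then-continue split, extending the Markov chain idea from earlier in the thread. The patterns come from a training text (a made-up stand-in here); the prompt only supplies the starting point, which is why the output continues the prompt rather than repeating it:

```python
import random
from collections import Counter, defaultdict

# "Training corpus": patterns are learned from this text only.
training = ("the cat sat on the mat and the cat looked at the dog "
            "and the dog sat on the floor").split()

# For each word, count what followed it in training.
follows = defaultdict(Counter)
for w1, w2 in zip(training, training[1:]):
    follows[w1][w2] += 1

def continue_prompt(prompt, n=4, seed=1):
    random.seed(seed)
    words = prompt.split()
    for _ in range(n):
        counts = follows[words[-1]]
        if not counts:
            break
        # Sample in proportion to training frequency. Only the last
        # word of the prompt matters here; a real LLM conditions on
        # the whole prompt, but the principle is the same.
        nxt = random.choices(list(counts), weights=counts.values())[0]
        words.append(nxt)
    return " ".join(words)

out = continue_prompt("the dog")
print(out)
```

The prompt "the dog" never appears verbatim as output on its own; the model keeps appending whatever its training statistics say should come next.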
 
Chatting with AI is one thing, but if you want to program with it, then a whole different set of considerations comes into play. The two most crucial ones are:
- context is a diminishing resource: it's a bit like working memory, which gets used up over a session and has to be regularly reset/cleared, which has consequences for what it remembers
- it is non-deterministic: the same prompt will not always produce the same output
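That non-determinism comes from sampling. A minimal sketch, with made-up scores for three candidate next tokens (the "temperature" knob here is the same idea most chat APIs expose):

```python
import random
from math import exp

def sample(logits, temperature=1.0):
    """Sample a token index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    exps = [exp(x) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.5, 0.5]  # assumed scores for 3 candidate tokens

# Same "prompt" (same scores), repeated runs, different picks:
picks = [sample(logits, temperature=1.0) for _ in range(20)]
print(set(picks))  # usually more than one distinct token

# A very low temperature collapses toward the single most likely
# token, approximating deterministic ("greedy") decoding.
greedy = [sample(logits, temperature=0.01) for _ in range(20)]
print(set(greedy))
```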
 
Chatting with AI is one thing, but if you want to program with it, then a whole different set of considerations comes into play. The two most crucial ones are:
- context is a diminishing resource: it's a bit like working memory, which gets used up over a session and has to be regularly reset/cleared, which has consequences for what it remembers
- it is non-deterministic: the same prompt will not always produce the same output
Yup. I use AI in my personal development projects (coding, not trying to be a better person; I'm already fucked there). It is very Reaganistic... trust, but verify.
 
Yup. I use AI in my personal development projects (coding, not trying to be a better person; I'm already fucked there). It is very Reaganistic... trust, but verify.
Does Reaganistic mean sycophantic? If so, I agree, and this applies as much to chat as to coding. In chat, it always tries to be positive, never tells you you're wrong or "that's not a good idea". It's worse in coding, as it blindly tries to fix things even if there's already a solution from earlier, or it fixes an issue without regard to other collateral issues that the fix might give rise to.
 
Does Reaganistic mean sycophantic? If so, I agree, and this applies as much to chat as to coding. In chat, it always tries to be positive, never tells you you're wrong or "that's not a good idea". It's worse in coding, as it blindly tries to fix things even if there's already a solution from earlier, or it fixes an issue without regard to other collateral issues that the fix might give rise to.
Very telling intentional misinterpretation...
 
The quality of the code it produces has been improving almost exactly as fast as mine has been degrading, partly due to age (I'm pushing 70), so, with the aid of this mental prosthesis, I'm as good a coder now as I was ten years ago.
 