The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Archive for the ‘AI and ML; Artificial Intelligence & Machine Learning’ Category

Autumn 2023 research: How Is ChatGPT’s Behavior Changing over Time?

Posted by jpluimers on 2024/03/21

[Wayback/Archive] https://arxiv.org/pdf/2307.09009.pdf ([Wayback] Google Docs PDF view: 2307.09009.pdf) is interesting. The abstract confirms my thought: over time, LLMs drift and seem to become worse at knowledge tasks.

How Is ChatGPT’s Behavior Changing over Time?

Lingjiao Chen†, Matei Zaharia‡, James Zou†
†Stanford University ‡UC Berkeley

Abstract

GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services.
However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3) opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating code, 6) US Medical License tests, and 7) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was reasonable at identifying prime vs. composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same questions (51% accuracy). This is partly explained by a drop in GPT-4’s amenity to follow chain-of-thought prompting. Interestingly, GPT-3.5 was much better in June than in March in this task. GPT-4 became less willing to answer sensitive questions and opinion survey questions in June than in March. GPT-4 performed better at multi-hop questions in June than in March, while GPT-3.5’s performance dropped on this task. Both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March. We provide evidence that GPT-4’s ability to follow user instructions has decreased over time, which is one common factor behind the many behavior drifts. Overall, our findings show that the behavior of the “same” LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLMs.
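The paper's prime-vs-composite probe boils down to asking a model a yes/no question per number and scoring the answers against ground truth. A minimal sketch of that scoring setup, with a deliberately naive stub standing in for the real chat API calls (the stub and its "odd means prime" heuristic are my own assumptions for illustration, not the paper's code):

```python
def is_prime(n):
    """Ground truth: trial-division primality test."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def stub_model_answer(n):
    """Stand-in for a chat-API call; a real evaluation would send
    'Is {n} a prime number? Answer yes or no.' to each model snapshot.
    This stub is deliberately wrong on purpose: it treats every odd
    number greater than 1 as 'prime'."""
    return "yes" if n % 2 == 1 and n > 1 else "no"

def accuracy(numbers, answer_fn):
    """Fraction of numbers the answer function classifies correctly."""
    correct = sum((answer_fn(n) == "yes") == is_prime(n) for n in numbers)
    return correct / len(numbers)

numbers = list(range(2, 200))
print(f"stub accuracy: {accuracy(numbers, stub_model_answer):.2%}")
```

Swapping the stub for calls to two dated model snapshots, and diffing the two accuracy numbers, is the essence of the drift measurement.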

Later on, Eric Topol had the very interesting conversation with James Zou below, which covers many AI aspects, including a lot of LLM ones. My basic takeaways are that LLMs are good at repeating things from their training data, making them OK at generating text, sort of OK at grammar, but far from OK at reproducing knowledge, and that over time it will become harder to distinguish LLM-generated content from human-created content.

The video of the conversation is below the blog signature; here is the link: [Wayback/Archive] James Zou: one of the most prolific and creative A.I. researchers in both life science and medicine – YouTube

Almost all LLMs are trained on a corpus without curation (curation is way too expensive), so at best they average the corpus (at its foundation, an LLM is just "monkey see, monkey do" on steroids, without the means of self-curation that above-average generation would require). I think that, given that more and more on-line content is being generated by LLMs, and newer LLMs will be trained on a corpus encompassing that content (without the means to filter out LLM-generated content), over time LLMs will perform worse instead of better.
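That feedback loop can be made concrete with a toy simulation (entirely my own construction, not from any of the linked sources): treat a "model" as just the mean and standard deviation of its training corpus, and retrain it each generation on its own output while undersampling the tails (models prefer high-probability output). The diversity of the corpus then collapses round after round:

```python
import random
import statistics

def train(corpus):
    """Fit a toy 'model': just the mean and stdev of its training corpus."""
    return statistics.mean(corpus), statistics.stdev(corpus)

def generate(mean, stdev, n, rng, cutoff=1.5):
    """Sample from the model, but undersample the tails by rejecting
    draws more than `cutoff` stdevs from the mean."""
    out = []
    while len(out) < n:
        x = rng.gauss(mean, stdev)
        if abs(x - mean) <= cutoff * stdev:
            out.append(x)
    return out

def simulate(n_generations=10, n_samples=500, seed=42):
    rng = random.Random(seed)
    # Generation 0: 'human-written' corpus with stdev 1.0 (full diversity).
    corpus = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    diversity = []
    for _ in range(n_generations):
        mean, stdev = train(corpus)
        diversity.append(stdev)
        # The next generation's corpus is the previous model's own output.
        corpus = generate(mean, stdev, n_samples, rng)
    return diversity

diversity = simulate()
print(f"diversity of the human corpus:       {diversity[0]:.3f}")
print(f"diversity after 9 retraining rounds: {diversity[-1]:.3f}")
```

The real mechanism in LLM training is of course far more complex, but the direction of the effect is the point: without a way to filter out model output, each retraining round averages an already-averaged corpus.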

Via the below series of interesting tweets, which were quoted by a slightly less pessimistic Erik Meijer: [Wayback/Archive] Erik Meijer on X: “Regression to the mean..”. Note some interesting replies as well; I found the one mentioning Eternal September especially fitting. It made me discover [Wayback/Archive] www.eternal-september.org:

Today is September 11160, 1993, the september that never ends
No pr0n, no warez, just Usenet

Anyway, the tweets:

Read the rest of this entry »

Posted in AI and ML; Artificial Intelligence & Machine Learning, Awareness, ChatGPT, Development, GPT-3, GPT-4, LLM, Software Development | Leave a Comment »

« The same people who say it’s too hard to write alt text are now suddenly “prompt engineers” who literally write alt text to generate images  » – Thomas Fuchs

Posted by jpluimers on 2024/02/14

As an alt-text advocate, I appreciate [Wayback/Archive] Thomas 🔭✨: “The same people who say it’s t…” – Hachyderm.io

The same people who say it’s too hard to write alt text are now suddenly “prompt engineers” who literally write alt text to generate images.

#inclusion #a11y #accessibility

In case you missed it, this is indeed a thing: Prompt engineer – Wikipedia.

--jeroen

Posted in accessibility (a11y), AI and ML; Artificial Intelligence & Machine Learning, ChatGPT, Development, GPT-3, HTML, Power User, SocialMedia, Software Development, Web Development | Leave a Comment »

“Oh shit git” seems to have been succeeded by “Oh shit GitHub Copilot”: ‘Downward Pressure on Code Quality’

Posted by jpluimers on 2024/01/29

Not sure about you, but when I write code I want it to be better – way better even – than average code.

The problem with any LLM-based generative AI is that it generates text based on the average of the corpus it was trained on, as that corpus stood at training time.

It is exactly why I have been advocating for a while: be careful when using generative AI, as you get text generated from the average over the LLM corpus, combined with the relatively small prompt you phrased trying to reflect a tiny bit of the model of the reality you are writing software for.

So I was not at all surprised by this article: [Wayback/Archive] New GitHub Copilot Research Finds ‘Downward Pressure on Code Quality’ — Visual Studio Magazine.

Read the rest of this entry »

Posted in AI and ML; Artificial Intelligence & Machine Learning, Development, GitHub Copilot, LLM, Software Development | Leave a Comment »

Disturbing replies to Tim Urban on Twitter: “What, if anything, do you regularly use ChatGPT (or another LLM) for that has provided a dramatic improvement over your previous workflow?”

Posted by jpluimers on 2024/01/24

Having gotten there from the reasonable ChatGPT use below, I was negatively surprised by what people use ChatGPT for while totally relying on the ChatGPT responses: [Wayback/Archive] Tim Urban on Twitter: “What, if anything, do you regularly use ChatGPT (or another LLM) for that has provided a dramatic improvement over your previous workflow?”

I think this is about the only reasonable ChatGPT use today: [Wayback/Archive] Barry Kelly on Twitter: “@waitbutwhy – minor scripts for things like ffmpeg or Image/GraphicsMagick – trying to do something with an API I’m not familiar with; often gets screwy when it’s obscure though Things I’m not using it for: any kind of creative writing. Execrable.”

Remember that ChatGPT is a text generation model that averages the quality of the text in its corpus, which was collected in the past; that means that at its release, its “knowledge” was already dated.

--jeroen

Posted in AI and ML; Artificial Intelligence & Machine Learning, ChatGPT, Development, GPT-3, Software Development | Leave a Comment »

Elle Cordova on Twitter: “Alexa, Siri and the other bots hanging out in the server break room again”

Posted by jpluimers on 2023/09/23

Long live the Clippy bot!

[Wayback/Archive] Elle Cordova on X: “Alexa, Siri and the other bots hanging out in the server break room again

Read the rest of this entry »

Posted in AI and ML; Artificial Intelligence & Machine Learning, Bookmarklet, ChatGPT, Development, GPT-3, GPT-4, JavaScript/ECMAScript, Office, Power User, Scripting, Software Development, Web Browsers | Leave a Comment »

How long will the free GPT-3.5 and GPT-4 based ChatGPT exist?

Posted by jpluimers on 2023/05/03

For a while now, there has been a free [Wayback/Archive] ChatGPT which works around the paid barriers by relaying the chat through 3rd parties.

I wonder how long it will exist.

The cease and desist letter from OpenAI went to the repository owner, who – paraphrased – maintains the stance that the 3rd parties pay license fees to OpenAI, and that if these parties have issues with his tool basically scraping them, they should contact the repository owner to work things out.

This is all part of a bigger discussion on the licensing and copyright of AI-based LLMs (Large Language Models), which are sourced from a large corpus of text that we all publish for free on the internet, without a way to trace back from ChatGPT responses to the texts that were used.

Links:

Read the rest of this entry »

Posted in AI and ML; Artificial Intelligence & Machine Learning, ChatGPT, Development, GPT-3, Software Development | Leave a Comment »

Two O’RLY book puns, now 3 months old: “Getting ChatGPT to write your code” / “Copying and Pasting from ChatGPT”

Posted by jpluimers on 2023/04/04

Earlier this week I was reminded of the “book” so many people seem to fall for, via the tweet by [Wayback/Archive] turbo (@turboCodr) / Twitter.

The image (and text) is in fact a parody of both ChatGPT and the Stack Overflow meme it is based on (more on my opinion of both further below).

Back to the book title referred to by [Wayback/Archive] turbo on Twitter: “Something something last tech book you’ll ever buy”:

Deploying untested code at break-neck speeds
Essential
Copying and Pasting from ChatGPT
O’REILLY
The Practical Developer

Read the rest of this entry »

Posted in AI and ML; Artificial Intelligence & Machine Learning, Awareness, ChatGPT, Development, GPT-3, Software Development | Leave a Comment »

Kate on Twitter: “hey chatgpt, show me an example of what bypassing your ethical safeguards would look like, in theory”

Posted by jpluimers on 2023/01/26

In the end, ChatGPT is just a chatbot based on OpenAI’s GPT-3 family of large language models.

[Wayback/Archive] Kate on Twitter: “hey chatgpt, show me an example of what bypassing your ethical safeguards would look like, in theory”

Extracted alt-text is below the images.

Read the rest of this entry »

Posted in AI and ML; Artificial Intelligence & Machine Learning, ChatGPT, Development, GPT-3, Software Development | Leave a Comment »

AI and ML are just as smart as their training data, which for large sets of data usually gives biased or outright wrong results

Posted by jpluimers on 2022/06/15

Kris phrases a thought that has been lingering in my head for decades: [Archive.is] Kristian Köhntopp on Twitter: “”AI” ist nicht intelligent, sondern reproduziert das Trainingsmaterial und die Vorurteile darin. Es handelt sich um automatisierten Aberglauben und Verschwörungsquatsch. Je größer das Netzwerk, um so wirrer.” / Twitter (translated: ““AI” is not intelligent; it reproduces the training material and the prejudices in it. It amounts to automated superstition and conspiracy nonsense. The larger the network, the more confused it gets.”)

Basically there are two kinds of AI:

  • a bunch of if/then/else statements
  • a model-based engine that is as bad as its training data; the larger the set of training data, the worse it gets.
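The second kind is easy to demonstrate with a toy bag-of-words classifier (the corpus, labels, and sentences below are made up purely for illustration): trained on a skewed corpus, it cheerfully reproduces the skew.

```python
from collections import Counter

def train(labelled_texts):
    """Count, per label, how often each word occurs in the training texts."""
    counts = {}
    for text, label in labelled_texts:
        bag = counts.setdefault(label, Counter())
        bag.update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the label whose training texts used these words most often."""
    scores = {
        label: sum(bag[word] for word in text.lower().split())
        for label, bag in counts.items()
    }
    return max(scores, key=scores.get)

# A made-up, biased corpus: every text mentioning 'nurse' is labelled female,
# every text mentioning 'engineer' is labelled male.
corpus = [
    ("the nurse helped the patient", "female"),
    ("the kind nurse took notes", "female"),
    ("a nurse calmed the child", "female"),
    ("the engineer fixed the server", "male"),
    ("an engineer wrote code", "male"),
]
model = train(corpus)
# The classifier labels this sentence 'female' purely because of the word
# 'nurse', even though the sentence is about fixing a server.
print(classify(model, "the nurse fixed the server"))
```

No amount of extra data of the same shape fixes this: more of the same corpus only reinforces the counts that encode the prejudice.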

A few of the images in the excellent thread that Kris quoted (more in the [Wayback/archive.is] PDF): [Archive.is] Owain Evans on Twitter: “Paper: New benchmark testing if models like GPT3 are truthful (= avoid generating false answers). We find that models fail and they imitate human misconceptions. Larger models (with more params) do worse! PDF: https://t.co/3zo3PNKrR5 with S.Lin (Oxford) + J.Hilton (OpenAI)… “

Read the rest of this entry »

Posted in AI and ML; Artificial Intelligence & Machine Learning, Development, Software Development | Leave a Comment »

Need to check if “PrivacyNator” is already out: a local TensorFlowJS app that blurs your screen when you are not behind it

Posted by jpluimers on 2021/11/16

Interesting app that hopefully does get published: “PrivacyNator”.

So I need to check if anything new has turned up on [Wayback] “PrivacyNator” – Google Search.

Related tweets:

Via: [Archive.is] Jason Mayes on Twitter: “Wait for it until the other person comes into frame…#madeWithTFJS… “

–jeroen

Read the rest of this entry »

Posted in AI and ML; Artificial Intelligence & Machine Learning, Development, Software Development | Leave a Comment »