The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Archive for March 21st, 2024

Autumn 2023 research: How Is ChatGPT’s Behavior Changing over Time?

Posted by jpluimers on 2024/03/21

[Wayback/Archive] https://arxiv.org/pdf/2307.09009.pdf ([Wayback] Google Docs PDF view: 2307.09009.pdf) is interesting. The abstract confirms my thought: over time, LLMs drift and seem to become worse at knowledge tasks.

How Is ChatGPT’s Behavior Changing over Time?

Lingjiao Chen†, Matei Zaharia‡, James Zou†
†Stanford University ‡UC Berkeley

Abstract

GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services.
However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3) opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating code, 6) US Medical License tests, and 7) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was reasonable at identifying prime vs. composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same questions (51% accuracy). This is partly explained by a drop in GPT-4’s amenity to follow chain-of-thought prompting. Interestingly, GPT-3.5 was much better in June than in March in this task. GPT-4 became less willing to answer sensitive questions and opinion survey questions in June than in March. GPT-4 performed better at multi-hop questions in June than in March, while GPT-3.5’s performance dropped on this task. Both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March. We provide evidence that GPT-4’s ability to follow user instructions has decreased over time, which is one common factor behind the many behavior drifts. Overall, our findings show that the behavior of the “same” LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLMs.

Later on, Eric Topol had a very interesting conversation with James Zou (below) which covers many AI aspects, including a lot of LLM ones. Basic takeaways for me are that LLMs are good at repeating things from their training data, making them OK at generating text and sort of OK for grammar, but far from OK at reproducing knowledge, and that it will become harder over time to distinguish LLM-generated content from human-created content.

The video of the conversation is below the blog signature; here is the link: [Wayback/Archive] James Zou: one of the most prolific and creative A.I. researchers in both life science and medicine – YouTube

Almost all LLMs are trained on a corpus without curation (curation is way too expensive), so at best they average that corpus: at its foundation, an LLM is just “monkey see, monkey do” on steroids, without any means of self-curation that could produce above-average output. Given that more and more on-line content is being generated by LLMs, and newer LLMs will be trained on a corpus encompassing that content (without the means to filter out LLM-generated content), I think that over time LLMs will perform worse instead of better.

Via the below series of interesting tweets, which were quoted by a slightly less pessimistic Erik Meijer: [Wayback/Archive] Erik Meijer on X: “Regression to the mean..”. Note some interesting replies as well; I found the one mentioning Eternal September especially fitting. It made me discover [Wayback/Archive] www.eternal-september.org:

Today is September 11160, 1993, the september that never ends
No pr0n, no warez, just Usenet
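
The banner date is simple arithmetic: count the days since 1 September 1993 and keep calling the result a September 1993 date. A minimal PowerShell sketch of that calculation (my own illustration, not code from eternal-september.org):

```powershell
# Eternal September: express today as "September N, 1993", where N is the
# number of days since 1993-09-01, counting that day itself as September 1.
$start = [datetime]'1993-09-01'
$n = ((Get-Date).Date - $start).Days + 1
"Today is September $n, 1993, the september that never ends"
# For this post's date (2024-03-21) that gives "September 11160, 1993".
```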

Anyway, the tweets:

Read the rest of this entry »

Posted in AI and ML; Artificial Intelligence & Machine Learning, Awareness, ChatGPT, Development, GPT-3, GPT-4, LLM, Software Development | Leave a Comment »

My Ultimate PowerShell prompt with Oh My Posh and the Windows Terminal – Scott Hanselman’s Blog

Posted by jpluimers on 2024/03/21

Via [Archive.is] Kevin on Twitter: “Gotta say this looks amazing and I actually didn’t know you can customize the command line on Windows this far. Read this blogpost by @shanselman , highly recommended. 👇 “

For my link archive: [Wayback] My Ultimate PowerShell prompt with Oh My Posh and the Windows Terminal – Scott Hanselman’s Blog
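
The gist of such a setup is usually just two steps: install Oh My Posh, then initialize it from your PowerShell profile. A minimal sketch, assuming a current Oh My Posh installed via winget (the theme name is only an example; Hanselman's post has the full details, including Nerd Fonts and Windows Terminal settings):

```powershell
# One-time install of Oh My Posh:
winget install JanDeDobbeleer.OhMyPosh

# Open your PowerShell profile for editing:
notepad $PROFILE

# ...and add this line to it so every new session gets the prompt
# (paradox.omp.json is just one of the bundled example themes):
oh-my-posh init pwsh --config "$env:POSH_THEMES_PATH\paradox.omp.json" | Invoke-Expression
```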

Read the rest of this entry »

Posted in CommandLine, Development, Power User, PowerShell, Scripting, Software Development, Windows, Windows 10, Windows Development | Leave a Comment »

GPS jamming & interference map | Flightradar24

Posted by jpluimers on 2024/03/21

Not so relevant in our area yet, but all the more relevant in some areas: [Wayback/Archive] GPS jamming & interference map | Flightradar24

Via:

--jeroen

Posted in Awareness | Leave a Comment »