
Merged Artificial Intelligence

Be careful. I have had ChatGPT cite references that it just made up. When I complained to it, it apologized and just made up new references.
Yesterday I asked Copilot whether Taiwan is a country, and it cited the thread I was reading here, which was about the Princess Diana conspiracy.
 
It's a technical description. Would you prefer "confabulation"?
I don't think it's all that technical. But I do think it makes a useful distinction from other human forms of being wrong. It's not ignorance. It's not misunderstanding. It's not dishonesty. It's building a false picture of reality due to a failure of cognition.
 
I don't think it's all that technical. But I do think it makes a useful distinction from other human forms of being wrong. It's not ignorance. It's not misunderstanding. It's not dishonesty. It's building a false picture of reality due to a failure of cognition.
That's a pretty good definition of a hallucination you've just given there.
 
James O'Malley has a piece on why DeepSeek doesn't mean the end of ChatGPT and friends. A more efficient use of resources will mean the existing computational capacity can be used to do more.
And that's even assuming that they're telling the truth about the amount of computational capacity used. I've seen claims that they lied about how many chips they used, because they were using a bunch of NVidia chips that sanctions were supposed to prevent them from even having. I don't know if that's true, but it's plausible.
 
DeepSeek isn't the only company doing what the Silicon Valley giants are doing, better and cheaper.
It's proof that there isn't enough of a technological advantage to establish dominance in the field, which is the only way these Tech Bros can think about competition.

If the US actually wanted to become the forerunner in AI instead of just shoveling taxpayer money to private companies, it would massively invest in educating students and workers in the technology. And it would go fully open source - that's what made the Internet possible.
 
Here's an interesting explanation of the techniques behind DeepSeek:

This is a video from a channel that produces very high-quality videos. If you are interested in the topic this is a must-watch video. Otherwise ignore it. It is very difficult to give a good summary in a few words as there is a lot of information in 20 minutes.
 
This is a video from a channel that produces very high-quality videos. If you are interested in the topic this is a must-watch video. Otherwise ignore it. It is very difficult to give a good summary in a few words as there is a lot of information in 20 minutes.
I'll try to briefly explain what is covered in the video.

One is a technique called "Mixture of Experts".

Mixture-of-Experts (MoE) 101

The Mixture of Experts (MoE) model is a class of transformer models. MoEs, unlike traditional dense models, utilize a “sparse” approach where only a subset of the model’s components (the “experts”) are used for each input. This setup allows for more efficient pretraining and faster inference while managing a larger model size.

In MoEs, each expert is a neural network, typically a feed-forward network (FFN), and a gate network or router determines which tokens are sent to which expert. The experts specialize in different aspects of the input data, enabling the model to handle a wider range of tasks more efficiently.
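To make the routing idea concrete, here is a minimal sketch of a top-k MoE layer in Python/PyTorch. This is a toy illustration only, not DeepSeek's actual code; the class name, the 512-dimensional tokens, the 8 experts, and the top-2 routing are all made-up example values.

```python
# Toy sketch of a Mixture-of-Experts layer with top-k routing.
# Hypothetical illustration only; sizes and expert count are invented.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is an ordinary feed-forward network (FFN).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        # The gate/router scores every expert for every token.
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                      # x: (num_tokens, dim)
        scores = self.router(x)                # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token -- this is the "sparse" part.
        for slot in range(self.top_k):
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# 16 tokens of width 512; only 2 of the 8 experts run for each token.
tokens = torch.randn(16, 512)
print(MoELayer()(tokens).shape)   # torch.Size([16, 512])
```

The point of the sketch is that the total parameter count grows with the number of experts, but the compute per token only grows with the number of experts actually selected.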

The other technique they cover is something called "chain of thought", which is sort of like how human beings approach complicated questions.
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking.
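As a rough illustration of what that looks like in practice, here is a toy comparison of a plain prompt and a chain-of-thought prompt, loosely paraphrasing the arithmetic examples popularized by that paper (the exact wording below is mine, not the paper's):

```python
# Toy illustration of chain-of-thought prompting (example text is illustrative,
# not the paper's exact prompt). The exemplar shows its reasoning steps,
# nudging the model to write out intermediate steps on the new question.
direct_prompt = (
    "Q: A cafeteria had 23 apples. They used 20 and bought 6 more. How many now?\n"
    "A:"
)

cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.\n"
    "\n"
    "Q: A cafeteria had 23 apples. They used 20 and bought 6 more. How many now?\n"
    "A:"
)

# With the plain prompt the model tends to guess a number directly; with the
# chain-of-thought exemplar it is more likely to spell out the arithmetic
# (23 - 20 = 3, then 3 + 6 = 9) before giving the answer 9.
print(cot_prompt)
```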
Combining these techniques lets the LLM achieve similar performance while requiring fewer GPUs to run, resulting in greater efficiency: less electricity and less expensive hardware.

The video description also has links to technical papers.
 
I'll try to briefly explain what is covered in the video.

"We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking."
Thanks for that. I already used up my YouTube allocation watching cat videos, so...

The 'chain of thought' idea is very interesting. This is a much better approach than just throwing more computing power at it.
 
James O'Malley has a piece on why DeepSeek doesn't mean the end of ChatGPT and friends. A more efficient use of resources will mean the existing computational capacity can be used to do more.
But that means they won't have to buy exponentially more hardware to do the job, which means they won't need exponentially more investment, which means the stock price won't keep going up exponentially. This destroys the main purpose of AI, 'making' money!
 
But that means they won't have to buy exponentially more hardware to do the job, which means they won't need exponentially more investment, which means the stock price won't keep going up exponentially. This destroys the main purpose of AI, 'making' money!
They will make enough money by replacing office jobs, don't worry ..
 
This is a video from a channel that produces very high-quality videos. If you are interested in the topic this is a must-watch video. Otherwise ignore it. It is very difficult to give a good summary in a few words as there is a lot of information in 20 minutes.
Thank you.

It appears that the video might go into a level of detail that I am unlikely to be capable of following, so I'll give it a miss.
 
And that's even assuming that they're telling the truth about the amount of computational capacity used. I've seen claims that they lied about how many chips they used, because they were using a bunch of NVidia chips that sanctions were supposed to prevent them from even having. I don't know if that's true, but it's plausible.
So here's a bit of circumstantial evidence to support the claim that DeepSeek was using more hardware than claimed.


NVidia chip sales to Singapore soared after DeepSeek was founded, even though Singapore itself isn't really an AI hub.
 
We know they are telling the truth about the computational resources it requires. It is open source and people have already got it running.
 
We know they are telling the truth about the computational resources it requires. It is open source and people have already got it running.
Do they have it running, or do they have it training? Because training takes a ◊◊◊◊ ton more resources than running once trained.
 
