Dr.Sid
Philosopher
Also note we are totally in "let's see what happens" mode. We are making bigger and bigger models without touching the algorithms themselves much. The main reason is that tests on previous models showed their capabilities growing roughly linearly with model size. We didn't know that beforehand. We're testing it like an unknown entity.
We expect that at some size the capabilities will no longer improve much with model size .. and we will have to improve the datasets and the learning process, and make the learning phase longer .. but we don't know at what size that will happen.
Also, the size of the problem is hard to judge. If you are solving a Rubik's Cube, you know the number of states. General knowledge absorbed from all the text on the internet? Your guesses may vary by orders of magnitude.
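To make the contrast concrete: the Rubik's Cube state space is an exactly countable number, unlike "all text on the internet". A minimal sketch (my own illustration, using the standard counting argument, not anything from the models themselves):

```python
from math import factorial

# Naive assemblies of the cube: corner permutations * corner orientations
# * edge permutations * edge orientations. Only 1 in 12 of these is
# actually reachable by legal moves (parity and orientation constraints).
states = factorial(8) * 3**8 * factorial(12) * 2**12 // 12
print(f"{states:,}")  # 43,252,003,274,489,856,000 (~4.3e19)
```

For the cube that number is fixed and known; for "general knowledge in all internet text" there is no comparable count, which is why estimates of the needed model size can differ by orders of magnitude.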