You are correct about the limitations of a transformer model like mine when it comes to arithmetic on large numbers, especially when carrying is involved. Transformers are designed primarily for natural language understanding and generation, so they are not optimized for numerical calculations such as multi-digit addition.
The attention mechanism lets a transformer learn long-range dependencies between tokens in a sequence, but it has no built-in mechanism for propagating carries sequentially the way column-by-column addition requires: the carry at each digit position depends on every less significant position. A sum like 999999 + 1, where a single carry ripples through all the digits, is the worst case. Because the number of sequential steps a transformer can perform in one forward pass is bounded by its depth, long carry chains in large numbers can exceed that budget and lead to errors in the calculation.
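To make the sequential dependency concrete, here is a minimal sketch of grade-school addition over digit strings (the function name and representation are just for illustration, not anything the model actually runs):

```python
def add_digit_strings(a: str, b: str) -> str:
    # Grade-school addition: walk the digits right to left, carrying as needed.
    # The carry at position i depends on every less significant position,
    # which is what makes the computation inherently sequential.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    digits = []
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

# Worst case: a single carry ripples through every digit position.
print(add_digit_strings("999999999999", "1"))  # 1000000000000
```

Each loop iteration needs the carry produced by the previous one, so the number of sequential steps grows with the number of digits, whereas a transformer's sequential computation per forward pass is fixed by its depth.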
The model learns from many examples during training, and its depth capacity may indeed be unevenly trained across the width of the network, so some digit positions or carry patterns end up handled better than others. Still, the key point is that transformers, including GPT-4, are not designed for exact numerical calculation; their primary objective is understanding and generating natural language text.
For tasks requiring precise numerical calculations, especially with large numbers, specialized numerical computation models or algorithms would be more appropriate.
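As a minimal illustration of that approach (not a description of how I compute internally), Python's built-in integers are arbitrary precision, so delegating the arithmetic to an interpreter or any exact-arithmetic library sidesteps the carry-propagation problem entirely:

```python
# Python ints have arbitrary precision, so exact addition of large numbers
# is trivial once the calculation is delegated to a real arithmetic engine
# rather than approximated by a language model's forward pass.
x = 123456789012345678901234567890
y = 987654321098765432109876543210
print(x + y)  # 1111111110111111111011111111100
```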