That's the philosophical question everyone is struggling with, and I'll freely admit it's a difficult one to answer if we're supposed to be judging by the output and nothing else. But when we have more information than just the output, it's a much easier question to answer.
That's the point of John Searle's "Chinese Room". To wit:
Imagine a kind of "physical" ChatGPT that reads and writes in Chinese. You write a prompt in Chinese on a slip of paper and insert it into a slot in a door. "Something" - you have no idea what - happens on the other side of the door, and another slip of paper comes back out of the slot with a perfectly intelligible and sensible response to your prompt, written in Chinese. Assuming you know Chinese, you can say that, based on what you as a user can observe, the room - machine, whatever it is - certainly appears to understand Chinese, much the same way that ChatGPT appears to understand English.
But now imagine another door that behaves identically, except that instead of a machine running a program, it's me sitting on the other side of it. As a simple matter of fact, I don't understand a lick of Chinese - can't read it, can't write it, can't speak it. I have a huuuge book of instructions (in English), and when a slip of paper comes through the door, I cross-reference the Chinese script with my instruction manual. The manual doesn't tell me what the characters mean; it just tells me that if I see these characters in this order, I should write these other characters on a new slip of paper and kick it back out through the door. I have no clue what the slip of paper that came in the door says, and I don't know what the slip I'm sending back says either - and yet, the Chinese-speaking person outside the door received what looked to them like a coherent, human-like answer to whatever their question was. Can it be said that I meaningfully "understand" Chinese when this is what I'm doing?
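If it helps to see how little "understanding" that rulebook requires, here's a toy sketch of the idea in Python. This is my own illustration, not Searle's, and obviously nothing like how ChatGPT actually works; the `RULEBOOK` table and `answer` function are names I made up, and a real rulebook would be vastly larger. The point is only that the procedure is pure symbol matching.

```python
# A toy "rulebook": maps an incoming string of characters to a prescribed outgoing one.
# Whoever (or whatever) applies it never needs to know what either side means.
RULEBOOK = {
    "你好吗？": "我很好，谢谢。",              # "How are you?" -> "I'm fine, thanks."
    "今天天气怎么样？": "今天天气很好。",      # "How's the weather today?" -> "The weather is nice today."
}

def answer(slip_of_paper: str) -> str:
    # Pure lookup: match the incoming characters, return the prescribed response.
    # No step here involves understanding Chinese.
    return RULEBOOK.get(slip_of_paper, "对不起，我不明白。")  # fallback: "Sorry, I don't understand."

print(answer("你好吗？"))  # prints: 我很好，谢谢。
```

Every step is symbol-shuffling: the lookup works exactly the same whether or not the person (or program) carrying it out attaches any meaning to the characters.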