It doesn't. The sound is fake. It has been added. The fireman in question says "I know that, but you gotta get back, the city's exploded."
The people present talk over the explosion as if it had never happened. In addition, the explosion is heard most clearly in the low-frequency range and the upper range. The volume levels for the explosion are well below all other sound in the area: the voices of the people peak frequently, yet the only peaking on the explosion is below 60 Hz.
The lower end could be explained by a distant explosion; however, the high-volume upper end indicates close proximity to the explosion. Consumer handycams have very poor performance at the lower end of the audio range, and professional cameras are equipped with shotgun microphones designed specifically to pick up human voices, so neither would be expected to record a strong signal below 60 Hz.
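For anyone who wants to run this kind of band comparison on a clip themselves, here is a rough sketch of the idea. It is not the analysis I ran on the video: the signal is two synthetic tones standing in for a low rumble and a voice, and the names `band_energy` and `tone`, the sample rate, and the band edges are all made up for illustration.

```python
import cmath
import math


def band_energy(samples, sample_rate, f_lo, f_hi):
    """Sum of squared DFT magnitudes for bins with f_lo <= f < f_hi.

    Uses a naive DFT, which is slow but needs nothing beyond the
    standard library. Hypothetical helper, for illustration only.
    """
    n = len(samples)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_lo <= freq < f_hi:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)
            )
            total += abs(coeff) ** 2
    return total


def tone(freq, amplitude, sample_rate, n):
    """Generate n samples of a sine wave (synthetic test signal)."""
    return [
        amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
        for i in range(n)
    ]


# Assumed values: 8 kHz sample rate, 0.1 s of audio (10 Hz per bin).
SAMPLE_RATE = 8000
N = 800

rumble = tone(40, 1.0, SAMPLE_RATE, N)    # stand-in for sub-60 Hz content
voice = tone(1000, 0.5, SAMPLE_RATE, N)   # stand-in for speech-band content
mix = [a + b for a, b in zip(rumble, voice)]

low = band_energy(mix, SAMPLE_RATE, 0, 60)       # below 60 Hz
mid = band_energy(mix, SAMPLE_RATE, 300, 3400)   # rough speech band

print("energy below 60 Hz:", low)
print("energy 300-3400 Hz:", mid)
```

On a real recording you would load the samples from the audio track instead of generating tones, and compare the two numbers during the explosion against the same numbers while people are talking.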
There is no echo on the explosion, and yet it supposedly occurred in a crowded city, an environment that produces multiple echoes.
-Gumboot