Your posting states that a function taking a constant value is a trivial case, and I agree.
But it would be nice if the proof in the OP was addressed.
When I read this thread, I think what's needed is to take a step back and explain the ideas behind all of this. It's been 20 years since I last looked at either probability theory or measure theory, so please correct me if I goof up somewhere.
In probability theory, the basic notion is a random variable X which "records" your events. If your event is something like the throw of a die, you can speak of the chance that you throw a 5: P[X = 5], and likewise for the other values. However, when your event can take an arbitrary real value - say, you measure someone's height - then it is senseless to speak of P[X = 1.83], as that chance is 0. Instead you have a density function f(x) that captures the relative chance of getting that value - relative to all other values - and to actually calculate the chance that some set of values is hit, you take an integral. The usual distribution function is the antiderivative of the density, so
[latex]$$F(x) = \int_{-\infty}^{x} f(t) dt$$[/latex]
which gives the chance P[X < x] that your event records a value smaller than x. So the prerequisite for being able to work with such a random variable is that its density function is integrable. And given that it is integrable, the next question to pose is: for which sets S can you calculate the chance that your random variable "hits" the set:
[latex]$$P[X \in S] = \int_{t \in S} f(t) dt$$[/latex]
That's a way of looking at integrals you're probably not used to from high school mathematics, and that's where measure theory comes in. Measure theory is about setting up a system of subsets -- called "measurable sets" -- of a given set for which it makes sense to define integrals. So the above formula makes sense if S is a measurable set. Think in the first place of sets like intervals [1.5, 3.14] -- that's where measure theory on the real numbers starts -- but also of (countable) unions and intersections of those.
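To make the two formulas above concrete with an example of my own (not from the OP): take the exponential density f(t) = e^{-t} for t ≥ 0 and f(t) = 0 otherwise. Then
[latex]$$F(x) = \int_{0}^{x} e^{-t}\,dt = 1 - e^{-x} \qquad (x \geq 0),$$[/latex]
and for the interval S = [1.5, 3.14] you get
[latex]$$P[X \in S] = F(3.14) - F(1.5) = e^{-1.5} - e^{-3.14} \approx 0.18.$$[/latex]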
In particular, measure theory assigns to every measurable set a "size" (just integrate the constant function 1 over the set).
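For an interval, this size comes out as its length; writing μ for the measure (notation mine):
[latex]$$\mu([a,b]) = \int_{a}^{b} 1\,dt = b - a.$$[/latex]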
When you look at the definition of a random variable, you see that it is in effect already a function from a probability space S1 to a measurable space S2. A measurable space is a set endowed with a system of measurable subsets; a probability space is a measurable space in which the whole set has size 1, which is logical, as the probability of getting any result at all is by definition 1.
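At the level of densities, this is the familiar normalization condition: the density must integrate to 1 over the whole real line,
[latex]$$\int_{-\infty}^{\infty} f(t)\,dt = 1.$$[/latex]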
Think of S1 as the space of possible events ("measuring someone's height") and S2 as their translation into some number; if you nitpick, they're two different spaces. The translation has to be a measurable function, which means that the function behaves nicely with respect to the measure systems on S1 and S2, so that the quantity P[X ∈ T], where T is a measurable subset of S2, can be calculated. The word "random" is maybe misleading; a random variable could also assign the value 3 to every outcome, whatever the event.
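Spelled out (this is the standard definition, in the notation used above): X being measurable means precisely that the preimage of every measurable subset T of S2 is measurable in S1,
[latex]$$X^{-1}(T) = \{\, s \in S_1 : X(s) \in T \,\} \quad \text{measurable in } S_1,$$[/latex]
and then P[X ∈ T] is just the measure of that preimage under the probability measure on S1.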
The theorem says that "a measurable function g of a random variable is again a random variable". If you look at the definitions, that just means that when you compose the transformation function of the random variable with another (measurable) function, you again get a measurable function, now mapping from the original event space S1 to a new space S3, and again you can calculate P[g(X) ∈ T] for any measurable subset T of S3.
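That also gives the short proof asked for at the top of this post: for any measurable subset T of S3,
[latex]$$(g \circ X)^{-1}(T) = X^{-1}\!\left(g^{-1}(T)\right),$$[/latex]
and since g is measurable, g^{-1}(T) is a measurable subset of S2, so by the measurability of X its preimage under X is measurable in S1. Hence g ∘ X is measurable, i.e. again a random variable.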
So, in short, random variables are not really about randomness but about being able to ask "what's the chance that the event has a value in some reasonable set?", and measurable functions are about behaving nicely with respect to those reasonable sets.