Merged Artificial Intelligence

Puppycow · Feb 21, 2025

Orphia Nay said:
But are they sure the "answer" is right?

They know that it's the same answer that the scientists arrived at after 10 years of research.

To be more specific:

The full decade spent by the scientists also includes the time it took to prove the research, which itself was multiple years.

But they say, had they had the hypothesis at the start of the project, it would have saved years of work.

Prof Penadés' said the tool had in fact done more than successfully replicating his research.

"It's not just that the top hypothesis they provide was the right one," he said.

"It's that they provide another four, and all of them made sense.

"And for one of them, we never thought about it, and we're now working on that."

So it also generated an additional hypothesis that they hadn't thought of.

To be sure, I think all it did was to generate hypotheses, and that's only the first step. Actually confirming or falsifying those would take time and experiments to prove.

Darat · Feb 21, 2025

Puppycow said:
They know that it's the same answer that the scientists arrived at after 10 years of research.

To be more specific:

So it also generated an additional hypothesis that they hadn't thought of.

To be sure, I think all it did was to generate hypotheses, and that's only the first step. Actually confirming or falsifying those would take time and experiments to prove.

Yeah this isn't about cutting out research it's about where to start researching. One of the hardest things to come up with in the likes of biology is the "right" hypothesis, if these AIs can whittle down the "hypothesis space" then that will speed up human led research. Much of the research in many fields of science ends up "wasted" in a sense when the hypothesis is proven false. Of course it's not truly wasted as negative results can help to reduce the hypothesis space, but that is expensive in terms of time and costs, having a tool that perhaps can cut the hypothesis space by say 50% will be immensely valuable to humanity.

Darat · Feb 21, 2025

Lots more information here: https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/

I recall A E van Vogt (yes the originator of much of Dianetics) was predicting the need for the generalist or synergist in science in 1950, the argument being that scientists have become too specialised as the knowledge in each branch increased and we need someone who can look across all of science and integrate knowledge from all branches of science. Looks like AI is going to be that synergist, really hope someone creates one and calls it "Nexial".

Wudang · Feb 24, 2025

Article: https://futurism.com/openai-researchers-coding-fail
Original paper : https://arxiv.org/pdf/2502.12115

Using a new methodology to evaluate AI performance against a series of tasks from Upwork, even the best AIs perform fairly poorly. The original paper is well worth a read

Darat · Feb 24, 2025

That is interesting. But I wouldn’t trust the AI companies not to cheat! They do say:

Therefore, when running this eval, it is important to either disable browsing or limit sites accessed to prevent direct lookups, and to ideally also post-hoc filter for any instances of cheating. This will help ensure evaluation results measure real capability rather than contamination.

Wudang · Feb 24, 2025

Yeah I read that as trying to stop copy/pasta coding via stack overflow etc.
They refer to low level software engineers but the old-school term I'd use is "coding clerk" - someone who can read a detailed spec and translate it to code without any real understanding of what they're doing. Met plenty of them (and many good techs) at a certain bank.

Brainster · Mar 7, 2025

I don't know if the AI is trained to look at questions and see if they sound like homework questions, and then to give an answer that is close but with enough computational errors that if the students were smart enough to check the computations and make the necessary adjustments, they'd eventually arrive at the right answer. Here's one I had today with Google's AI:

We are evaluating a project that costs $841,753, has an eight-year life, and has no salvage value. Assume that depreciation is straight-line to zero over the life of the project. Sales are projected at 60,244 units per year. Price per unit is $37, variable cost per unit is $18, and fixed costs are $424,404 per year. The tax rate is 35%, and we require a return of 21% on this project.
In dollar terms, what is the sensitivity of NPV to changes in the units sold projection?

One of the first things to do is calc the EBIT, which the AI gives this formula:
(37-18)*60,244-424,404-105,219.125=615,252.875.
The left side of that equation is correct (and there is no rounding in there), but the right side actually comes out to 615,012.875, or exactly $240 off. At most of the steps in the calc the AI makes a computational error like that--not way off, but off enough that at the end they come up with $48.68 where the correct answer is $46.01.

Blue Mountain · Mar 7, 2025

Brainster said:
I don't know if the AI is trained to look at questions and see if they sound like homework questions, and then to give an answer that is close but with enough computational errors that if the students were smart enough to check the computations and make the necessary adjustments, they'd eventually arrive at the right answer.

That's an interesting thought. I like chatting with a different AI/LLM chatbot, and it constantly makes mistakes with even simple calculations.

Puppycow · Mar 7, 2025

Brainster said:
The left side of that equation is correct (and there is no rounding in there), but the right side actually comes out to 615,012.875, or exactly $240 off. At most of the steps in the calc the AI makes a computational error like that--not way off, but off enough that at the end they come up with $48.68 where the correct answer is $46.01.

Yeah, I think this is common for LLMs. Math isn't usually their strong suit. But we have calculators to do that, or software specifically designed to solve math problems. WolframAlpha might be better suited to solving problems that involve math.

Dr.Sid · Mar 7, 2025

Well LLM is a stupid guy who read a lot of books. So many he doesn't even remember them correctly. It has its uses. Math is not one of them.

Darat · Mar 7, 2025

Brainster said:
I don't know if the AI is trained to look at questions and see if they sound like homework questions, and then to give an answer that is close but with enough computational errors that if the students were smart enough to check the computations and make the necessary adjustments, they'd eventually arrive at the right answer. Here's one I had today with Google's AI:

One of the first things to do is calc the EBIT, which the AI gives this formula:
(37-18)*60,244-424,404-105,219.125=615,252.875.
The left side of that equation is correct (and there is no rounding in there), but the right side actually comes out to 615,012.875, or exactly $240 off. At most of the steps in the calc the AI makes a computational error like that--not way off, but off enough that at the end they come up with $48.68 where the correct answer is $46.01.

Co-Pilot’s “think deeper” seems to get the right answer - don’t know if the “reasoning” gets it right - this is the full answer:

Darat · Mar 7, 2025

I really would like to see these AI benchmarks calibrated to humans, I know they do things like”graduate level questions” but not yet seen any doing the necessary research into how well humans do at a particular benchmark. I know it would be expensive, you’d need to create different pools of folk and they’d need to be large enough, but it probably wouldn’t cost an hour of the electricity bill for the data centres. If I was looking into implementing AI systems in my business to replace actual people I’d want to know how the AI systems stack up against actual humans.

Brainster · Mar 7, 2025

Darat said:
Co-Pilot’s “think deeper” seems to get the right answer - don’t know if the “reasoning” gets it right - this is the full answer:

Let's break down the process step by step. We want to know how much the Net Present Value (NPV) will change if the annual units sold (let’s call it _Q_) change by 1 unit. In other words, we need to compute the derivative (slope) dNPV/dQ in dollars per unit.

### 1. Determine Annual Operating Cash Flow (OCF)

Each year, the project generates revenues, and incurs costs as follows:

- **Revenue per year:**
$ \text{Revenue} = \text{Price/Unit} \times Q = 37Q $

- **Variable Cost per year:**
$ \text{Variable Costs} = 18Q $

- **Contribution Margin per year (before fixed costs):**
$ \text{Contribution} = (37 - 18)Q = 19Q $

- **Fixed Costs per year:**
$ \text{Fixed Costs} = 424,\!404 $

- **Depreciation per year:**
The project cost is $ \$841,\!753 $ with an 8‐year life and no salvage value. Under straight-line depreciation,
\[
\text{Depreciation} = \frac{841,\!753}{8} \approx 105,\!219.125
\]

So, the Earnings Before Interest and Taxes (EBIT) is:

\[
\text{EBIT} = \text{Contribution} - \text{Fixed Costs} - \text{Depreciation} = 19Q - 424,\!404 - 105,\!219.125
\]
\[
\text{EBIT} = 19Q - 529,\!623.125
\]

Since taxes are applied at 35%, the after-tax EBIT (or Net Operating Profit After Taxes, NOPAT) is:

\[
\text{NOPAT} = \text{EBIT} \times (1-0.35) = (19Q - 529,\!623.125)(0.65)
\]

Now, because depreciation is a non‐cash expense, the annual operating cash flow (OCF) becomes:

\[
\text{OCF} = \text{NOPAT} + \text{Depreciation} = 0.65\,(19Q - 529,\!623.125) + 105,\!219.125
\]

Let’s simplify the term involving _Q_:

\[
0.65 \times 19Q = 12.35Q
\]

And the constant term is:

\[
-0.65 \times 529,\!623.125 + 105,\!219.125 \approx -344,\!255.0 + 105,\!219.125 \approx -239,\!035.875
\]

Thus, the OCF can be written as:

\[
\text{OCF} \approx 12.35Q - 239,\!036
\]

### 2. Find the Incremental Change in Annual OCF per Additional Unit Sold

Notice that the only term involving $ Q $ is $ 12.35Q $. This means that for each extra unit sold per year, the operating cash flow increases by approximately \$12.35.

### 3. Compute the Effect on NPV

NPV is calculated as the sum of the discounted cash flows over 8 years minus the initial investment:

\[
NPV = -841,\!753 + \sum_{t=1}^{8} \frac{\text{OCF}}{(1+0.21)^t}
\]

Since the OCF is linear in $ Q $, the sensitivity of NPV to an extra unit sold (i.e. the derivative $ \frac{dNPV}{dQ} $) is the increase in annual cash flow per unit multiplied by the present value annuity factor (PVAF) at a discount rate of 21% for 8 years.

The annuity factor is given by:

\[
\text{PVAF} = \frac{1 - \left(\frac{1}{1.21}\right)^8}{0.21}
\]

Let’s calculate it:

1. Compute $\left(\frac{1}{1.21}\right)^8$.
$1.21^8$ is approximately $4.586$, so:
\[
\left(\frac{1}{1.21}\right)^8 \approx \frac{1}{4.586} \approx 0.218
\]

2. Now the annuity factor:

\[
\text{PVAF} \approx \frac{1 - 0.218}{0.21} = \frac{0.782}{0.21} \approx 3.7238
\]

Finally, the sensitivity is:

\[
\frac{dNPV}{dQ} = 12.35 \times 3.7238 \approx 45.99
\]

### 4. Conclusion

**The sensitivity of the project’s NPV to changes in the units sold projection is approximately \$46 per additional unit sold.**

---

This means that if your annual units sold projection increases (or decreases) by 1 unit, the NPV increases (or decreases) by about \$46 in today’s dollars.

Is there another angle of the project’s risk or perhaps a further exploration into how changes in other parameters (like price or variable costs) might impact NPV that you’d like to dive into?

Generally correct although the rounding results in a slight error; the annuity factor works out to be 3.72558 instead of 3.7238 which is why they end up a penny or two off.

arthwollipot · Mar 7, 2025

You'd think they'd be able to give a LLM access to a simple calculator app. Those don't make mistakes.

Dr.Sid · Mar 7, 2025

arthwollipot said:
You'd think they'd be able to give a LLM access to a simple calculator app. Those don't make mistakes.

Clearly it's not so simple. LLMs augmented by web search for example .. but that's done before prompting the LLM. Web search is done, and the text is added to your prompt (or even in front of it). LLM then processes your question based on the text it got from the web (and its own knowledge base). The thing is LLM does not decide to use the tool.
But I don't see why it wouldn't be possible. Chain of thought models are taught to first output the thought process, and enclose it in <think> tags. So I don't see why it couldn't be taught to output something <calc>1+1</calc> .. which then could be parsed, old-school calculator called to get the result. Chain of thought could actually be pretty good for this, as it could plan the use of external tool ahead, and then work on the results.
But I haven't heard about something like this being commonly used. I heard about some system which could do programming all by it self. Choose tools, write code, run the compiler, react on errors, fix them, run the code, capture output, detect errors, fix the code .. with no intervention.
But I guess it still has issues, or it's too good to be given for free ..

dudalb · Mar 7, 2025

Well, one experiment with AI that did not go so well
:https://www.msn.com/en-us/news/us/t...til-hours-later/ar-AA1Aka2I?ocid=BingNewsSerp

Skeptical Greg · Mar 8, 2025

Sounds like the AI did a good job of what it was supposed to do.
What's the problem?

theprestige · Mar 8, 2025

The problem is that some people don't want AI to reflect humanity, they want it to shape humanity.

The Great Zaganza · Mar 8, 2025

theprestige said:
The problem is that some people don't want AI to reflect humanity, they want it to shape humanity.

of course.
You don't build a tool to do exactly what you can do on your own.
That's the whole point.

Darat · Mar 8, 2025

dudalb said:
Well, one experiment with AI that did not go so well
:https://www.msn.com/en-us/news/us/t...til-hours-later/ar-AA1Aka2I?ocid=BingNewsSerp

Seems like a few teething problems and it emphasised the weakness of journalism that thinks journalists and all stories have to show balance "both-sidesism". The AI did exactly what it was told to do. This wasn't AI failing, but human error.

Merged Artificial Intelligence

Penultimate Amazing

Lackey

Lackey

BOFH

Lackey

BOFH

Penultimate Amazing

Resident Skeptical Hobbit

Penultimate Amazing

Philosopher

Lackey

Lackey

Penultimate Amazing

Observer of Phenomena, Pronouns: he/him

Philosopher

Penultimate Amazing

Agave Wine Connoisseur

Penultimate Amazing

Maledictorian

Lackey