Nuvia: don’t hold your breath

Yoused

up
Posts
5,700
Reaction score
9,097
Location
knee deep in the road apples of the 4 horsemen
The fastest way for a person to divide is on a slide rule. Compared to using paper, it is instantaneous. There is probably an effective way to build an analog divider into a CPU core that could divide as fast as a multiplier (it would most likely be set up to do both), in situations where it would be practical to trade some digital precision for speed. Perhaps an electronic slide-rule-style analog unit could be set up to refine its result to whatever precision the invoker needs within an acceptable timeframe.
 

mr_roboto

Site Champ
Posts
308
Reaction score
513
Unfortunately, any potential advantages of an analog divider would get destroyed by the overheads (floorplan area, time, and power) of converting digital inputs to analog values, then converting the analog result back into digital. DACs and ADCs are large and slow.
 

leman

Site Champ
Posts
703
Reaction score
1,314
As posted by Xiao_Xi at Macrumors:


A very interesting leak from Dell on how much cheaper, and with how much better battery life, the new Qualcomm processors will be compared to their Intel counterparts.

I am very skeptical about Dell releasing the new XPS with an OLED screen at $1299; that will likely still be the cheaper FHD+ panel. A high-res display option will probably be closer to $1799.
 

dada_dave

Elite Member
Posts
2,300
Reaction score
2,323
I am very skeptical about Dell releasing the new XPS with an OLED screen at $1299; that will likely still be the cheaper FHD+ panel. A high-res display option will probably be closer to $1799.
They seemed to be saying the $1199 was for the FHD screen. I couldn’t see the price info for OLED, but I agree that doesn’t seem like just $100 extra. I admit I didn’t understand TRU vs SAPP; is TRU like people buying upgrades?

Also, the target customer description is hilarious, as is the chart directly comparing to Apple’s … M2. Edit: and the M1! I didn’t even notice that. Oh dear. More evidence this was supposed to be out at least a year ago.

1715617050092.png


I’ve never set foot in a big company, but even I know that description is just so … corporate. I’d love to know what Apple’s description of us is … actually, you know what? Scratch that, I definitely don’t want to know! 🙃
 
Last edited:

Artemis

Power User
Posts
59
Reaction score
42
Edit: and the M1! I didn’t even notice that. Oh dear. More evidence this was supposed to be out at least a year ago.
At the end of the day, while this is late by a year for sure (thanks in part to the Arm lawsuit), it still doesn’t matter, because the real competition is other Windows laptops, and AMD and Intel are still behind despite this being late.

And IME, the reason people have been switching to Macs recently, if they do (still small numbers, but it was headed in that direction, which is part of why MS is in on Arm — they need something even close to good enough, and AMD and Intel can’t deliver it), is that AMD/Intel offerings are just so mediocre for battery life — or for responsiveness in general use while getting good battery life. Also thermals.


The power curves from this *clean-sheet core* with Qualcomm’s fabric are great — and this is platform power, measured across the whole board (SoC, DRAM, VRMs) minus idle, which is what you actually want to measure. And no, macOS powermetrics is not accurate and does not capture that full picture anyway*, and the AMD/Intel fan base is even worse about this and about “power consumption”.

The top of the graph around 12W is about 2800-2850 GB6 ST.


*The M2 and M3 — M3/M4 especially, which have a big node advantage — are doubtless better on ST perf/W, but again, this is a first product; look at that power floor (this is part of why battery life can be so good) and the curve below 10W for QC. It’s a huge change for Windows PC customers. Basically an M2 on ST, with similar-ish perf/W for the P cores, but no E cores yet and more MT.

IMG_2106.jpeg
 

Cmaier

Site Master
Staff Member
Site Donor
Top Poster Of Month
Posts
5,492
Reaction score
8,914
At the end of the day, while this is late by a year for sure (thanks in part to the Arm lawsuit), it still doesn’t matter, because the real competition is other Windows laptops, and AMD and Intel are still behind despite this being late.

Do we know that the lawsuit had anything to do with it? Did Qualcomm throw out the entire Nuvia design and start again? I hadn’t heard anything since the suit was filed.

Also, welcome!
 

Artemis

Power User
Posts
59
Reaction score
42
I really struggle to understand some of these recent scores. They are all over the place.
View attachment 29132
Qualcomm isn’t pulling anything — other than the Linux results in October, which were sneaky IMO (and comparing to the M2 Max in ST power, since it has a huge bus and is a bigger chip — but oh well).

It’s just A) engineering samples and weird testing behavior behind the very low scores — you see this all the time, especially with something like Windows or Android and their vendors pre-release. Both OSes also offer more granular control, even to testers or users, so you can clock down. Idk — but I’m not surprised.

But more importantly, on the variation from 2200-2900 that seems normal but weird:

So there are three X Elite SKUs, which vary with regard to “dual-core” boost performance (2 cores, one in each of two of the three clusters, can hit these clocks):

One with a 4.2GHz dual-core boost
One with a 3.8GHz dual-core boost
One with a 3.4GHz dual-core clock (so, zero boost: all-core is 3.4GHz, but still 12c).

This is about yields basically. If they could ship without much frequency binning they’d do 3.4-3.5GHz apparently.

Mind you, 3.4GHz is still slightly above M1 ST in some stuff, or the same; 3.8GHz is about M2, and the 4.2 is in between M2 and M3.

“But at 3.4GHz it’s Apple’s 2020 product in ST” — well, yes. But for one, the MT is at least competitive with that or an M2, and for two — I totally agree, and the M4 is functionally vastly superior — but for Windows: Intel’s Meteor Lake in GB6 trends towards 2200-2400 in real reviews and consumes 20W to do so, and AMD’s Phoenix/Zen 4, which will still be around for a long time, does around 2300-2700 (in a very good run on a maxed-out SKU anyway), but again at much more power.
 

Attachments

  • IMG_2108.jpeg

Artemis

Power User
Posts
59
Reaction score
42
Qualcomm isn’t pulling anything — other than the Linux results in October, which were sneaky IMO (and comparing to the M2 Max in ST power, since it has a huge bus and is a bigger chip — but oh well).

It’s just A) engineering samples and weird testing behavior behind the very low scores — you see this all the time, especially with something like Windows or Android and their vendors pre-release. Both OSes also offer more granular control, even to testers or users, so you can clock down. Idk — but I’m not surprised.

But more importantly, on the variation from 2200-2900 that seems normal but weird:

So there are three X Elite SKUs, which vary with regard to “dual-core” boost performance (2 cores, one in each of two of the three clusters, can hit these clocks):

One with a 4.2GHz dual-core boost
One with a 3.8GHz dual-core boost
One with a 3.4GHz dual-core clock (so, zero boost: all-core is 3.4GHz, but still 12c).

This is about yields basically. If they could ship without much frequency binning they’d do 3.4-3.5GHz apparently.

Mind you, 3.4GHz is still slightly above M1 ST in some stuff, or the same; 3.8GHz is about M2, and the 4.2 is in between M2 and M3.

“But at 3.4GHz it’s Apple’s 2020 product in ST” — well, yes. But for one, the MT is at least competitive with that or an M2, and for two — I totally agree, and the M4 is functionally vastly superior — but for Windows: Intel’s Meteor Lake in GB6 trends towards 2200-2400 in real reviews and consumes 20W to do so, and AMD’s Phoenix/Zen 4, which will still be around for a long time, does around 2300-2700 (in a very good run on a maxed-out SKU anyway), but again at much more power.
And yes, AMD and Intel will soon have new parts.
The caveat is that Lunar Lake, their N3 M3/M4 competitor, is 140mm^2 for the main die on N3B, yet it’s just a 4+4 part that I am very skeptical will match Apple’s ST perf/W, and Skymont, the “efficiency” cores, are unlikely to be efficient. Node isn’t everything. It will cost them more, though.

As for AMD, Strix Point is a size increase and will cost more, but Phoenix, the Zen 4 mainstream/premium 178mm^2 die on N4 that is being compared to here (the 7940HS is one of those dies), will continue to be around for some time in entry-premium or mainstream parts.

QC’s X Elite/ single SKU X Plus die is…
~ 170mm^2 give or take some — per Charlie Demerjian, who is often full of it but occasionally hits it out of the park and this seems right.

And sure enough, based on the leaked Dell report about the X Plus/10c part, they can sell, and are selling, some of these for $145.
 

Artemis

Power User
Posts
59
Reaction score
42
Do we know that the lawsuit had anything to do with it? Did Qualcomm throw out the entire Nuvia design and start again? I hadn’t heard anything since the suit was filed.

Also, welcome!
Hey Cliff. Yeah, so basically they did have to restart on certain parts of the core. I will find the documents and article — but they redid some structures in a way that apparently made virtually zero functional difference to the core (as in, they reimplemented them on their own or with their own methods), but they needed to do it for legal reasons.

Pretty sure it wasn’t the whole core, though.

That said, while the rumors that the core didn’t have DVFS (lol) are now obviously false, the move from a server core on Nuvia’s own or Arm’s fabric to a core on Qualcomm’s fabric with more granular power control did, I think, take some time. I suspect they might even have some tweaks for the “Oryon” core in their next phone chip on N3E coming this fall (which they have publicly confirmed exists), because they had an extra half year or more before that started.
 

Artemis

Power User
Posts
59
Reaction score
42
Okay, here it is, Cliff. Basically it seems like they re-implemented some structures for which Arm defines possibly mandatory specifications as part of the ALA (ISA licensing). Probably stuff like SIMD structures or Arm64 decoders and registers; it has to be something basic where they need to meet a spec (or maybe Arm even provides guides, who knows).


From Qualcomm’s legal response:

---

229. Likewise, ARM’s demand that NUVIA destroy ARM Confidential Information was baseless because NUVIA was licensed to this information under Qualcomm’s license agreements and Qualcomm’s further development of this technology was also licensed under Qualcomm’s license agreements.

230. Nonetheless, Qualcomm and NUVIA acted swiftly, at great time and expense, to take additional measures to satisfy ARM’s unreasonable demand to comply with the termination provisions in NUVIA’s license agreements.

231. Qualcomm and NUVIA removed NUVIA-acquired ARM Confidential Information from its designs and redesigned its products to replace it with information acquired under Qualcomm’s license—even though it was the exact same information—then quarantined a copy. Qualcomm also removed NUVIA-acquired ARM Confidential Information from its design environment and systems and quarantined it.

232. During this period, Qualcomm’s engineers were not working on further development of products because their attention was focused on the removal of NUVIA-acquired ARM Confidential Information.

233. NUVIA then provided ARM with its certification of its compliance with the termination provisions on April 1, 2022, as requested by ARM, even though the termination provisions were inapplicable.
 

dada_dave

Elite Member
Posts
2,300
Reaction score
2,323
And yes, AMD and Intel will soon have new parts.
The caveat is that Lunar Lake, their N3 M3/M4 competitor, is 140mm^2 for the main die on N3B, yet it’s just a 4+4 part that I am very skeptical will match Apple’s ST perf/W, and Skymont, the “efficiency” cores, are unlikely to be efficient. Node isn’t everything. It will cost them more, though.

As for AMD, Strix Point is a size increase and will cost more, but Phoenix, the Zen 4 mainstream/premium 178mm^2 die on N4 that is being compared to here (the 7940HS is one of those dies), will continue to be around for some time in entry-premium or mainstream parts.

QC’s X Elite/ single SKU X Plus die is…
~ 170mm^2 give or take some — per Charlie Demerjian, who is often full of it but occasionally hits it out of the park and this seems right.

And sure enough, based on the leaked Dell report about the X Plus/10c part, they can sell, and are selling, some of these for $145.
This is what concerns me more than the comparison with the Apple M-cores: reportedly AMD and Intel are going to be discussing Lunar Lake and Strix Point in May for their fall releases, and they might suck the oxygen out of the room depending on how good they are. Qualcomm has to compete with those chips, often at a native-vs-translated disadvantage, and Qualcomm doesn't have Apple's advantage of being able to force the whole software ecosystem to move. That means they have to be that much better than AMD and Intel to entice people over. I do wonder if that's what's causing some of the problems Charlie was hearing about (maybe exaggerated, which wouldn't be unusual), about how the Windows software stack made the Qualcomm cores feel like Celerons - that the Windows translation layer occasionally sucks and not everything that users might expect is native. He did say it wasn't the silicon's fault and singled out Windows. I mean, the alternative is he's just wrong, but that's what concerns me. I know I was ribbing the Qualcomm Oryon cores above, and I'll admit I'm not a fan of Qualcomm at all, but despite all that I actually would like to see Windows on Arm, which for now means Qualcomm, succeed.

Okay, here it is, Cliff. Basically it seems like they re-implemented some structures for which Arm defines possibly mandatory specifications as part of the ALA (ISA licensing). Probably stuff like SIMD structures or Arm64 decoders and registers; it has to be something basic where they need to meet a spec (or maybe Arm even provides guides, who knows).


From Qualcomm’s legal response:

I hadn't realized the lawsuit had actually affected things. Thanks for the info! That makes sense, as you said, given that Oryon V2 cores are coming out so soon relative to the first generation. Hopefully they get a big uplift and can iterate quickly.
 

Artemis

Power User
Posts
59
Reaction score
42
This is what concerns me more than the comparison with the Apple M-cores: reportedly AMD and Intel are going to be discussing Lunar Lake and Strix Point in May for their fall releases, and they might suck the oxygen out of the room depending on how good they are. Qualcomm has to compete with those chips, often at a native-vs-translated disadvantage, and Qualcomm doesn't have Apple's advantage of being able to force the whole software ecosystem to move. That means they have to be that much better than AMD and Intel to entice people over. I do wonder if that's what's causing some of the problems Charlie was hearing about (maybe exaggerated, which wouldn't be unusual), about how the Windows software stack made the Qualcomm cores feel like Celerons - that the Windows translation layer occasionally sucks and not everything that users might expect is native. He did say it wasn't the silicon's fault and singled out Windows. I mean, the alternative is he's just wrong, but that's what concerns me. I know I was ribbing the Qualcomm Oryon cores above, and I'll admit I'm not a fan of Qualcomm at all, but despite all that I actually would like to see Windows on Arm, which for now means Qualcomm, succeed.



I hadn't realized the lawsuit had actually affected things. Thanks for the info! That makes sense, as you said, given that Oryon V2 cores are coming out so soon relative to the first generation. Hopefully they get a big uplift and can iterate quickly.
This was only a few months to be fair, like Feb to April, but given the timeline of the dispute, and the fact that Qualcomm had multiple SoCs in flight, going from server and 5G base station parts to the laptop, which I think involved more work the first time around (this is a clean-sheet core, etc.), it makes the delay less embarrassing, I think.


And yeah, hoping to see some meaningful uplifts and improved power in future cores. IPC gap with Apple’s M4 is like 24-28% lol.

Qualcomm is also a very… ruthless company, I agree, but there’s no one else (American, anyway) that could’ve bought Nuvia and had the same reach. They just have an enormous technology portfolio. Stuff like modems or their ISPs is a big thing. If Nvidia, for example, had Nuvia, that would’ve been worse honestly, because they likely wouldn’t even bother doing SoCs for phones and probably wouldn’t break into that market or tablets, and since they didn’t get to buy Arm, they would be keeping those cores for their own use.


The good news is that the Cortex X5 is supposed to be a significant upgrade in IPC; power while doing so remains the issue, based off the X4’s results, but yeah, things are heating up.
 

Andropov

Site Champ
Posts
654
Reaction score
850
Location
Spain
The power curves from this *clean-sheet core* with Qualcomm’s fabric are great — and this is platform power, measured across the whole board (SoC, DRAM, VRMs) minus idle, which is what you actually want to measure. And no, macOS powermetrics is not accurate and does not capture that full picture anyway*, and the AMD/Intel fan base is even worse about this and about “power consumption”.
Could you elaborate? I'd be interested to know more. It's hard to empirically check the numbers coming from powermetrics, as there's no other way to measure the power of a Mac (other than wall power).
 

leman

Site Champ
Posts
703
Reaction score
1,314
The power curves from this *clean-sheet core* with Qualcomm’s fabric are great — and this is platform power, measured across the whole board (SoC, DRAM, VRMs) minus idle, which is what you actually want to measure. And no, macOS powermetrics is not accurate and does not capture that full picture anyway*, and the AMD/Intel fan base is even worse about this and about “power consumption”.

The top of the graph around 12W is about 2800-2850 GB6 ST.


*The M2 and M3 — M3/M4 especially, which have a big node advantage — are doubtless better on ST perf/W, but again, this is a first product; look at that power floor (this is part of why battery life can be so good) and the curve below 10W for QC. It’s a huge change for Windows PC customers. Basically an M2 on ST, with similar-ish perf/W for the P cores, but no E cores yet and more MT.

It's not bad compared to x86 cores; at the same time, I don't really follow the node advantage argument. The Nuvia team is using quite a lot of power to reach these scores at N4P (a node either identical or superior to what the M2 uses, depending on which analysis you trust). In the meantime, the M4 delivers almost 4000 GB6 points at 7 watts. That is a huge difference which cannot be explained by the 3nm node alone.

I am very curious to see what the Nuvia team will achieve given some time. Their initial offering so far has been rather disappointing; after the strong initial statements, I expected more.
 

dada_dave

Elite Member
Posts
2,300
Reaction score
2,323
This was only a few months to be fair, like Feb to April, but given the timeline of the dispute, and the fact that Qualcomm had multiple SoCs in flight, going from server and 5G base station parts to the laptop, which I think involved more work the first time around (this is a clean-sheet core, etc.), it makes the delay less embarrassing, I think.


And yeah, hoping to see some meaningful uplifts and improved power in future cores. IPC gap with Apple’s M4 is like 24-28% lol.

Qualcomm is also a very… ruthless company, I agree, but there’s no one else (American, anyway) that could’ve bought Nuvia and had the same reach. They just have an enormous technology portfolio. Stuff like modems or their ISPs is a big thing. If Nvidia, for example, had Nuvia, that would’ve been worse honestly, because they likely wouldn’t even bother doing SoCs for phones and probably wouldn’t break into that market or tablets, and since they didn’t get to buy Arm, they would be keeping those cores for their own use.


The good news is that the Cortex X5 is supposed to be a significant upgrade in IPC; power while doing so remains the issue, based off the X4’s results, but yeah, things are heating up.
Ha! I just went on a rant about IPC in another thread (btw, not aimed at you or really anyone, just venting frustration), but yeah, I’m looking forward to what the X5 and Oryon V2 have to offer. What I’m most interested in is whether anyone can actually leapfrog Apple in CPU design, or if everyone is eventually going to converge on much the same place with only relatively minor differences.

Like, I have to admit, during the depths of the 2010s I thought single-core performance was peaking, because of course the then-leader Intel had peaked. Little did I know … well, I suppose by the end of the decade I’d figured it out, reading Anandtech and so forth, but even with all that the M1 was still such a shock: glorified A14X that it was, but in the chassis of a laptop, and the implications for larger chips …
 
Last edited:

Artemis

Power User
Posts
59
Reaction score
42
It's not bad compared to x86 cores; at the same time, I don't really follow the node advantage argument. The Nuvia team is using quite a lot of power to reach these scores at N4P (a node either identical or superior to what the M2 uses, depending on which analysis you trust).
Eh. M2 Macs in practice are not actually using like 5W of active power for full ST. An A15 iPhone might (actually, even then it can go higher), but some people (AMD and Intel fans are worse in other ways, obviously, and just totally disingenuous, thinking power is NBD, but still) have a habit of underestimating Apple power by using powermetrics, which is not only badly modeled and inaccurate but, as of recent updates, only captures SoC power — not even the DRAM, much less power delivery etc. (which isn’t happening here).

The wall minus idle, or ideally direct PMIC fuel gauge instrumentation, is the gold standard.

And FWIW, this [playing the software game] isn’t actually good turf for Apple (or QC, or soon MediaTek and Nvidia) in the grand scheme of things, because part of how Apple and mobile vendors like them keep power down is by reducing DRAM accesses and by not using junk PMICs or power partitioning in their chips. The Qualcomm graphs are for platform power, which is SoC/package + DRAM + VRMs, so not ultra-dissimilar from what you’d get from the wall; that is what you actually want to measure, and it’s surprisingly honest of them.

This is also exactly what Geekerwan does, and what Andrei F (who now works at Qualcomm btw and is doing some of those measurements) measured for years now.
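
To make that bookkeeping concrete, here is a minimal sketch of the wall-minus-idle arithmetic; the wattages are made-up round numbers purely for illustration, not measurements from any of these machines:

# wall-minus-idle sketch; wattages are illustrative round numbers, not measurements
wall_under_load = 14.0  # W at the wall during a sustained single-threaded run
wall_idle = 4.0         # W at the wall with the machine idle, same display/peripheral state
gb6_st = 2850           # ST score from that same hypothetical run

active_power = wall_under_load - wall_idle  # what the workload itself costs the platform
print(f"active platform power ~= {active_power:.1f} W")
print(f"~= {gb6_st / active_power:.0f} GB6 ST points per watt")

Everything that is on regardless of load (display backlight, fans at idle, DRAM refresh) sits below the idle line and cancels out of the subtraction, which is the whole point.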

IMG_2106.jpeg

The Qualcomm graph goes up to 12W platform power for a GB6 of 2850-2900 for the X Elite on Windows. Taking it down to 8-10W is about where it would match an M2 in ST, I think.

And FWIW, when you look at some M2 Macs hooked up to an external display doing Cinebench R23 (which is a bad benchmark because of its lack of NEON, but the 2024 version fixes this), you do basically see this; it’s more like 9-11W. That said, I don’t think they took idle/static power out; if they did, it would probably remove a few watts, but their idle power is also fairly reasonable.

IMG_2099.jpeg


We don’t have an actual Geekbench 6 ST power comparison from Qualcomm inclusive of the M2 and the X series, and FWIW I believe that’s intentional by QC, because the M2 would have a better or similar curve, though maybe not by as much as you think.

Even using the A15 as a lower bound (it has 4x the SLC cache, 32MB vs 8MB, half the DRAM bus width, smaller DRAM, a smaller overall chip, and is pushed 10% less on frequency, 3.23GHz vs 3.5GHz, where that frequency really counts too, just not as much as for AMD/Intel), something more like 6.5-8W active power makes sense.

I suspect it’s about 7-9W depending on the load. Again, I do think Apple’s is probably still more efficient, and even 7.5W vs 9.5W is a significant difference, but you’re probably overstating the gaps on the same node.

In the meantime, the M4 delivers almost 4000 GB6 points at 7 watts. That is a huge difference which cannot be explained by the 3nm node alone.
Well, it’s never just the node but usually architectural and cache additions too, with the added density or spending, I agree; but I would wager that Qualcomm with Nuvia, unlike AMD (see Zen in the graph) and Intel, are not going to sit around or waste the new nodes re: mobile.

The 8 Gen 4 is shipping with Oryon on N3E this fall, FWIW, so I suspect there are also some integration pains that led to less-than-perfect laptop active power, because they absolutely wouldn’t throw that in a phone, even on N3E, if they thought they couldn’t tweak it.
 

Artemis

Power User
Posts
59
Reaction score
42
But to address the M4: that was in SPECint2017 “int rate”, where it drew 7W (6.98), yes? I have a problem with that graph, because Geekerwan has two sets of A17 Pro results, one with power at 3.62W and the other at 5.7W. It’s only 3.62W in int rate.


IMG_2215.jpeg

IMG_1944.jpeg

IMG_1938.jpeg


And, just a hunch, I suspect the 5.7W figure is closer to what the A17 Pro would draw in GB6. By the same logic, I would be surprised if the M4’s active power draw in the full SPECint2017 suite, or likewise in GB6, was 7W.

It seems like the specific thing he did here was maybe just one part of SPEC (int rate), whereas in the other, normal graphs you’re seeing the full suites for Int and FP.

To be blunt: the A17 Pro doing 3.62W at 3.78GHz in a full GB6 run is flat-out unbelievable, because Apple has been pushing power up slightly for a while now, and while they’re still ahead of competitors on perf/W by a mile, that’s just too low.
I am very curious to see what the Nuvia team will achieve given some time. Their initial offering so far has been rather disappointing.
I do agree the IPC and efficiency could’ve been better, and even Apple’s M2 still has a lead. I think this year’s 8 Gen 4 and next year’s Gen 5, the phone chips, will be telling moments for Nuvia.
 

Jimmyjames

Site Champ
Posts
772
Reaction score
872
Eh. M2 Macs in practice are not actually using like 5W of active power for full ST. An A15 iPhone might (actually, even then it can go higher), but some people (AMD and Intel fans are worse in other ways, obviously, and just totally disingenuous, thinking power is NBD, but still) have a habit of underestimating Apple power by using powermetrics, which is not only badly modeled and inaccurate but, as of recent updates, only captures SoC power — not even the DRAM, much less power delivery etc. (which isn’t happening here).

The wall minus idle, or ideally direct PMIC fuel gauge instrumentation, is the gold standard.

I would be interested in any evidence you have for this. How exactly do you know powermetrics is badly modelled and inaccurate? Why would wall minus idle on a device with a battery be better?

I’m also a little surprised to hear that the M2 is using more power than previously reported, and that the M4’s power usage is cause for concern.
 

jbailey

Power User
Posts
178
Reaction score
193
It's actually pretty easy to get an accurate report of total system power usage when you are on battery. You can get things like instant amperage and voltage from ioreg from the macOS command line. I do this all the time from a command line tool that I wrote. For example:

Current charge 4391 mAh
Full charge capacity 4598 mAh
Charge at 95.400% capacity
Design capacity 4563 mAh
Current full charge at 100.700% design capacity
Cycle count is 31
Battery temperature 79.07 °F (26.15 °C)
Instant Amperage -325 mA
Battery voltage 13.01 V
Discharging with -4.228 Watts
Power adapter disconnected
Current capacity 100%
State of charge 95%
Time Remaining 13:30
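
For reference, here is a minimal sketch of how one might pull those two values out of ioreg and compute the same wattage figure in Python. It assumes the AppleSmartBattery entry exposes "InstantAmperage" (mA) and "Voltage" (mV) and that ioreg's plist output parses to a one-element array; key names and units are as seen on recent MacBooks and may differ on other models, and my actual tool does more than this:

#!/usr/bin/env python3
# Sketch: instantaneous battery draw from the IORegistry (macOS).
import plistlib
import subprocess

# -a = XML plist output, -r = subtree rooted at the match, -n = match by entry name
raw = subprocess.run(["ioreg", "-arn", "AppleSmartBattery"],
                     capture_output=True, check=True).stdout
battery = plistlib.loads(raw)[0]       # properties of the (single) matching entry

amps_ma = battery["InstantAmperage"]   # milliamps, negative while discharging
if amps_ma >= 2**63:                   # the registry may report this as unsigned 64-bit,
    amps_ma -= 2**64                   # so a negative current can show up wrapped
volts_mv = battery["Voltage"]          # millivolts

watts = (amps_ma / 1000) * (volts_mv / 1000)
state = "Discharging" if watts < 0 else "Charging/idle"
print(f"{state} with {watts:.3f} Watts")

Same caveat as with the output above: it's an instantaneous reading, so you'd want to sample it over a run and average rather than trust a single number.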

To run in clamshell mode on battery with an external monitor, keyboard, and mouse, you can use pmset:

pmset disablesleep 1
 