Processors – Michael Tsai

Steve Jobs and the Missing “Intel Inside” Sticker

Ken Segall:

The Intel Inside marketing strategy will be studied in business schools around the world for decades. It represented bold thinking and bold spending.

[…]

Apple’s internal testing showed that the newest PowerPC processor was faster than Intel’s fastest chip. With a real competitive advantage to work with, we did what any feisty agency would do: we declared war on Intel.

Suddenly, it was to our advantage that Intel had become the unifying, driving force in PCs. We didn’t have to attack any PC maker by name—we could take on the entire PC industry simply by attacking Intel.

Apple did a lot of Photoshop demos. My recollection is that aside from graphics tasks that emphasized floating point, PowerPC-based Macs were mostly slower than Intel-based PCs. This was especially noticeable with compilation. The stated reason for the Intel transition was performance per watt, but by that time the high-powered PowerPCs were behind schedule, too. When I got the original Core Duo iMac in 2006, its performance blew away the dual-G5 tower that I had been using, even though the Core Duo was derived from the mobile Pentium architecture. So of course it was great in notebooks, which were still stuck using G4s.


A11 Bionic

Lance Ulanoff (via Joe Rossignol):

Srouji told me that when Apple architects silicon, they start by looking three years out, which means the A11 Bionic was under development when Apple was shipping the iPhone 6 and its A8 chip. Back then we weren’t even talking about AI and machine learning at a mobile level and, yet, Srouji said, “The neural engine embed, it’s a bet we made three years ahead.”

[…]

The high-performance cores and efficiency cores introduced with the A10 Fusion CPU got an iterative update, including the addition of two more cores and the ability to handle asymmetric multi-processing, which means the chip can run 1, 2, 3, 4, 5, or 6 cores at once. Managing the core use on the now 10-nanometer CPU is one of the reasons the A11 Bionic, according to Apple, is 70 percent more energy efficient (even while being 25 percent faster than the A10). How the system decides which cores to use (high performance or high efficiency) and how many is a little non-obvious.

[…]

The secret sauce of a Neural Engine, what makes it different from other parts of the A11 Bionic, is its ability to handle matrix multiplications and floating-point processing.

Apple is not, however, opening this neural brain to everyone.

[…]

There are other things the A11 Bionic controls that Apple doesn’t often talk about, including the storage controller that includes custom error-correcting code (ECC) algorithms.

This last bit relates to APFS not checksumming its data blocks.

John Gruber:

I asked Apple last week what exactly was “bionic” about the A11 chip system. The answer, translated from Apple marketing-speak to plain English, is that The Bionic Man and Woman were cool, and the A11 chip is very cool. I think they’ve started giving these chips names in addition to numbers (last year’s was the A10 Fusion) because the numbers alone belie the true nature of how significant the improvements in these chips are. Going from A10 to A11 is like going from 10 to 11 mathematically, which implies a 10 percent improvement. That’s not the case at all here — the A11 is way more than a 10 percent improvement over the A10. So they’ve given it a name like “Bionic” to emphasize just how powerful it is.

Update (2017-09-22): Mark Spoonauer (via Phil Schiller):

The “Bionic” part in the name of Apple’s A11 Bionic chip isn’t just marketing speak. It’s the most powerful processor ever put in a mobile phone. We’ve put this chip to the test in both synthetic benchmarks and some real-world speed trials, and it obliterates every Android phone we tested.

[…]

The iPhone 8 even edged out the score from the 13-inch Apple MacBook Pro with a 7th-generation Core i5 processor. That notebook notched 9,213. Is Geekbench 4 really comparable from phone to desktop? According to the founder of Geekbench, John Poole, “the short answer is yes, the scores are comparable across platforms, so if an iPhone 8 scores higher than an i5, then the iPhone 8 is faster than the i5.”

Update (2017-10-03): David Heinemeier Hansson:

Google Pixel scores a meager 50 on the JetStream JS benchmark. iPhone 8 is at 220. Almost 5x?!? Even iPhone 6S is at 128. Embarrassing.

Dan Masters:

As I’ve repeatedly said: if iOS is artificially slowing down basic tasks due to animations, it doesn’t matter how fast Apple’s chips become.

Intel CPU Design Flaw Necessitates Kernel Page Table Isolation

John Leyden and Chris Williams (tweet):

A fundamental design flaw in Intel’s processor chips has forced a significant redesign of the Linux and Windows kernels to defang the chip-level security bug.

[…]

Crucially, these updates to both Linux and Windows will incur a performance hit on Intel products. The effects are still being benchmarked, however we’re looking at a ballpark figure of five to 30 per cent slow down, depending on the task and the processor model.

[…]

Similar operating systems, such as Apple’s 64-bit macOS, will also need to be updated – the flaw is in the Intel x86-64 hardware, and it appears a microcode update can’t address it. It has to be fixed in software at the OS level, or go buy a new processor without the design blunder.

[…]

The fix is to separate the kernel’s memory completely from user processes using what’s called Kernel Page Table Isolation, or KPTI. At one point, Forcefully Unmap Complete Kernel With Interrupt Trampolines, aka FUCKWIT, was mulled by the Linux kernel team, giving you an idea of how annoying this has been for the developers.

Tom Lendacky (via Hacker News):

AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.

I wonder when Apple will have a macOS update and whether it will ship any Macs with AMD processors, depending on how long it takes Intel to develop a fix.

Ian King and Jing Cao (via Hacker News):

AMD shares surged as much as 7.2 percent to $11.77 Wednesday. Intel fell as much as 3.8 percent, the most since April, to $45.05.

[…]

The Santa Clara, California-based company’s chips have more than 80 percent market share overall and more than 90 percent in laptops and servers.

See also: The mysterious case of the Linux Page Table Isolation patches.

Update (2018-01-03): Alex Ionescu:

The question on everyone’s minds: Does MacOS fix the Intel #KPTI Issue? Why yes, yes it does. Say hello to the “Double Map” since 10.13.2 -- and with some surprises in 10.13.3 (under Developer NDA so can’t talk/show you).

The performance drop on a system with PCID is minimal. Most Macs have PCID.

Michael Larabel (via Hacker News):

I’ve been running some benchmarks and will have some more extensive tests soon, but given all the emails today about the issue, here are my initial benchmark numbers on two systems.

See also: MacRumors.

Update (2018-01-03): Intel (Hacker News):

Recent reports that these exploits are caused by a “bug” or a “flaw” and are unique to Intel products are incorrect. Based on the analysis to date, many types of computing devices — with many different vendors’ processors and operating systems — are susceptible to these exploits.

Intel is committed to product and customer security and is working closely with many other technology companies, including AMD, ARM Holdings and several operating system vendors[…]

At first, I didn’t like how they wrote this in such a way as to imply that AMD and ARM processors are among those affected, but apparently some of them are.

See also: Pierre Lebeaupin.

Matt Linton and Pat Parseghian:

The Project Zero researcher, Jann Horn, demonstrated that malicious actors could take advantage of speculative execution to read system memory that should have been inaccessible. For example, an unauthorized party may read sensitive information in the system’s memory such as passwords, encryption keys, or sensitive information open in applications. Testing also showed that an attack running on one virtual machine was able to access the physical memory of the host machine, and through that, gain read-access to the memory of a different virtual machine on the same host.

These vulnerabilities affect many CPUs, including those from AMD, ARM, and Intel, as well as the devices and operating systems running on them.

[…]

The Project Zero researchers discovered three methods (variants) of attack, which are effective under different conditions. All three attack variants can allow a process with normal user privileges to perform unauthorized reads of memory data, which may contain sensitive information such as passwords, cryptographic key material, etc.

@FioraAeterna:

oh, and one last thing: the thing that gets me most about this exploit is it isn’t really a single exploit, it’s a whole category of exploits. verifying that no further attacks exist sounds EXTREMELY hard.

See also: Hacker News.

Juli Clover:

ARM and AMD have both issued statements following Intel’s press release. AMD says there is a “near zero risk” to AMD processors at this time, while ARM says its processors are vulnerable.

Update (2018-01-04): Meltdown and Spectre:

These hardware bugs allow programs to steal data which is currently processed on the computer. While programs are typically not permitted to read data from other programs, a malicious program can exploit Meltdown and Spectre to get hold of secrets stored in the memory of other running programs. This might include your passwords stored in a password manager or browser, your personal photos, emails, instant messages and even business-critical documents.

Meltdown and Spectre work on personal computers, mobile devices, and in the cloud. Depending on the cloud provider’s infrastructure, it might be possible to steal data from other customers.

ARM (archive, Hacker News):

The majority of Arm processors are not impacted by any variation of this side-channel speculation mechanism. A definitive list of the small subset of Arm-designed processors that are susceptible can be found below.

Via Bob Burrough:

Arm’s response is downright misleading.

Microsoft Azure:

The majority of Azure infrastructure has already been updated to address this vulnerability. Some aspects of Azure are still being updated and require a reboot of customer VMs for the security update to take effect.

Troy Wolverton:

Intel CEO Brian Krzanich sold off $24 million worth of stock and options in the company in late November.

[…]

Intel says the stock sale was unrelated to the vulnerability, but came as part of a planned divestiture program. But Krzanich put that stock sale plan in place in October — several months after Intel was informed of the vulnerability.

Linus Torvalds (via The Register):

I think somebody inside of Intel needs to really take a long hard look at their CPU’s, and actually admit that they have issues instead of writing PR blurbs that say that everything works as designed.

.. and that really means that all these mitigation patches should be written with “not all CPU’s are crap” in mind.

Or is Intel basically saying “we are committed to selling you shit forever and ever, and never fixing anything”?

Aaron Pressman (via Alex Ionescu):

AMD said its chips were affected by some but not all of a series of related security exploits uncovered by researchers. AMD has already developed a simple software fix for its chips that will not impact PC performance, an AMD spokesman said. “Due to differences in AMD’s architecture, we believe there is a near zero risk to AMD processors at this time,” the company said in a statement. “We expect the security research to be published later today and will provide further updates at that time.”

Microsoft Edge Team (via Steve Troughton-Smith):

These techniques can be used via JavaScript code running in the browser, which may allow attackers to gain access to memory in the attacker’s process.

[…]

Initially, we are removing support for SharedArrayBuffer from Microsoft Edge (originally introduced in the Windows 10 Fall Creators Update), and reducing the resolution of performance.now() in Microsoft Edge and Internet Explorer from 5 microseconds to 20 microseconds, with variable jitter of up to an additional 20 microseconds. These two changes substantially increase the difficulty of successfully inferring the content of the CPU cache from a browser process.
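
To make the mitigation concrete, here is a minimal sketch of the same quantize-plus-jitter idea in C. This is my illustration, not Edge’s actual code; it assumes POSIX clock_gettime and the BSD/macOS arc4random_uniform.

```c
#include <stdint.h>
#include <stdlib.h>   /* arc4random_uniform (BSD/macOS) */
#include <time.h>

#define QUANTUM_NS (20 * 1000)  /* 20 microseconds */

/* A deliberately coarse clock: round down to a 20 us boundary, then
   add 0..20 us of random jitter, so callers can no longer resolve the
   few-nanosecond difference between a cache hit and a cache miss. */
uint64_t coarse_now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    uint64_t ns = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    ns -= ns % QUANTUM_NS;
    ns += arc4random_uniform(QUANTUM_NS);
    return ns;
}
```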

Mozilla (Hacker News):

Since this new class of attacks involves measuring precise time intervals, as a partial, short-term, mitigation we are disabling or reducing the precision of several time sources in Firefox.  This includes both explicit sources, like performance.now(), and implicit sources that allow building high-resolution timers, viz., SharedArrayBuffer.

Chromium (via Yehuda Katz):

Chrome allows users to enable an optional feature called Site Isolation which mitigates exploitation of these vulnerabilities. With Site Isolation enabled, the data exposed to speculative side-channel attacks are reduced as Chrome renders content for each open website in a separate process.

[…]

Don’t serve user-specific or sensitive content from URLs that attackers can predict or easily learn. Attackers can load such URLs in their attack pages (e.g. <img src="https://email.example.com/inbox.json"/>) to get the sensitive information into the process rendering their page, and can then use out-of-bounds reads to discover the information. Use anti-CSRF tokens and SameSite cookies, or random URLs to mitigate this kind of attack.
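
For instance, a session cookie set like this (the values are hypothetical) would not be attached to the cross-site <img> request above, so the attacker’s page only ever pulls a logged-out response into its process:

```
Set-Cookie: session=abc123; SameSite=Strict; Secure; HttpOnly
```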

Kevin Beaumont:

Okay there is another VERY IMPORTANT THING with Microsoft Meltdown patches - “Customers will not receive these security updates and will not be protected from security vulnerabilities unless their anti-virus software vendor sets the following registry key”

Joe Armstrong:

I think I might have said now and again that

“shared memory is the root of all evil”

now I should add

“Shared memory is the root of all security problems”

Aras Pranckevičius:

“Retpoline”, an optional compiler flag to deal with Spectre attack…. Landing to llvm/gcc as we speak. Virtual calls, as well as switch statements etc., are about to get more expensive.

Here’s the Hacker News thread about the LLVM patch.
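
The thunk itself is tiny. Here is a sketch of the r11 variant in GCC-style inline assembly, adapted from the publicly documented retpoline construction (compilers emit this for you under flags such as GCC’s -mindirect-branch=thunk or Clang’s -mretpoline):

```c
/* Replaces an indirect branch like "jmp *%r11". The return predictor
   speculates into the pause/lfence capture loop instead of an
   attacker-trained target; architecturally, the mov overwrites the
   return address so the ret lands on the real target. */
__asm__(
    ".global __x86_indirect_thunk_r11\n"
    "__x86_indirect_thunk_r11:\n"
    "    call 1f\n"            /* push the address of the loop below */
    "0:  pause\n"              /* speculation is trapped here */
    "    lfence\n"
    "    jmp 0b\n"
    "1:  mov %r11, (%rsp)\n"   /* replace return address with target */
    "    ret\n"
);
```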

Jacek Galowicz (Meltdown paper, PDF):

This kind of speculative execution does not only occur over branches: When a program accesses a specific cell of memory, the processor needs to decide if it is allowed to do so by consulting the virtual memory subsystem. If the memory cell has previously been cached, the data is already there and data is returned while the processor figures out if this access is legitimate. With speculative execution, the processor can trigger actions depending on the result of a memory access while working to complete the corresponding instruction.

If the memory access was not legitimate, the results of such an instruction stream need to be discarded, again. For a user application it is not possible to access the final result of any computation relying on such an illegitimate memory access. The interesting crux of this is that although retirement is correctly performed, all speculatively executed and then discarded instructions have still left some measurable effect on the cache subsystem…

[…]

While none of these spots contains anything useful before or after this sequence of machine code instructions, it is possible to make sure that the whole user space array is completely uncached/cold before executing them. After trying to execute them, it is necessary to recover from the page fault that the processor reacts with. But then, one of the spots in the user space array remains cached!

Finding out the offset of the cached/warm spot of memory in the user space array allows for calculating the actual value that was read from memory, which can be done by measuring access timings on each of the 256 spots that could have been touched by the speculative execution.
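
Here is a stripped-down sketch of that probe-array technique in C. It is illustrative only: a real exploit also needs a kernel address worth reading and a way to survive the fault (a signal handler or TSX), plus a cache-flush and timing harness.

```c
#include <stdint.h>

#define STRIDE 4096  /* one page per possible byte value */
static volatile uint8_t probe[256 * STRIDE];

void leak_one_byte(const volatile uint8_t *kernel_addr) {
    /* Architecturally this load faults, but it and the dependent
       access below may already have executed speculatively... */
    uint8_t secret = *kernel_addr;
    /* ...leaving exactly one of the 256 probe pages warm in the cache. */
    (void)probe[secret * STRIDE];
}

/* The attacker then times a read of each of the 256 pages; the one
   that comes back fast reveals the value of the secret byte. */
```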

Spectre paper (PDF):

Spectre attacks involve inducing a victim to speculatively perform operations that would not occur during correct program execution and which leak the victim’s confidential information via a side channel to the adversary. This paper describes practical attacks that combine methodology from side channel attacks, fault attacks, and return-oriented programming that can read arbitrary memory from the victim’s process. More broadly, the paper shows that speculative execution implementations violate the security assumptions underpinning numerous software security mechanisms, including operating system process separation, static analysis, containerization, just-in-time (JIT) compilation, and countermeasures to cache timing/side-channel attacks. These attacks represent a serious threat to actual systems, since vulnerable speculative execution capabilities are found in microprocessors from Intel, AMD, and ARM that are used in billions of devices.
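
The variant 1 (bounds check bypass) gadget from the paper is remarkably small. Paraphrased in C, reusing the probe-array idea from the Meltdown write-up above:

```c
#include <stddef.h>
#include <stdint.h>

size_t array1_size = 16;
uint8_t array1[16];
uint8_t array2[256 * 4096];  /* probe array: one page per byte value */
volatile uint8_t temp;

void victim(size_t x) {
    if (x < array1_size)                   /* trained to predict taken */
        temp &= array2[array1[x] * 4096];  /* speculative out-of-bounds
                                              read of array1[x], encoded
                                              into the cache via array2 */
}
```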

See also:

Update (2018-01-05): See also:

Update (2018-01-08): mikeymikey:

Apple JUST updated today (Jan 5th) and REMOVED mention of 10.12 and 10.11 being fixed for CVE-2017-5754 aka #Meltdown

Only 10.13.2 contains the fix.

Juli Clover:

Apple today confirmed that it has addressed the recent “Meltdown” vulnerability in previously released iOS 11.2, macOS 10.13.2, and tvOS 11.2 updates, with additional fixes coming to Safari in the near future to defend against the “Spectre” vulnerability.

Zac Hall (Hacker News):

Apple has released an update to macOS High Sierra for all Macs running macOS 10.13.2. The supplemental security update likely addresses the Spectre flaw that affected Safari and may contain further mitigations for Meltdown.

Jon Masters:

At Red Hat, we’ve been working on mitigations for potential attacks under standard industry security embargos, deploying small, targeted teams operating on a “need to know” basis in order to prepare ahead of public disclosure. I was fortunate enough to be co-leading our efforts at mitigation of Meltdown and Spectre, alternatively known as variants 1, 2, and 3 of a family of similar attacks disclosed by Google Project Zero in a blog post on January 3rd. In the course of our efforts, we reproduced Meltdown (variant 3) in our labs, and examined other variants, while working alongside many of our trusted hardware partners on mitigations.

While we have a solid understanding of these vulnerabilities and the current analysis of the contributing factors as well as patches to mitigate their potential impact, we will continue to collaborate with our partners, customers and researchers on this situation. Additionally, we would like to help others to understand these complex issues, ideally using language and terms that don’t require the reader to be in the chip design business.

See also:

Update (2018-01-09): Andy Greenberg:

Yet when Intel responded to the trio’s warning—after a long week of silence—the company gave them a surprising response. Though Intel was indeed working on a fix, the Graz team wasn’t the first to tell the chip giant about the vulnerability. In fact, two other research teams had beaten them to it. Counting another, related technique that would come to be known as Spectre, Intel told the researchers they were actually the fourth to report the new class of attack, all within a period of just months.

“As far as I can tell it’s a crazy coincidence,” says Paul Kocher, a well-known security researcher and one of the two people who independently reported the distinct but related Spectre attack to chipmakers. “The two threads have no commonality,” he adds. “There’s no reason someone couldn’t have found this years ago instead of today.”

Gil Tene (via Hacker News):

PCID is now a critical feature for both security and performance.

Ezequiel Bruni (via Matt Birchler):

Even if we did magically get perfect fixes for the Meltdown and Spectre problems, this is going to spark a larger conversation about security and JavaScript in particular. I mean, what other bits of hardware could be compromised by a simple web page? This could happen again. No, to hell with that. This will happen again.

Filip Pizlo:

Spectre impacts WebKit directly. Meltdown impacts WebKit because WebKit’s security properties must first be bypassed (via Spectre) before WebKit can be used to mount a Meltdown attack.

[…]

This document explains how Spectre and Meltdown affect existing WebKit security mechanisms and what short-term and long-term fixes WebKit is deploying to provide protection against this new class of attacks.

This is a great write-up.

See also: CommitStrip (via Andy Bargh).

Update (2018-01-11): See also:

Finding a CPU Design Bug in the Xbox 360

Bruce Dawson (via Mike Ash, Hacker News):

But, the CPU was for a video game console and performance trumped all so a new instruction was added – xdcbt. The normal PowerPC dcbt instruction was a typical prefetch instruction. The xdcbt instruction was an extended prefetch instruction that fetched straight from memory to the L1 d-cache, skipping L2. This meant that memory coherency was no longer guaranteed, but hey, we’re video game programmers, we know what we’re doing, it will be fine.

[…]

So, the branch predictor makes a prediction and the predicted instructions are fetched, decoded, and executed – but not retired until the prediction is known to be correct. Sound familiar? The realization I had – it was new to me at the time – was what it meant to speculatively execute a prefetch. The latencies were long, so it was important to get the prefetch transaction on the bus as soon as possible, and once a prefetch had been initiated there was no way to cancel it. So a speculatively-executed xdcbt was identical to a real xdcbt! (a speculatively-executed load instruction was just a prefetch, FWIW).

And that was the problem – the branch predictor would sometimes cause xdcbt instructions to be speculatively executed and that was just as bad as really executing them.
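
In rough C terms, the trap looked something like this. This is my paraphrase of Dawson’s example: PrefetchLine and the mode enum are invented for illustration, and the two prototypes stand in for the Xbox 360 compiler’s prefetch intrinsics.

```c
/* Stand-ins for the Xbox 360 toolchain's intrinsics (illustrative). */
void __dcbt(const void *p);   /* normal, coherent prefetch */
void __xdcbt(const void *p);  /* extended prefetch: to L1, skips L2 */

typedef enum { kCoherent, kIncoherent } PrefetchMode;

static inline void PrefetchLine(const void *p, PrefetchMode mode) {
    if (mode == kIncoherent)
        __xdcbt(p);
    else
        __dcbt(p);
}

/* Even if every call site passes kCoherent, the branch predictor can
   guess the other arm and issue the xdcbt speculatively, and once it
   is on the bus it cannot be cancelled. The "safe" call is not safe. */
```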

[…]

I knew that would be the result and yet it was still amazing. All these years later, and even after reading about Meltdown, it’s still nerdy cool to see solid proof that instructions that were not executed were causing crashes.

Previously: Intel CPU Design Flaw Necessitates Kernel Page Table Isolation.

Apple Plans to Use Its Own Chips in Macs From 2020, Replacing Intel

Mark Gurman (tweet, Hacker News, MacRumors, 9to5Mac, iMore):

The initiative, code named Kalamata, is still in the early developmental stages, but comes as part of a larger strategy to make all of Apple’s devices -- including Macs, iPhones, and iPads -- work more similarly and seamlessly together, said the people, who asked not to be identified discussing private information. The project, which executives have approved, will likely result in a multi-step transition.

[…]

The shift would also allow Cupertino, California-based Apple to more quickly bring new features to all of its products and stand out from the competition. Using its own main chips would make Apple the only major PC maker to use its own processors.

[…]

While the transition to Apple chips in hardware is planned to begin as early as 2020, the changes to the software side will begin even before that.

[…]

Apple’s current chip designs made their name in thin and light mobile devices. That would indicate Apple will start the transition with laptops before moving the designs into more demanding desktop models. Apple has to walk the fine line of moving away from Intel chips without sacrificing the speed and capabilities of its Macs.

John Gruber:

But when you start thinking about the details, this transition would (will?) be very difficult. First, while Apple’s existing A-series chips are better for energy-efficient mobile device use (iPhone, iPad, just-plain MacBook), Apple’s internal team has never made anything to compete with Intel at the high-performance end (MacBook Pros, and especially iMacs and Mac Pros). I’m not saying they can’t. I’m just saying they haven’t shown us anything yet.

Jeff Johnson:

The problem with phone CPU benchmark tests is that they’re only measuring peak performance. Try running a processor-intensive task on your phone for an hour.

Good luck with that.

Steve Troughton-Smith:

Porting macOS, unchanged, stagnant, to ARM as-is would be a massive waste. What’s happening is decidedly not that: it’s very clear that a major transition is happening re the software stack, which is what I’ve argued for for a very long time. iOS and macOS are merging in some way

Steve Troughton-Smith:

It’s increasingly looking like the future of the Mac doesn’t look like the Mac as we know it, with a rumored app stack replacement/transplant & a move away from x86.

enMTW:

It’s not the Mac at all. It’s some hellish combination of the worst attributes of iOS and the worst attributes of Tim Cook’s Apple.

Steve Troughton-Smith:

People really want to believe their world isn’t about to end. I think Apple has been singularly about ARM for the past decade and it would be very wishful thinking to hope they’d change that trajectory now

James Thomson:

If you took the current macOS and put it on a desktop class ARM processor, I think it would be a pretty simple transition compared to PPC->Intel. I doubt many would notice, and most apps would just compile out the box. I don’t think it’s the end of anything.

Walt Mossberg:

If this is true, it’s another step towards the next great Apple machine: a consumer laptop running iOS. Call it the MacPad, or revive the name iBook. Use the trackpad the way 3D Touch is used on iOS devices to easily move the cursor. (And build more tricks into it.) I’ll buy it.

Dave Mark:

If Apple builds an ARM-based Mac, what are the hassles involved in porting code from Intel->ARM, beyond recompile?

Rich Siegel:

Anyone who says “you just have to click a check box” or “it’s trivial” without actually having done the transition for a shipping product is engaging in wish fulfillment or marketing.
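
A concrete example of what “beyond recompile” means: any hand-written SIMD will not even compile, because SSE intrinsics are x86-only and have to be rewritten or wrapped for NEON. A minimal sketch of the kind of shim that ends up scattered through a real codebase (assuming a toolchain with both SDKs):

```c
#if defined(__x86_64__)
#include <immintrin.h>              /* SSE: exists only on x86 */
typedef __m128 vec4f;
static inline vec4f add4(vec4f a, vec4f b) { return _mm_add_ps(a, b); }
#elif defined(__aarch64__)
#include <arm_neon.h>               /* NEON: the arm64 equivalent */
typedef float32x4_t vec4f;
static inline vec4f add4(vec4f a, vec4f b) { return vaddq_f32(a, b); }
#endif
```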

Erik Schwiebert:

What you said...

Jeff Johnson:

The last PPC Mac shipped in 2006. Rosetta was not removed from Mac OS X until 2011.

The last Intel Core Duo 32-bit Mac shipped in 2007. i386 support has still not been removed.

Jeff Johnson:

I have yet to hear a single person give a plausible transition plan.

Is there a Rosetta-like tech? IDK, but remember that the significant performance hit was semi-acceptable only because Intel chips were much faster than PPC. Don’t let anyone tell you otherwise. I had a last-gen iMac G5 and a first-gen iMac Core Duo. Core Duo crushed the G5.

Paul Haddad:

WWDC could be really awkward.

“Here’s our brand new $5k+ Mac Pro”

“Next year we’re deprecating Intel”

Nick Wingfield:

Ok, reading this report about Apple contemplating dumping Intel chips for its own put me in a nostalgic mood. Here’s a little story about how Steve Jobs operated with the press.

Previously: Apple Rumored to Combine iPhone, iPad, and Mac Apps to Create One User Experience, Microsoft Launches Windows 10 on ARM.

Update (2018-04-03): Alberto Sendra:

I interpret it differently. If you need a high performance laptop/computer with macOS, you need to buy one before 2020. No assurances that pro apps are gonna make the transition anytime soon.

Rui Carmo:

I don’t see why this should come as a surprise. The real question is what their developer experience is going to be like, and how accommodating it will turn out to be for those of us who use Macs as primary devices for cross-platform development.

Kirk McElhearn:

Those of us who write about Apple have long opined about the iOSification of macOS, and the ability to allow iPhone and iPad apps to run on the Mac will be a big deal. It might not work; or it might only work for very simple apps. But it will be a game-changer. I don’t expect Apple to fully iOSify the Mac platform, but allowing iOS apps to run on Macs in a special environment makes sense.

Update (2018-04-04): Marco Arment:

A bit concerned over the rumors of big changes to macOS.

Apple hasn’t prioritized macOS quality in years, and it seems that they can barely touch it these days without leaving a trail of sloppy bugs.

I’d love 2005-Apple to revamp macOS. I’m not sure I trust 2018-Apple to do it.

ATPTipster:

Less of a transition, more of a bug multiplier. Maintaining software quality while dramatically expanding the scope of said software is difficult to impossible, especially given the circumstances.

Andy Ihnatko:

I don’t think Apple would drop Intel completely. It’s easier for me to imagine them using custom CPUs for their consumer-grade Macs and sticking with Intel for the high-horsepower Pro desktops and notebooks. At least for starters.

[…]

ARM is such a huge move — and presents such a big opportunity for change — that I would expect it to accompany a whole new historical age for the Mac. Either Apple would do a radical (and long-overdue) modern rethink, akin to what Microsoft did with Windows 10…or they would effectively transform MacOS into an enhanced version of iOS, in function if not in name.

Update (2018-04-06): See also: Accidental Tech Podcast.

Speculation and Dread for the Next Transition

Andy Ihnatko:

Which is why I can easily picture a plan to build ARM-based Macs that’s part of a bigger plan to change the whole character of the Mac. For years, MacOS has looked decidedly frumpy and unloved, and its few significant improvements (such as TouchID) have been iOS’s hand-me-downs. Maybe that’s because Apple has been sitting on some huge and wonderful ideas that’ll boost the Mac into a higher orbit, and they’ve put off rebuilding MacOS until they had a good reason to tear it all down first.

Or…maybe Apple’s longterm goal isn’t to transition MacOS into the next decade (or, hell, even just our present one). Maybe its goal is to transition Mac users to iOS. Apple’s obsessive love for the iPad has been made clear to me by both my observations of the product line and my conversations with people inside the company (present and former). It doesn’t seem ridiculous that Apple might push the Mac much closer to the character of the iPad, with the iPad Pro picking up enough of the Mac’s character and functions that the whole consumer Mac line would become redundant.

Riccardo Mori (tweet):

This rumoured next transition — from Intel-based Macs to ARM-based Macs — is once again for the better, at least on paper. […] But things have changed in the meantime. For one, today Mac OS evidently isn’t the primary focus of the company. Those past transitions were all done to benefit the Mac; the idea was The Mac shall advance. We’re changing and improving things under the bonnet, but the Mac is still the Mac and its identity won’t change. Instead, this theoretical Intel-to-ARM transition doesn’t feel as such. It feels as if there are impending changes to the Mac operating system and platform that are clearly influenced by iOS. This makes me uneasy.

Let me tell you a couple of things straight away: One, there is nothing wrong with the Mac platform, except what Apple has been doing to it in recent years. Two, since Steve Jobs’s passing, my impression is that Apple has been progressively unable to properly handle their two major platforms, Mac OS and iOS. It’s like they can’t keep a balance of resources, development, and attention between Mac OS and iOS. Instead of envisaging a plan where the two platforms progress in parallel, and flourish by making the most of their respective strengths, what I’ve seen is a clear preference for iOS, and a clear progressive neglect of Mac OS. As a Mac user, this frustrates me.

Previously: Tim Cook Says Users Don’t Want iOS to Merge With macOS, Apple Plans to Use Its Own Chips in Macs From 2020, Replacing Intel.

C Is Not a Low-level Language

David Chisnall (Hacker News):

In the wake of the recent Meltdown and Spectre vulnerabilities, it's worth spending some time looking at root causes. Both of these vulnerabilities involved processors speculatively executing instructions past some kind of access check and allowing the attacker to observe the results via a side channel. The features that led to these vulnerabilities, along with several others, were added to let C programmers continue to believe they were programming in a low-level language, when this hasn't been the case for decades.

[…]

A modern Intel processor has up to 180 instructions in flight at a time (in stark contrast to a sequential C abstract machine, which expects each operation to complete before the next one begins). A typical heuristic for C code is that there is a branch, on average, every seven instructions. If you wish to keep such a pipeline full from a single thread, then you must guess the targets of the next 25 branches.
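
(For the arithmetic: 180 in-flight instructions at one branch per seven instructions works out to 180 / 7 ≈ 25 branch targets that must all be predicted correctly to keep the window full.)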

[…]

Consider another core part of the C abstract machine's memory model: flat memory. This hasn't been true for more than two decades. A modern processor often has three levels of cache in between registers and main memory, which attempt to hide latency.

[…]

A processor designed purely for speed, not for a compromise between speed and C support, would likely support large numbers of threads, have wide vector units, and have a much simpler memory model. Running C code on such a system would be problematic, so, given the large amount of legacy C code in the world, it would not likely be a commercial success.

Previously: Intel CPU Design Flaw Necessitates Kernel Page Table Isolation.

Intel FPU May Spill Crypto Secrets to Apps

Chris Williams:

The security shortcoming involves what’s known as lazy FPU state restore. Operating system kernels would only save and restore the floating-point unit (FPU) registers, and other context information, when programs were actually using the math unit.

This, it turned out today, through a security gaffe in Intel’s blueprints related to Spectre-Meltdown Variant 3A, allows a program to obtain scraps of the FPU context of another app. Variant 3A allows applications to read system registers that only privileged code should be allowed to peek at.

The fix is to employ a mechanism called eager FPU state restore, which modern Linux, Windows and other kernels use. These mitigations do not carry a performance hit – in fact, eager state switching can increase performance.

It says that only older Windows and Linux versions are vulnerable—no mention of macOS.

Previously: Intel CPU Design Flaw Necessitates Kernel Page Table Isolation.


Intel and the Danger of Integration

Ben Thompson:

As Bajarin notes, 7nm for TSMC (or Samsung or Global Foundries) isn’t necessarily better than Intel’s 10nm; chip-labeling isn’t what it used to be. The problem is that Intel’s 10nm process isn’t close to shipping at volume, and the competition’s 7nm processes are. Intel is behind, and its insistence on integration bears a large part of the blame.

[…]

It is perhaps simpler to say that Intel, like Microsoft, has been disrupted. The company’s integrated model resulted in incredible margins for years, and every time there was the possibility of a change in approach Intel’s executives chose to keep those margins. In fact, Intel has followed the script of the disrupted even more than Microsoft: while the decline of the PC finally led to The End of Windows, Intel has spent the last several years propping up its earnings by focusing more and more on the high-end, selling Xeon processors to cloud providers. That approach was certainly good for quarterly earnings, but it meant the company was only deepening the hole it was in with regards to basically everything else. And now, most distressingly of all, the company looks to be on the verge of losing its performance advantage even in high-end applications.

The Menu Bar:

We talk to chip expert Ashraf Eassa of The Motley Fool about how Intel’s chip delays mess with Apple’s roadmap, to what extent Intel is on fire, why Apple is likely moving away from Intel, why Switch may have serious staying power for Nintendo, how Marzipan points to Apple avoiding the Microsoft misstep, speculation about Project Star, a way Apple could get around saying ’No’ to a hybrid, Intel’s ongoing talent hemorrhage, the path for Apple migrating to ARM, a little bit on where AMD stands, and Apple’s bonkers silicon advantage.

Previously: On the Sad State of Macintosh Hardware.

A Brief History of Unreal Mode

Michal Necasek (via Joe Groff):

For the purposes of this discussion, unreal mode is a variant of the x86 real mode with non-standard segment limits and/or attributes, different from the processor state at reset. To recap, real mode on the 286 and later CPUs has much more in common with protected mode than with the real (and only) mode of the 8086. Notably, undefined opcodes raise exceptions, segment limit overruns cause general protection or stack faults, and (on the 386 and later) 32-bit registers and 32-bit addressing can be used—subject to limit checks.

[…]

As a general-purpose programming technique it is unusable, because it absolutely cannot function in V86 mode. Transitions to V86 mode always force real-mode compatible segment limits and attributes. That means unreal mode cannot be used together with EMM386 or other DOS memory managers utilizing V86 mode. Unreal mode also cannot be used in the DOS boxes of 386 Enhanced Mode Windows 3.x, in the DOS boxes of OS/2 2.x, Windows NT, or Windows 9x. That is an extremely serious drawback.

[…]

On the other hand, when unreal mode can be used, it is very useful. HIMEM.SYS uses unreal mode to speed up extended memory access, and perhaps more importantly, preserve normal interrupt latency. Firmware can and does use unreal mode for accessing memory beyond 1 MB during initialization; it avoids switching between real and protected mode, and in firmware there is no danger of segment limits being reset.

[…]

Unreal mode is almost certainly an accident of history, a side effect of the fact that the initial 386 design had no architected way of switching from protected mode back to real mode. Once the technique started being used, instead of clearly documenting how it works, Intel in its typical fashion documented only certain aspects of it, such that only programmers who already know about unreal mode find traces of it in the official documentation.

Custom ARM Processor for Amazon Web Services

Tom Krazit:

After years of waiting for someone to design an Arm server processor that could work at scale on the cloud, Amazon Web Services just went ahead and designed its own.

Vice president of infrastructure Peter DeSantis introduced the AWS Graviton Processor Monday night, adding a third chip option for cloud customers alongside instances that use processors from Intel and AMD. The company did not provide a lot of details about the processor itself, but DeSantis said that it was designed for scale-out workloads that benefit from a lot of servers chipping away at a problem.

The new instances will be known as EC2 A1, and they can run applications written for Amazon Linux, Red Hat Enterprise Linux, and Ubuntu.
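
For example, launching one with the AWS CLI looks like launching any other instance type; only the type string changes (the AMI ID below is a placeholder, and it must reference an arm64 image):

```
aws ec2 run-instances \
    --instance-type a1.large \
    --image-id ami-0123456789abcdef0 \
    --count 1
```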

Chris Williams:

Up until 2015, Amazon and AMD were working together on a 64-bit Arm server-grade processor to deploy in the internet titan’s data centers. However, the project fell apart when, according to one well-placed source today, “AMD failed at meeting all the performance milestones Amazon set out.”

In the end, Amazon went out and bought Arm licensee and system-on-chip designer Annapurna Labs, putting the acquired team to work designing Internet-of-Things gateways and its Nitro chipset, which handles networking and storage tasks for Amazon servers hosting EC2 virtual machines.

Update (2018-12-11): See also: Hacker News.

iMac and MacBook Last Updated 602 Days Ago

Key iOS Chip Architect Departs Apple

Chris Jenkins:

Gerard Williams III, lead designer of Apple’s custom iOS chips from A7 to A12X, has departed the company, according to CNET. While no indication of a change has been made on his LinkedIn profile, it does offer a glimpse into his design prowess.

[…]

He came to Apple with a splash, as the A7 was Apple’s first 64-bit CPU core. This design arrived on the market over a full year before competitors like Qualcomm and Samsung could respond and largely cemented the technical prowess of the SoC team Apple had created.

If confirmed, his departure would follow the more well-known CPU architect Jim Keller, who was part of Apple’s acquisition of PA Semi. More recently, Apple’s SoC team lost its lead Manu Gulati, whose vacated role was assumed by Williams.

Seems like we were just hearing that all the top people were going the other way.

Apple’s Q2 2019 Results

Jason Snell:

Apple’s quarterly results are in. The company posted revenue of $58 billion, down 5% from the same quarter a year ago. iPad revenue was up 22% and Services revenue was up 16%, but Mac revenue was down 5% and iPhone revenue was down 17%.

We’ve got lots of charts below, as well as a transcript of CEO Tim Cook and CFO Luca Maestri’s conference call with financial-industry analysts.

John Gruber:

At 20% of the company’s revenue, Services now accounts for more revenue than Mac and iPad combined.

I don’t see how this is good for the quality of the products or, ultimately, for customers. The continual notifications and extra screens to tap through are like the laptop stickers that Steve Jobs hated, and they’re just the smallest example of how the focus on services is shifting the company’s attention and priorities.

Tim Cook:

For our Mac business overall, we faced some processor constraints in the March quarter, leading to a 5 percent revenue decline compared to last year.

This seems like an odd comment, unless it’s just intended to lay more groundwork in Apple’s case for ARM. Are Mac buyers really that worried about processor speeds rather than, say, keyboards? And if slower processors are the problem, isn’t that mostly self-inflicted?

See also: Dave Girouard.

Previously: Apple’s Q4 2018 Results.

Update (2019-05-02): John Gruber (tweet):

I asked an Apple source last fall why it took so long for Apple to release the new MacBook Air. Their one-word answer: “Intel.”

Jeff Baxendale:

I know there’s not going to be a switch given impending ARM Macs, but would have been nice to just have Ryzen Macs instead of complaining about Intel.

They’re a way better deal, nobody buys for “Intel Inside”, and then maybe the integrated GPUs wouldn’t be total garbage 🤷‍♂️

Microarchitectural Data Sampling (MDS) Mitigation

Ross Mcilroy et al. (via Hacker News):

This paper explores speculative side-channel attacks and their implications for programming languages. These attacks leak information through micro-architectural side-channels which we show are not mere bugs, but in fact lie at the foundation of optimization. […] As a result of our work, we now believe that speculative vulnerabilities on today’s hardware defeat all language-enforced confidentiality with no known comprehensive software mitigations, as we have discovered that untrusted code can construct a universal read gadget to read all memory in the same address space through side-channels. In the face of this reality, we have shifted the security model of the Chrome web browser and V8 to process isolation.

Liam Tung (via Reddit):

Major slowdowns caused by the new Linux 4.20 kernel have been traced to a mitigation for Spectre variant 2 that Linux founder Linus Torvalds now wants restricted.

Pierre Lebeaupin:

It’s hard to believe it has now been more than one year since the disclosure of Meltdown and Spectre. There was so much frenzy in the first days and weeks that it has perhaps obscured the fact any solutions we currently have are temporary, barely secure, spackle-everywhere stopgap mitigations, and now that the dust has settled on that, I thought I’d look at what researchers and other contributors have come up with in the last year to provide secure processors – without of course requiring all of us to rewrite all our software from scratch.

Apple (via Benjamin Mayo):

Intel has disclosed vulnerabilities called Microarchitectural Data Sampling (MDS) that apply to desktop and notebook computers with Intel CPUs, including all modern Mac computers.

Although there are no known exploits affecting customers at the time of this writing, customers who believe their computer is at heightened risk of attack can use the Terminal app to enable an additional CPU instruction and disable hyper-threading processing technology, which provides full protection from these security issues.

[…]

Testing conducted by Apple in May 2019 showed as much as a 40 percent reduction in performance with tests that include multithreaded workloads and public benchmarks.
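
For the record, the full mitigation is enabled from Terminal in Recovery mode. As I recall, Apple’s support article gives these two commands; I’m quoting from memory, so verify against the article before relying on them:

```
nvram boot-args="cwae=2"   # enable the additional CPU instruction
nvram SMTDisable=%01       # disable hyper-threading
```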

John Gruber:

It’s good that there are no known exploits using these techniques, but even if there were, the overwhelming majority of Mac users — almost everyone — would not need to enable this mitigation. These MDS vulnerabilities enable malware on your computer to do bad things. But these vulnerabilities are not ways for malware to get onto your computer.

However, it sounds like the fix is finally a way to work around the hyper-threading bug that can lead to data corruption on my iMac, amongst other Macs.

Previously:


Decoding Intel Chip Names

Sean Hollister (via Nilay Patel):

Particularly because not all these chips are equal: a Core i7-1060G7, Core i7-1065G7 and Core i7-1068G7 might sound roughly the same, but they’re really not.

[…]

The first two digits are always “10,” and they simply mean you’re looking at a 10th Gen Ice Lake processor with all the benefits that confers, like faster graphics and better battery life when playing HEVC video, but also often a lower base clockspeed than before. If you see a “9” or an “8”, you’re looking at an older Intel processor.

The third digit seems to be how high a chip sits on the totem pole in terms of speed. For instance, a Core i7-1065G7 is clocked 100MHz higher than a Core i5-1035G7, and can boost 200MHz faster for short periods of time.

But the fourth digit is weirdly more important than the third digit, because it tells you the entire class of processor you’re looking at[…]
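
The scheme is regular enough to decode mechanically. A toy C decoder using Hollister’s reading of the digits (the sample SKU is from the article; real code would validate its input):

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *name = "i7-1065G7";
    const char *d = strchr(name, '-') + 1;  /* -> "1065G7" */
    printf("generation: %.2s\n", d);        /* "10" = Ice Lake */
    printf("speed tier: %c\n", d[2]);       /* third digit */
    printf("class:      %c\n", d[3]);       /* fourth digit */
    printf("graphics:   %s\n", d + 4);      /* "G7" */
    return 0;
}
```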

The Team Behind the 6502

Team 6502 (via Hacker News):

When it was introduced in 1975 by MOS Technology of Valley Forge, Pennsylvania, the 8-bit microprocessor sold for a fraction of the cost of other microprocessors, causing rapid price decreases across the entire computing industry. Featured in such seminal products as the Apple I and II, the Commodore PET, and the BBC Micro, as well as Atari and Nintendo game consoles, the 6502 microprocessor has been the brains inside toys, office machines, and medical devices too numerous to mention. As one of the most widely used microprocessor architectures of all time, the CMOS related form of the 65XX developed by the Western Design Center is still in production today, with an estimated six billion units so far produced.

While the story of Chuck Peddle, the visionary who conceived of the 6502, and that of design team member and founder of the Western Design Center, Bill Mensch, are widely recognized and recorded, the stories of the other MOS Technology engineers and employees who also worked on the 6502 and their contributions are not. This website seeks to change that.

Harry Bawcom:

In the 1970s much of what today is done by computer had to be done by hand. In the case of the 6502, once the design of the chip was completed by the team that worked on the chip’s architecture, that schematic was given to the design layout team. It was our job to create a topological layout from that schematic, a layout of all the transistors which made up the 11 or so glass reticles, also called "masks" that were then used to create the chip.

[…]

Design rule checking (the physical spacing between metal stripes: too close together and they will always short out in manufacture, making the chip non functional) and verifying schematic to layout was all manual. It was done by coloring a plot with colored pencils again and again as coloring a plot guided your eye to notice incorrect spacings or design rule violations.

Sydney Anne Holt:

We were given the logic drawings from Bill Mensch and Ray Hirt and etc, and turned them into the drawing you see in the picture from the Electrical Engineering Times article from 1975.

To do this, we drew them in pieces on big sheets of mylar that fit together like a puzzle. In order to do a careful logic to layout check we taped all the pieces together on the floor and crawled around on it to trace out the lines. The drawings were then digitized into layers so masks could be made from them.

I remember that once, one of the guys took off his shoes and was on the mylar checking when it was discovered his socks were damp and his toes were erasing the drawing as he moved along. Fortunately, it was caught very soon so the rework was minimal.

Apple Suing Former A-series Chip Lead

Shaun Nichols (Hacker News):

In a complaint filed in the Santa Clara Superior Court, in California, USA, and seen by The Register, the Cupertino goliath claimed Gerard Williams, CEO of semiconductor upstart Nuvia, broke his Apple employment agreement while setting up his new enterprise.

Williams – who oversaw the design of Apple’s custom high-performance mobile Arm-compatible processors for nearly a decade – quit the iGiant in February to head up the newly founded Nuvia.

Apple’s lawsuit alleged Williams hid the fact he was preparing to leave Apple to start his own business while still working at Apple, and drew on his work in steering iPhone processor design to create his new company. Crucially, Tim Cook & Co’s lawyers claimed he tried to lure away staff from his former employer. All of this was, allegedly, in breach of his contract.

Ben Lovejoy:

Williams is fighting the lawsuit, arguing that the alleged ‘breach of contract’ claim is unenforceable and that Apple illegally monitored his text messages.

Presumably it wouldn’t be illegal if the recipients of his messages gave them to Apple. So it sounds like he’s alleging that Apple directly accessed them somehow.

Previously:

Chuck Peddle, RIP

Mike Mika:

On Dec 15th, we lost Chuck Peddle, the lead designer of the MOS 650x series microprocessor and the Commodore PET. His processor was the heart of the Atari 2600/5200/400/600/800, Apple II, NES, VIC-20, C-64, Kim-1, Master System, Lynx, BBC Micro, arcade games and so much more. RIP

Bill Mensch (Hacker News):

In the Spring of 1974, Chuck asked me to head a semiconductor engineering team to design a microprocessor family of chips that the world knows as the 6502 family of chips. We left Motorola as a team on August 19, 1974 to begin work at MOS Technology.

[…]

The TFC chip was designed using my 65C02 microprocessor with high-speed DMA features for USB FLASH Modules Chuck planned to manufacture and sell. The TFC used Chuck’s patented “page-mode” concepts for replacing bad pages with “good” pages within tested “bad” segments. Chuck wrote the Assembly language code for the TFC. Chuck had negotiated a relationship with FLASH memory suppliers to support his “page-mode” business.

[…]

Chuck’s latest work was on Solid State Disc (SSD) drives, which used some of the TFC concepts for high speed DMA transfers.

Previously:

Bloomberg: ARM Macs in 2021

Mark Gurman et al. (tweet, Hacker News, Slashdot, AppleInsider, MacRumors):

The Cupertino, California-based technology giant is working on three of its own Mac processors, known as systems-on-a-chip, based on the A14 processor in the next iPhone. The first of these will be much faster than the processors in the iPhone and iPad, the people said.

Apple is preparing to release at least one Mac with its own chip next year, according to the people. But the initiative to develop multiple chips, codenamed Kalamata, suggests the company will transition more of its Mac lineup away from current supplier Intel Corp.

[…]

The first Mac processors will have eight high-performance cores, codenamed Firestorm, and at least four energy-efficient cores, known internally as Icestorm. Apple is exploring Mac processors with more than 12 cores for further in the future, the people said.

This is another good reason to postpone or scale back macOS 10.16. We want the software to be in as good a shape as possible before a big hardware transition. Apple will have even less time to fix bugs, and developers less time to work around them.

I expect the ARM transition to be accompanied by removal of lots of APIs, so developers will have to contend with that, as well as porting and testing their own code, and dealing with any dependencies that have broken.

Gus Mueller:

I don’t think unifying the chip architecture would make the app ecosystem any more unified. Apple could do this today if they really wanted to, fat binaries (where different cpu architectures are combined in the same application) have been around forever. Major frameworks are already on both architectures, which is the biggest hurdle. I think the problem is more philosophical, or maybe Apple just lacks the will or vision to actually get it done today, if ever.
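
For reference, this is roughly all the fat-binary machinery amounts to at the command line, sketched with clang and lipo (the file names are mine; building both slices assumes an SDK that supports both architectures):

```
clang -arch x86_64 -o tool-x86_64 tool.c
clang -arch arm64  -o tool-arm64  tool.c
lipo -create tool-x86_64 tool-arm64 -output tool
lipo -info tool   # reports both x86_64 and arm64 in the one file
```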

Previously:

Update (2020-04-24): John Gruber:

The $64,000 question is whether they’re going to have an emulator for running x86 code on ARM Macs.

Skylake QA Drove Apple Away

Dave James (via Slashdot, MacRumors):

The “bad quality assurance of Skylake” was responsible for Apple finally making the decision to ditch Intel and focus on its own ARM-based processors for high-performance machines. That’s the claim made by outspoken former Intel principal engineer, François Piednoël.

[…]

“The quality assurance of Skylake was more than a problem,” says Piednoël during a casual Xplane chat and stream session. “It was abnormally bad. We were getting way too much citing for little things inside Skylake. Basically our buddies at Apple became the number one filer of problems in the architecture. And that went really, really bad.

“When your customer starts finding almost as much bugs as you found yourself, you’re not leading into the right place.”

Previously:

A Primer on Memory Consistency and Cache Coherence

Vijay Nagarajan et al. (via Pierre Habouzit and David Goldblatt):

Many modern computer systems, including homogeneous and heterogeneous architectures, support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. For a shared memory machine, the memory consistency model defines the architecturally visible behavior of its memory system. Consistency definitions provide rules about loads and stores (or memory reads and writes) and how they act upon memory. As part of supporting a memory consistency model, many machines also provide cache coherence protocols that ensure that multiple cached copies of data are kept up-to-date. The goal of this primer is to provide readers with a basic understanding of consistency and coherence. This understanding includes both the issues that must be solved as well as a variety of solutions. We present both high-level concepts as well as specific, concrete examples from real-world systems.

This second edition reflects a decade of advancements since the first edition and includes, among other more modest changes, two new chapters: one on consistency and coherence for non-CPU accelerators (with a focus on GPUs) and one that points to formal work and tools on consistency and coherence.
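
To make “architecturally visible behavior” concrete, here is the classic store-buffering litmus test as a C11 sketch. Under sequential consistency, ending with r1 == 0 and r2 == 0 is impossible; x86-TSO and weaker models allow it, and relaxed atomics let the compiler reorder too.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

atomic_int x, y;
int r1, r2;

void *t0(void *arg) {
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

void *t1(void *arg) {
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

int main(void) {
    for (int i = 0; i < 100000; i++) {
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_t a, b;
        pthread_create(&a, NULL, t0, NULL);
        pthread_create(&b, NULL, t1, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)  /* forbidden under SC, yet it happens */
            printf("reordering observed on iteration %d\n", i);
    }
    return 0;
}
```

Using memory_order_seq_cst for the four accesses (or a fence between each thread’s store and load) rules the outcome out on every architecture.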


One More Thing: Apple Silicon Macs

Apple (MacRumors, Hacker News):

Apple today announced M1, the most powerful chip it has ever created and the first chip designed specifically for the Mac. M1 is optimized for Mac systems in which small size and power efficiency are critically important. As a system on a chip (SoC), M1 combines numerous powerful technologies into a single chip, and features a unified memory architecture for dramatically improved performance and efficiency. M1 is the first personal computer chip built using cutting-edge 5-nanometer process technology and is packed with an astounding 16 billion transistors, the most Apple has ever put into a chip. It features the world’s fastest CPU core in low-power silicon, the world’s best CPU performance per watt, the world’s fastest integrated graphics in a personal computer, and breakthrough machine learning performance with the Apple Neural Engine. As a result, M1 delivers up to 3.5x faster CPU performance, up to 6x faster GPU performance, and up to 15x faster machine learning, all while enabling battery life up to 2x longer than previous-generation Macs. With its profound increase in performance and efficiency, M1 delivers the biggest leap ever for the Mac.

Apple (MacRumors: Air, Mini, Pro, Hacker News, Slashdot):

Apple today introduced a new MacBook Air, 13-inch MacBook Pro, and Mac mini powered by the revolutionary M1, the first in a family of chips designed by Apple specifically for the Mac.

  • This is the second line of Apple M chips.
  • No touchscreens, at least not yet. Nor cellular, Face ID, TestFlight, or AirTags.
  • The Intel MacBook Air is no longer for sale, but you can still get an Intel Mac mini or 13-inch MacBook Pro.
  • What are the performance differences between the three new Macs? Do they have the same clock rate? Is the difference just more sustained performance on the models that have fans?
  • Or, for that matter, how do they compare with Apple’s Intel notebooks and iMacs? Or iPads and iPhones?
  • There does seem to be a GPU difference between the Air and Pro.
  • None of the M1 Macs has more than 2 Thunderbolt ports, whereas the Intel Mac mini and MacBook Pro had 4.
  • The maximum RAM for them all is 16 GB, down from 32 GB on the Intel MacBook Pro.
  • Is the RAM really on the chip? Is that why it’s so expensive ($200 for 8 GB)?
  • Do we actually want more RAM than before, to hold the translated system frameworks?
  • It sounds like the camera hardware is the same, and the improvements are only in software.
  • Does the Touch Bar now run on the main CPU? Does it no longer have a separate OS?
  • There was no announcement about returning or trading in DTKs.

David Smith:

fun fact: retaining and releasing an NSObject takes ~30 nanoseconds on current gen Intel, and ~6.5 nanoseconds on an M1…and ~14 nanoseconds on an M1 emulating an Intel 😇
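
For a rough sense of how such numbers are gathered, here's my sketch, not Smith's methodology (note that CFRetain/CFRelease take a different code path than objc_retain/objc_release, so the absolute numbers will differ):

    // Time a retain/release pair on a CoreFoundation object.
    // Build: cc -O2 rr.c -framework CoreFoundation
    #include <CoreFoundation/CoreFoundation.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        CFMutableArrayRef obj =
            CFArrayCreateMutable(NULL, 0, &kCFTypeArrayCallBacks);
        const int iters = 10000000;

        uint64_t start = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
        for (int i = 0; i < iters; i++) {
            CFRetain(obj);   // atomic increment of the retain count
            CFRelease(obj);  // atomic decrement
        }
        uint64_t end = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);

        printf("%.2f ns per retain/release pair\n",
               (double)(end - start) / iters);
        CFRelease(obj);
        return 0;
    }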

Rich Siegel:

After the whole “iOS 14 is shipping tomorrow” thing, macOS developers get a whole extra day!

Previously:

The Apple Silicon M1

Andrei Frumusanu:

The new processor is called the Apple M1, the company’s first SoC designed with Macs in mind. With four large performance cores, four efficiency cores, and an 8-core GPU, it features 16 billion transistors on a 5nm process node. Apple is starting a new SoC naming scheme for this new family of processors, but at least on paper it looks a lot like an A14X.

[…]

What really defines Apple’s Firestorm CPU core from other designs in the industry is just the sheer width of the microarchitecture. Featuring an 8-wide decode block, Apple’s Firestorm is by far the current widest commercialized design in the industry.

[…]

A ±630 deep ROB is an immensely huge out-of-order window for Apple’s new core, as it vastly outclasses any other design in the industry.

[…]

Exactly how and why Apple is able to achieve such a grossly disproportionate design compared to all other designers in the industry isn’t exactly clear, but it appears to be a key characteristic of Apple’s design philosophy and method to achieve high ILP (Instruction level-parallelism).

[…]

Apple’s usage of a significantly more advanced microarchitecture that offers significant IPC, enabling high performance at low core clocks, allows for significant power efficiency gains versus the incumbent x86 players.

Robert Graham:

In short, Apple’s advantage is their own core design outpacing Intel’s on every measure, and TSMC being 1.5 generations ahead of Intel on manufacturing process technology. These things matter, not “ARM” or “RISC” instruction set.

Howard Oakley:

GPUs are now being used for a lot more than just driving the display, and their computing potential for specific types of numeric and other processing is in demand. So long as CPUs and GPUs continue to use their own local memory, simply moving data between their memory has become an unwanted overhead. If you’d like to read a more technical account of some of the issues which have brought unified memory to Nvidia GPUs, you’ll enjoy Michael Wolfe’s article on the subject.

Apple:

Learn how developers updated their apps for Apple silicon Macs and began taking advantage of the advanced capabilities of the Apple M1 chip.

Apple:

Discover the advances in Metal performance and capability delivered with the Apple M1 chip on Apple silicon Macs. Apple M1 unites the top-end graphics and compute abilities of discrete GPUs with the features and power efficiency of Apple silicon, creating entirely new opportunities for developers of Metal-based apps and games on macOS. We’ll explore the Metal graphics and compute fundamentals of Apple M1, then take you through four important Metal features to make your Mac apps really shine on Apple silicon: tile shading, memoryless render targets, programmable blending, and sparse texturing.

Previously:

Dissecting the Apple M1 GPU

Alyssa Rosenzweig (via Hacker News):

Apple’s latest line of Macs includes their in-house “M1” system-on-chip, featuring a custom GPU. This poses a problem for those of us in the Asahi Linux project who wish to run Linux on our devices, as this custom Apple GPU has neither public documentation nor open source drivers.

[…]

The process for decoding the instruction set and command stream of the GPU parallels the same process I used for reverse-engineering Mali GPUs in the Panfrost project, originally pioneered by the Lima, Freedreno, and Nouveau free software driver projects. Typically, for Linux or Android driver reverse-engineering, a small wrapper library will be written to inject into a test application via LD_PRELOAD that hooks key system calls like ioctl and mmap in order to analyze user-kernel interactions. Once the “submit command buffer” call is issued, the library can dump all (mapped) shared memory for offline analysis.
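
The wrapper-library technique is only a few lines of C. Here is a minimal sketch of the Linux side (the real tooling also hooks mmap and dumps the mapped buffers):

    // Log every ioctl() before forwarding it to the real libc implementation.
    // Build: cc -shared -fPIC -o wrap.so wrap.c -ldl
    // Run:   LD_PRELOAD=./wrap.so ./gpu-test-app
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdarg.h>
    #include <stdio.h>

    int ioctl(int fd, unsigned long request, ...) {
        static int (*real_ioctl)(int, unsigned long, ...);
        if (!real_ioctl)  // resolve the libc symbol we are shadowing
            real_ioctl = (int (*)(int, unsigned long, ...))
                dlsym(RTLD_NEXT, "ioctl");

        va_list ap;
        va_start(ap, request);
        void *arg = va_arg(ap, void *);  // driver ioctls take one pointer arg
        va_end(ap);

        fprintf(stderr, "ioctl(%d, 0x%lx, %p)\n", fd, request, arg);
        return real_ioctl(fd, request, arg);
    }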

Previously:

Intel Problems

Ben Thompson:

In fact, the x86 business proved far too profitable to take such a radical step, which is the exact sort of “problem” that leads to disruption: yes, Intel avoided Microsoft’s fate, but that also means that the company never felt the financial pain necessary to make such a dramatic transformation of its business at a time when it might have made a difference (and, to be fair, Andy Grove needed the memory crash of 1984 to get the company to fully focus on processors in the first place).

[…]

This is why Intel needs to be split in two. Yes, integrating design and manufacturing was the foundation of Intel’s moat for decades, but that integration has become a strait-jacket for both sides of the business. Intel’s designs are held back by the company’s struggles in manufacturing, while its manufacturing has an incentive problem.

Ian Cutress (Hacker News):

We’re following the state of play with Intel’s new CEO, Pat Gelsinger, very closely. Even as an Intel employee for 30 years, rising to the rank of CTO, then taking 12 years away from the company, his arrival has been met with praise across the spectrum given his background and previous successes. He isn’t even set to take his new role until February 15th, however his return is already causing a stir with Intel’s current R&D teams.

News in the last 24 hours, based on public statements, states that former Intel Senior Fellow Glenn Hinton, who lists being the lead architect of Intel’s Nehalem CPU core in his list of achievements, is coming out of retirement to re-join the company. (The other lead architects of Nehalem are Ronak Singhal and Per Hammerlund - Ronak is still at Intel, working on next-gen processors, while Per has been at Apple for five years.)

See also: Nvidia’s Integration Dreams.

Previously:

Update (2021-01-22): John Gruber:

Gelsinger, speaking in early 2021, knows that Intel fell behind years ago — in an industry where it’s notoriously hard to catch up. He’s taking over a ship that already hit an iceberg and is in need of saving. Sometimes you talk trash about your opponent because you’re an idiot. But other times, you talk a little trash to fire up your own team.

Intel’s M1 Benchmarks

Joe Rossignol:

Nearly three months after the launch of Apple’s rave-reviewed M1 Macs, Intel has fired back, but there are some asterisks involved.

In a slideshow shared by PCWorld this week, Intel highlighted what PCWorld described as “carefully crafted” benchmarks in an attempt to prove that laptops with the latest 11th Generation Core processors are superior to those with Apple’s custom-designed M1 chip.

Andrew E. Freedman (Hacker News, Slashdot):

Intel claims the 11th-Gen system, an internal whitebox with an Intel Core i7-1185G7 and 16GB of RAM, is 30% faster overall in Chrome and faster in every Office task. This largely goes against what we saw in our 13-inch MacBook Pro with M1 review, where benchmarks showed M1 to be largely on the same level, if not better.

[…]

Intel also claims that the i7-1185G7 is six times faster than M1 on AI-tools from Topaz Labs and Adobe Premiere, Photoshop and Lightroom functions.

[…]

In battery life, Intel switched to an Intel Core i7-1165G7 notebook, the Acer Swift 5, rather than sticking with the Core i7-1185G7 in the whitebook it used for performance testing. It also tested a MacBook Air. They ran Netflix streams and tabs and found the MacBook Air came out ahead with a six-minute difference.

Jason Snell:

Inconsistent test platforms, shifting arguments, omitted data, and the not-so-faint whiff of desperation.

Previously:

Apple M1 Microarchitecture Research

Dougall Johnson (via Hacker News):

This is an early attempt at microarchitecture documentation for the CPU in the Apple M1, inspired by and building on the amazing work of Andreas Abel, Andrei Frumusanu, @Veedrac, Travis Downs, Henry Wong and Agner Fog. This documentation is my best effort, but it is based on black-box reverse engineering, and there are definitely mistakes.

[…]

These numbers mostly come from the M1 buffer size measuring tool. The M1 seems to use something along the lines of a validation buffer, rather than a conventional reorder buffer, which complicates measurements a bit. So these may or may not be accurate.

Previously:

M1racles: M1ssing Register Access Controls Leak EL0 State

Hector Martin (tweet, Hacker News, Bruce Schneier):

A flaw in the design of the Apple Silicon “M1” chip allows any two applications running under an OS to covertly exchange data between them, without using memory, sockets, files, or any other normal operating system features. This works between processes running as different users and under different privilege levels, creating a covert channel for surreptitious data exchange.

The vulnerability is baked into Apple Silicon chips, and cannot be fixed without a new silicon revision.

[…]

The ARM system register encoded as s3_5_c15_c10_1 is accessible from EL0, and contains two implemented bits that can be read or written (bits 0 and 1). This is a per-cluster register that can be simultaneously accessed by all cores in a cluster.

[…]

Really, nobody’s going to actually find a nefarious use for this flaw in practical circumstances. Besides, there are already a million side channels you can use for cooperative cross-process communication (e.g. cache stuff), on every system. Covert channels can’t leak data from uncooperative apps or systems.
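
The proof of concept is almost embarrassingly small. A sketch using the register name from the write-up (this only works on an affected M1 system; anywhere else the instruction will fault):

    // Two cooperating processes scheduled on the same cluster can signal
    // each other through bits 0 and 1 of this EL0-accessible register.
    #include <stdint.h>
    #include <stdio.h>

    static inline uint64_t channel_read(void) {
        uint64_t v;
        __asm__ volatile("mrs %0, s3_5_c15_c10_1" : "=r"(v));
        return v;
    }

    static inline void channel_write(uint64_t v) {
        __asm__ volatile("msr s3_5_c15_c10_1, %0" : : "r"(v));
    }

    int main(void) {
        channel_write(0x2);  // set bit 1; a reader elsewhere can poll for it
        printf("channel bits: %llu\n",
               (unsigned long long)(channel_read() & 0x3));
        return 0;
    }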


Global Chip Shortage

Nilay Patel (Decoder):

Since the beginning of the pandemic, the demand for microchips has far exceeded supply, causing problems in every industry that relies on computers.

[…]

My guest today is Dr. Willy Shih. He’s a professor of management practice at Harvard Business School. He’s an expert on chips and semiconductors — he spent years working at companies like IBM and Silicon Graphics. And he’s also an expert in supply chains — how things go from raw materials to finished products in stores. Willy’s the guy that grocery stores and paper companies called in March 2020 when there was a run on toilet paper. If anyone’s going to explain this thing, it’s going to be Willy.

Ian King et al. (via Hacker News):

Building an entry-level factory that produces 50,000 wafers per month costs about $15 billion. Most of this is spent on specialized equipment—a market that exceeded $60 billion in sales for the first time in 2020.

Three companies—Intel, Samsung and TSMC—account for most of this investment. Their factories are more advanced and cost over $20 billion each. This year, TSMC will spend as much as $28 billion on new plants and equipment. Compare that to the U.S. government’s attempt to pass a bill supporting domestic chip production. This legislation would offer just $50 billion over five years.

Once you spend all that money building giant facilities, they become obsolete in five years or less. To avoid losing money, chipmakers must generate $3 billion in profit from each plant. But now only the biggest companies, in particular the top three that combined generated $188 billion in revenue last year, can afford to build multiple plants.

Yang Jie et al. (via John Gruber):

Taiwan Semiconductor Manufacturing Co. plans to increase the prices of its most advanced chips by roughly 10%, while less advanced chips used by customers like auto makers will cost about 20% more, these people said. The higher prices will generally take effect late this year or next year, the people said.

Horace Dediu:

iPhone 13 pricing is same as 12. So much for new pricing due to semiconductor shortages.

Performance of the A15

Jason Snell:

Here’s a funny thing about Tuesday’s announcement of the A15 Bionic: Apple didn’t compare its performance to the A14. In the past, Apple has compared the power of its iPhones to previous models. But this year, Apple has chosen to proclaim that the A15 in the iPhone 13 Pro has 50 percent better graphics and CPU performance “than the competition.”

Given that Apple has generally been ahead of its smartphone competition in terms of processor power, this suggests that the A15 shows less improvement over the A14 than it does over the Qualcomm processors in leading Android phones. And it makes me wonder if Apple is perhaps trying to soft-pedal a new chip that isn’t much faster than the older model.

Dylan Patel (tweet, via Meek Geek):

The CPU is claimed to be 50% faster than the competition while GPU is claimed to be 30% or 50% faster depending on whether it is 4 cores or 5 cores. They are sticking with a 16 core NPU which is now at 15.8 TOPs vs 11 TOPs for the A14. There is a new video encoder and decoder, we hope it incorporates AV1 support. The new ISP enables better photo and video algorithms. The Pro models have variable refresh rate, so that likely necessitated a new display engine. Lastly, the system cache has doubled to 32MB. This was likely done to feed the GPU and save on power. SemiAnalysis also believes Apple moved to LPDDR5 from LPDDR4X.

[…]

The most important thing to note is that the CPU gains from the A12 to the A14 are identical to those from the A12 to the A15. The GPU gains are quite impressive with a calculated 38.5% improvement. This is larger than the A13 and A14 improvements combined.

[…]

SemiAnalysis believes that the next generation core was delayed out of 2021 into 2022 due to CPU engineer resource problems. In 2019, Nuvia was founded and later acquired by Qualcomm for $1.4B. Apple’s Chief CPU Architect, Gerard Williams, as well as over 100 other Apple engineers, left to join this firm. More recently, SemiAnalysis broke the news about Rivos Inc, a new high performance RISC-V startup which includes many senior Apple engineers. The brain drain continues, and its impact will become more apparent as time moves on. As Apple once drained resources out of Intel and others throughout the industry, the reverse seems to be happening now.

Eric Slivka:

These scores represent a roughly 10% increase in single-core performance and 18% increase in multi-core performance compared to the A14 Bionic in the iPhone 12 lineup.

Jason Snell:

If accurate, this would place the A14 to A15 performance boost in line with recent updates. What makes this a question at all is that Apple hasn’t directly compared the two chips, instead opting to compare the iPhone to “the competition.”

Previously:

Apple M1 Pro and M1 Max

Apple (video, Hacker News, MacRumors):

The CPU in M1 Pro and M1 Max delivers up to 70 percent faster CPU performance than M1, so tasks like compiling projects in Xcode are faster than ever. The GPU in M1 Pro is up to 2x faster than M1, while M1 Max is up to an astonishing 4x faster than M1, allowing pro users to fly through the most demanding graphics workflows.

[…]

M1 Pro offers up to 200GB/s of memory bandwidth with support for up to 32GB of unified memory. M1 Max delivers up to 400GB/s of memory bandwidth — 2x that of M1 Pro and nearly 6x that of M1 — and support for up to 64GB of unified memory. And while the latest PC laptops top out at 16GB of graphics memory, having this huge amount of memory enables graphics-intensive workflows previously unimaginable on a notebook. The efficient architecture of M1 Pro and M1 Max means they deliver the same level of performance whether MacBook Pro is plugged in or using the battery. M1 Pro and M1 Max also feature enhanced media engines with dedicated ProRes accelerators specifically for pro video processing.

[…]

Utilizing the industry-leading 5-nanometer process technology, M1 Pro packs in 33.7 billion transistors, more than 2x the amount in M1. A new 10-core CPU, including eight high-performance cores and two high-efficiency cores[…]

Scott Perry:

The M1 Max’s DRAM is as fast as Intel’s on-die LLC circa 2016. Between this and the SSD performance (as fast as RAM was about 10 years ago), Apple is making a mockery of memory hierarchies.

Hector Martin:

As for the M1 Pro/Max, reminder that a single P-core can saturate the M1’s memory bandwidth, even significantly downclocked. And the M1 already has a lot of memory bandwidth. All that extra memory bandwidth in the new chips has to make a pretty big difference.
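
You can get a crude read on this yourself with nothing fancier than a big memcpy: not STREAM-grade methodology, but enough to see a single thread moving tens of GB/s.

    // Single-thread bandwidth probe. Build: cc -O2 bw.c
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define BUF (1UL << 30)  /* 1 GiB: far larger than any on-chip cache */

    int main(void) {
        uint8_t *src = malloc(BUF), *dst = malloc(BUF);
        memset(src, 1, BUF);  // fault the pages in before timing
        memset(dst, 1, BUF);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        memcpy(dst, src, BUF);  // reads BUF bytes, writes BUF bytes
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) +
                      (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f GB/s (copy, both directions); checksum %d\n",
               2.0 * BUF / secs / 1e9, dst[BUF - 1]);  // keep dst live
        free(src);
        free(dst);
        return 0;
    }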

See also: Ken Shirriff.

Previously:

Update (2021-10-19): Andrei Frumusanu:

Today’s reveal of the new generation Apple Silicon has been something we’ve been expecting for over a year now, and I think Apple has managed to not only meet those expectations, but also vastly surpass them. Both the M1 Pro and M1 Max look like incredibly differentiated designs, much different than anything we’ve ever seen in the laptop space. If the M1 was any indication of Apple’s success in their silicon endeavors, then the two new chips should also have no issues in laying incredible foundations for Apple’s Mac products, going far beyond what we’ve seen from any competitor.

Steven Sinofsky:

Apple’s M1 Pro/Max is the second step in a major change in computing. What might be seen as an evolution from iPhone/ARM is really part of an Apple story that began in 1991 with PowerPC.

[…]

When you look at M1 Pro/Max today it is tempting to think of this in terms of performance, but performance per watt AND integrated graphics AND integrated memory AND integrated application processors is innovation in an entirely different direction.

Update (2021-10-29): Andrei Frumusanu (Hacker News):

The M1 Pro and M1 Max change the narrative completely – these designs truly feel like SoCs that have been made with power users in mind, with Apple increasing the performance metrics in all vectors. We expected large performance jumps, but we didn’t expect some of the monstrous increases that the new chips are able to achieve.

On the CPU side, doubling up on the performance cores is an evident way to increase performance – the competition also does so with some of their designs. How Apple does it differently is that it not only scaled the CPU cores, but everything surrounding them. It’s not just 4 additional performance cores, it’s a whole new performance cluster with its own L2. On the memory side, Apple has scaled its memory subsystem to never before seen dimensions, and this allows the M1 Pro & Max to achieve performance figures that simply weren’t even considered possible in a laptop chip. The chips here aren’t only able to outclass any competitor laptop design, but also compete against the best desktop systems out there; you’d have to bring out server-class hardware to get ahead of the M1 Max – it’s just generally absurd.

Andy Somerfield (via John Gruber):

The #M1Max is the fastest GPU we have ever measured in the @affinitybyserif Photo benchmark. It outperforms the W6900X - a $6000, 300W desktop part - because it has immense compute performance, immense on-chip bandwidth and immediate transfer of data on and off the GPU (UMA).

Yining Karl Li (tweet, Hacker News):

The wider takeaway here though is that in order to give the M1 Max some real competition, one has to skip laptop chips entirely and reach for not just high end desktop chips, but for server-class workstation hardware to really beat the M1 Max. For workloads that push the CPU to maximum utilization for sustained periods of time, such as production-quality path traced rendering, the M1 Max represents a fundamental shift in what is possible in a laptop form factor.

Engin Kurutepe:

This is interesting: only about 6% improvement from the 8-core M1 Pro to the 10-core M1 Max when compiling a large Xcode project

Jean-Louis Gassée (Hacker News):

The Intel side of our village has dismissed the M1 Pro and Max as impressive but hardly threatening: “Sure, Apple has a fleeting advantage due to their access to TSMC’s denser 5 nanometer process, but once Intel gets there, x86 chips will outperform Apple Silicon, especially with their access to the vast library of Windows software.”

Some things never change. Intel fans had the same reaction, eight years ago, when Apple introduced its first 64-bit processor, the A7 that powered the iPhone 5s.

Usman Pirzada:

Almost all of us expected Intel to win on the single-threaded front because of high clock rates and some serious architectural improvements but what is surprising is that they even beat the Apple M1 Max on the multi-threaded front. The Alder Lake Core i9 12900HK mobility processor gets an astounding 13256 score which is followed by Apple at 12753 points. The Intel 11980HK (stock) is further into the horizon at 9149 points and AMD clocks in at 8217 points. This is a generation over generation increase of almost 45% in roughly the same TDP - although not surprising because even though the ADL-P CPU only has 8 “big cores” the small cores have proven to be quite powerful as well.

Now keep in mind, I have no qualms that Apple is still going to win on a power efficiency metric - they always have since the A11 - but Apple’s reign as the fastest mobility chip “period” seems like it is going to be short-lived (we expect ADL-P to land in early 2022).

Update (2021-11-16): Rene Ritchie:

Tom Boger, Vice President of iPad & Mac Product Marketing and Tim Millet, Vice President of Platform Architecture, join me to talk about what they thought when Apple first decided to switch the Mac to custom silicon, what it was like bringing their low/slow/wide approach to a thermal envelope as big as the new MacBook Pro, how scalable architecture really scales up this much (and more), how they think about transistor budget in an increasing post-big compute core world, gaming on Mac, and which MBPs we’re all rocking!

Update (2021-11-24): Timothy Liu (via Hacker News):

I still had questions, so here I am with some (casual) benchmarks that I hope add some additional perspective into interesting hardware capabilities on the M1 Max SOC, just for fun and out of my curiosity.

Key M1 Mac Engineer Departs Apple for Intel

Juli Clover (Hacker News):

Apple’s former Director of Mac System Architecture Jeff Wilcox this week announced that he has left Apple to take on a new role at Intel. As noted on LinkedIn (via Tom’s Hardware), Wilcox was part of Apple’s M1 team and he had a key role in the transition from Intel chips to Apple silicon.

Wilcox’s profile says that he “led the transition” for all Macs to Apple silicon, and prior to that, he developed the SoC and system architecture for the T2 coprocessor used in Intel Macs.

Previously:

Update (2022-01-13): Juli Clover:

Microsoft has hired Mike Filippo, a semiconductor designer who formerly worked at Apple as a chip architect, reports Bloomberg. Microsoft is aiming to further expand on chip designs for the servers that power its cloud computing services, and at Microsoft, Filippo will be working on processors for Azure servers.

M1 Icestorm Performance and Asymmetric M1 Pro Core Management

Howard Oakley:

At its heart, each M1 chip has a total of eight processor cores, all based on Apple’s development of technology licensed from Arm. Four are described as Performance cores, dubbed Firestorm, and four are Efficiency cores, or Icestorm. These primarily differ in their compromise between performance and power consumption, with Firestorm cores performing in the same class as better Intel cores, and Icestorm delivering lower performance with much less power requirement and heat production.

Howard Oakley (Hacker News):

In real-world use, what are the penalties for processes running on Icestorm rather than Firestorm cores? Here I report one initial comparison, of performance when calculating floating-point dot products, a task which you might not consider a good fit for the Icestorm.

Central to this is my previous observation that different Quality-of-Service (QoS) settings for processes determine which cores they are run on. OperationQueue processes given a QoS of 17 or higher are invariably run by macOS 11 and 12 on Firestorm cores (and can load Icestorms too), while those with a QoS of 9 are invariably run only on Icestorm cores. Those might change in the face of extreme loading of either core pool, but when there are few other active processes it appears consistent.

[…]

Maynard Handley previously commented that Icestorm cores use about 10% of the power (net 25% of energy) of Firestorm cores. For SIMD vector arithmetic, at least, they perform extremely well for their economy.
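
Oakley’s QoS numbers map directly onto Apple’s public pthread API: QOS_CLASS_BACKGROUND is 0x09 and QOS_CLASS_UTILITY is 0x11 (17). Here’s a sketch for watching the effect yourself; run it and check which cluster lights up in powermetrics:

    // Spin one thread per QoS tier. Build: cc -O2 qos.c
    #include <pthread.h>
    #include <pthread/qos.h>
    #include <stdint.h>
    #include <stdio.h>

    static void *spin(void *name) {
        for (volatile uint64_t i = 0; i < 2000000000ULL; i++) { }
        printf("%s done\n", (const char *)name);
        return NULL;
    }

    int main(void) {
        pthread_attr_t bg, ui;

        pthread_attr_init(&bg);   // QoS 9: confined to the Icestorm cores
        pthread_attr_set_qos_class_np(&bg, QOS_CLASS_BACKGROUND, 0);

        pthread_attr_init(&ui);   // QoS 25: eligible for the Firestorm cores
        pthread_attr_set_qos_class_np(&ui, QOS_CLASS_USER_INITIATED, 0);

        pthread_t a, b;
        pthread_create(&a, &bg, spin, "background");
        pthread_create(&b, &ui, spin, "user-initiated");
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }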

Howard Oakley:

Most scheduled background activities in macOS are now managed by this combination of DAS and CTS, which has proved itself to be superior to fixed time intervals managed by launchd.

[…]

As I noted above, one of the significant factors when scheduling background activities is the QoS set for that activity. The great majority of these have the minimum QoS, so that when they’re scheduled to run on an M1 Mac, they do so on the Efficiency (Icestorm) cores. That’s why Time Machine backups occur at slightly irregular intervals, and when they do, they occupy the Icestorms and leave the Firestorms free for the user.

Howard Oakley:

I’ve been unable to find any way of ensuring that normal commands and scripts run from Terminal are constrained to the Efficiency cores, although I may well be missing a trick here.

[…]

When inside macOS, running from Swift, Objective-C or an equivalent language, there are several opportunities to set the Quality of Service when running command tools.

[…]

For those who want to try out running commands on different combinations of cores, I’ve made a new version of my exploratory utility DispatchRider, which has previously allowed you to explore running commands using NSBackgroundActivityScheduler and the DAS despatching system.

Howard Oakley:

On paper, the major difference between the M1 and M1 Pro/Max CPUs is core count: the original M1 has a total of eight, half of which are E (Efficiency, Icestorm) and half P (Performance, Firestorm) cores. The M1 Pro and Max have two more cores in total, but redistribute their type to give eight P cores and only 2 E cores. It would be easy to conclude that experience with the first design showed that the E cores were only lightly loaded, so fewer were needed, but delivering better performance to the user merited twice the number of P cores. While that may well be true, you also have to look at how the cores are actually used.

[…]

Taken together, these results show that process allocation to cores in the M1 Pro and Max is carefully managed according to QoS (as in the M1) and between the two groups of P cores. This management aims to keep the second group of P cores unloaded as much as possible, and within each group of P cores loads lower-numbered cores more than higher-numbered. This is very different from the even-balancing seen in symmetric cores, and in the M1.

[…]

There are also interesting implications for developers wishing to optimise performance on multiple cores. With the advent of eight P cores in the M1 Pro/Max, it’s tempting to increase the maximum number of processes which can be used outside of an app’s main process. While this may still lead to improved performance on Intel Macs with more than four cores, the core management of these new chips may limit processes to the first block of four cores. Careful testing is required, both under low overall CPU load and when other processes are already loading that first block. Interpreting the results may be tricky.

Previously:

Running Background Apps on Efficiency Cores

St. Clair Software:

As you can see in the image on the left, App Tamer now displays graphs of P and E core usage as well as overall CPU usage. You’ll get these automatically if you’re running App Tamer on an M1-powered Mac.

[…]

When you click on an app in the process list to change its settings, there’s an additional “Run this app on the CPU’s efficiency cores” checkbox, as you can see in the screenshot below.

I explained the basics of this feature in a previous post. It works like App Tamer’s other CPU-saving capabilities in that it’s applied to an app anytime that app is not frontmost. If you turn on the checkbox for Safari, any time that you leave Safari running in the background while you’re using another app, Safari will be switched to the processor’s E cores. This saves power and leaves the P cores free to handle higher priority tasks.

Previously:

Apple M1 Ultra

Apple (Hacker News):

Featuring UltraFusion — Apple’s innovative packaging architecture that interconnects the die of two M1 Max chips to create a system on a chip (SoC) with unprecedented levels of performance and capabilities — M1 Ultra delivers breathtaking computing power to the new Mac Studio while maintaining industry-leading performance per watt. The new SoC consists of 114 billion transistors, the most ever in a personal computer chip. M1 Ultra can be configured with up to 128GB of high-bandwidth, low-latency unified memory that can be accessed by the 20-core CPU, 64-core GPU, and 32-core Neural Engine, providing astonishing performance for developers compiling code, artists working in huge 3D environments that were previously impossible to render, and video professionals who can transcode video to ProRes up to 5.6x faster than with a 28-core Mac Pro with Afterburner.

[…]

For the most graphics-intensive needs, like 3D rendering and complex image processing, M1 Ultra has a 64-core GPU — 8x the size of M1 — delivering faster performance than even the highest-end PC GPU available while using 200 fewer watts of power.

Apple:

  • Up to 3.8x faster CPU performance than the fastest 27-inch iMac with 10-core processor.
  • Up to 90 percent faster CPU performance than Mac Pro with 16-core Xeon processor.
  • Up to 60 percent faster CPU performance than 28-core Mac Pro.
  • Up to 4.5x faster graphics performance than the 27-inch iMac, and up to 80 percent faster than the fastest Mac graphics card available today.

Ken Shirriff:

Here are the two dies at the same scale. The M1 Ultra is much, much larger physically [than the ARM1]; I estimate it is 20x47mm. Its transistors are much smaller (5 nm vs 3000 nm) giving it 114 billion transistors instead of 25,000. If built with modern transistors, the ARM1 would be a tiny dot.

Previously:


How macOS Manages M1 CPU Cores

Howard Oakley:

macOS doesn’t provide direct access to cores, core types, or clusters, at least not in public APIs. Instead, these are normally managed through Grand Central Dispatch using Quality of Service (QoS) settings, which macOS then uses to determine thread management policies.

[…]

macOS itself adopts a strategy where most, if not all, of its background tasks are run at lowest QoS. These include automatic Time Machine backups and Spotlight index maintenance. This also applies to compression and decompression performed by Archive Utility: for example, if you download a copy of Xcode in xip format, decompressing that takes a long time as much of the code is constrained to the E cores, and there’s no way to change that.

[…]

In the original M1 chip, with 4 E cores, QoS 9 threads are run with the core frequency set at about 1000 MHz (1 GHz). What happens in the M1 Pro/Max with its 2 E cores is different: if there’s only one thread, it’s run on the cluster at a frequency of about 1000 MHz, but if there are two or more threads, the frequency is increased to 2064 MHz. This ensures that the E cluster in the M1 Pro/Max delivers at least the performance for background tasks as that in the original M1, at similar power consumption, despite the difference in size of the clusters.
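
In code, that management surface is just a QoS attribute on a dispatch queue; you never name a core or a cluster. A sketch in plain C (Apple’s clang compiles blocks by default):

    // Queue work that macOS should route to the E cluster.
    // Build: cc -O2 gcd_qos.c
    #include <dispatch/dispatch.h>
    #include <stdio.h>

    int main(void) {
        dispatch_queue_t bg = dispatch_queue_create("bg",
            dispatch_queue_attr_make_with_qos_class(
                DISPATCH_QUEUE_CONCURRENT, QOS_CLASS_BACKGROUND, 0));

        dispatch_group_t group = dispatch_group_create();
        for (int i = 0; i < 4; i++) {
            dispatch_group_async(group, bg, ^{
                volatile double x = 0;  // busy-work for powermetrics to see
                for (long j = 0; j < 500000000; j++)
                    x += j * 0.5;
                printf("background task done (%f)\n", x);
            });
        }
        dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
        return 0;
    }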

Previously:

Update (2022-05-09): Howard Oakley:

In many cases, these appear to demonstrate that running code exclusively on E cores uses more energy, not less.

[…]

This result occurs because Activity Monitor, currently version 10.14 in macOS 12.3.1, doesn’t know the difference between processors with identical cores running at fixed frequency, and Apple’s M1 chips, with two different types of core and variable frequencies for each cluster of cores. Given that it’s now nearly 18 months since Apple started shipping its first M1 Macs, you might think that a little surprising. It’s even worse that Activity Monitor’s errors are discouraging developers from making better use of the cores in M1 chips.

[…]

Until Apple updates the figures returned by Activity Monitor for M1 chips, confounding by core type and frequency makes it not just useless, but actually misleading for comparing CPU % or energy. If you need to assess those, for example when considering whether to let the user change the QoS of threads in your code, the only reliable tool is powermetrics, which provides details of cluster frequencies and power use, as well as active residency.

Howard Oakley:

I show how you can get accurate estimates of power and energy use, and how the E cores in M1 chips can be far more efficient than the P cores. Today’s compression task required less than a third of the energy when run on the E cores, than on the P cores.

Apple Silicon “Augury” DMP Vulnerability

Francisco Pires:

A team of researchers with the University of Illinois Urbana-Champaign, Tel Aviv University, and the University of Washington have demonstrated a world-first Data Memory-Dependent Prefetcher (DMP) vulnerability, dubbed “Augury,” that’s exclusive to Apple Silicon. If exploited, the vulnerability could allow attackers to siphon off “at rest” data, meaning the data doesn’t even need to be accessed by the processing cores to be exposed.

Augury takes advantage of Apple Silicon’s DMP feature. This prefetcher aims to improve system performance by being aware of the entire memory content, allowing it to prefetch data before it’s needed. Usually, memory access is limited and compartmentalized in order to increase system security, but Apple’s DMP prefetcher can overshoot the set of memory pointers, allowing it to access and attempt a prefetch of unrelated memory addresses up to its prefetch depth.
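
The access pattern at issue is what the paper calls an array of pointers. A sketch of the shape, not the attack:

    // Walk an array of pointers and dereference each element. A data
    // memory-dependent prefetcher that has learned this pattern may fetch
    // the targets of entries the program never touches.
    #include <stdint.h>
    #include <stdlib.h>

    int main(void) {
        enum { N = 1024 };
        uint64_t **aop = malloc(N * sizeof *aop);  // "array of pointers"
        for (int i = 0; i < N; i++)
            aop[i] = calloc(1, sizeof(uint64_t));

        volatile uint64_t sink = 0;
        for (int i = 0; i < N / 2; i++)  // only the first half is accessed...
            sink += *aop[i];
        // ...but aop[N/2], aop[N/2 + 1], ... may already have had their
        // targets pulled into the cache, up to the prefetch depth.
        return (int)sink;
    }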

See also:

Previously:

Apple M2

Apple (MacRumors, Hacker News):

Built using second-generation 5-nanometer technology, M2 takes the industry-leading performance per watt of M1 even further with an 18 percent faster CPU, a 35 percent more powerful GPU, and a 40 percent faster Neural Engine. It also delivers 50 percent more memory bandwidth compared to M1, and up to 24GB of fast unified memory.

[…]

The media engine includes a higher-bandwidth video decoder, supporting 8K H.264 and HEVC video.

Apple’s powerful ProRes video engine enables playback of multiple streams of both 4K and 8K video.

Previously:

Update (2022-06-10): Dylan Patel:

M2, codenamed Staten, is generally based on the same IP blocks as A15, codenamed Ellis. The codenames are based on some of New York’s most well-known islands, which should be a hint to how closely related these architectures are. A lot of the disappointment in performance uplift comes from weak gen-on-gen gains given the nearly two-year gap versus M1.

[…]

We discussed this in the past, but a lot of the slow down stems from Apple losing leagues of amazing engineers to firms such as Nuvia and Rivos.

[…]

The funkiness of Apple’s marketing image does mean there is an error window of about 3% after the die was scaled in size.

[…]

The [P] core itself is 21% larger than in M1, and 7% larger than A15. The big area of gen-on-gen growth is with the shared L2 cache which has gone from 12MB to 16MB compared to both M1 and A15.

[…]

One very interesting change is that the ROB appears smaller in the Avalanche core that is found in A15 and M2 versus the Firestorm core found in M1 and A14.

[…]

The E-Core was the main unit of change from a CPU perspective from the A14 to A15 and that holds true here.

[…]

The combo of minor wafer price increases, larger dies from 118.91mm2 to 155.25mm2, and more expensive memory hurts [costs] a lot.

PACMAN Attack on M1 Processor

Carly Page (Hacker News, paper):

The attack, appropriately called “Pacman,” works by “guessing” a pointer authentication code (PAC), a cryptographic signature that confirms that an app hasn’t been maliciously altered. This is done using speculative execution — a technique used by modern computer processors to speed up performance by speculatively guessing various lines of computation — to leak PAC verification results, while a hardware side-channel reveals whether or not the guess was correct.

What’s more, since there are only so many possible values for the PAC, the researchers found that it’s possible to try them all to find the right one.

[…]

The researchers — who presented their findings to Apple — noted that the Pacman attack isn’t a “magic bypass” for all security on the M1 chip, and can only take an existing bug that pointer authentication protects against.

Samuel K. Moore:

Other researchers familiar with PACMAN say that how dangerous it really is remains to be seen. However, PACMAN “increases the number of things we have to worry about when designing new security solutions,” says Nael Abu-Ghazaleh, chair of computer engineering at University of California, Riverside, and an expert in architecture security, including speculative execution attacks. Processors makers have been adding new security solutions to their designs besides pointer authentication in recent years. He suspects that now that PACMAN has been revealed, other research will begin to find speculative attacks against these new solutions.

Yan’s group explored some naive solutions to PACMAN, but they tended to increase the processor’s overall vulnerability.

[…]

“People used to think software attacks were standalone and separate from hardware attacks,” says Yan. “We are trying to look at the intersection between the two threat models. Many other mitigation mechanisms exist that are not well studied under this new compounding threat model, so we consider the PACMAN attack as a starting point.”

Joseph Ravichandran:

Our goal is to demonstrate that we can learn the PAC for a kernel pointer from userspace. Just demonstrating that this is even possible is a big step in understanding how mitigations like pointer authentication can be thought of in the Spectre era.

We do not aim to be a zero day, but instead aim to be a way of thinking about attacks / an attack methodology.

The timer used in the attack does not require a kext (we just use the kext for doing reverse engineering) but the attack itself never uses the kext timer. All of the attack logic lives in userspace.

Provided the attacker finds a suitable PACMAN Gadget in the kernel (and the requisite memory corruption bug), they can conduct our entire attack from userspace with our multithread timer. You are correct that the PACMAN Gadget we demonstrate in the paper does live in a kext we created, however, we believe PACMAN Gadgets are readily available for a determined attacker (our static analysis tool found 55,159 potential spots that could be turned into PACMAN Gadgets inside the 12.2.1 kernel).

BrooksT:

The design flaw is in the ARM v8.3 architecture, and it just happens that the M1 is the only commercial chip on that architecture at this time. When other v8.3 systems ship, they’ll have the same flaw.

Previously:

AMD vs. Intel

Dan Luu:

Looks like AMD passed Intel in market cap last Friday, after being fairly close for quite a while.

The majority of comments I’ve seen are betting on AMD, but I’d bet, at even odds, ten years from today, the 1-month trailing average market cap of Intel is higher than AMD’s.

[…]

I think Intel will be ok if it can recover to 2010-levels of dysfunction while it’s much larger than AMD in revenue/scale.

Ben Thompson:

While there are a host of reasons why TSMC took the performance crown from Intel over the last five years, a major factor is scale: TSMC was making so many chips that it had the money and motivation to invest in Moore’s Law.

The most important decision was shifting to extreme ultraviolet lithography at a time when Intel thought it was much too expensive and difficult to implement; TSMC, backed by Apple’s commitment to buy the best chips it could make, committed to EUV in 2014, and delivered the first EUV-derived chips in 2019 for the iPhone.

[…]

Time will tell if the CHIPS Act achieves its intended goals; the final version did, as I hoped, explicitly limit investment by recipients in China, which is already leading chip makers to rethink their investments. That this is warping the chip market is, in fact, the point: the structure of technology drives inexorably towards the most economically efficient outcomes, but the ultimate end state will increasingly be a matter of politics.

See also: Dithering.

Apple M2 Pro and M2 Max

Apple (Hacker News, MacRumors, Reddit):

Apple today announced M2 Pro and M2 Max, two next-generation SoCs (systems on a chip) that take the breakthrough power-efficient performance of Apple silicon to new heights. M2 Pro scales up the architecture of M2 to deliver an up to 12-core CPU and up to 19-core GPU, together with up to 32GB of fast unified memory. M2 Max builds on the capabilities of M2 Pro, including an up to 38-core GPU, double the unified memory bandwidth, and up to 96GB of unified memory. Its industry-leading performance per watt makes it the world’s most powerful and power-efficient chip for a pro laptop. Both chips also feature enhanced custom technologies, including a faster 16-core Neural Engine and Apple’s powerful media engine.

[…]

Built using a second-generation 5-nanometer process technology, M2 Pro consists of 40 billion transistors — nearly 20 percent more than M1 Pro, and double the amount in M2.

[…]

With its powerful CPU, M2 Pro can compile code up to 25 percent faster than M1 Pro, and up to 2.5x faster than MacBook Pro with an Intel Core i9 processor.

Previously:

Update (2023-01-18): Geekerwan (via Hacker News):

We designed our own battery test model and ran it against Windows laptops. We also benchmarked them using SPEC CPU, as well as real-life performance tests, and more.

Gordon Moore, RIP

Gordon and Betty Moore Foundation (Hacker News, MacRumors):

By 1950, after transferring to the University of California at Berkeley from San Jose State University, Gordon had earned his bachelor’s degree in chemistry. He and Betty were married that same year at a small church in Santa Clara, and set out together for Pasadena, where he was awarded his Ph.D. in chemistry from the California Institute of Technology in 1954.

After graduating from Caltech, Gordon moved east for a job in research with the Applied Physics Laboratory at Johns Hopkins University. In early 1956, he was recruited west again by William Shockley, the soon-to-be Nobel Laureate who had, with his team at Bell Labs, invented the transistor. By 1957, Shockley’s abrasive management approach and fluid direction for Shockley Semiconductor prompted Gordon and seven of his colleagues to exit the company and form Fairchild Semiconductor.

Intel:

Eleven years later, Moore and Noyce co-founded Intel.

Holcomb B. Noble and Katie Hafner:

Mr. Moore had wanted to be a teacher but could not get a job in education. He later called himself an “accidental entrepreneur” because he became a billionaire as a result of an initial $500 investment in the fledgling microchip business, which turned electronics into one of the world’s largest industries.

And it was he, his colleagues said, who saw the future. In 1965, in what became known as Moore’s Law, he predicted that the number of transistors that could be placed on a silicon chip would double at regular intervals for the foreseeable future, thus increasing the data-processing power of computers exponentially.

Walden Kirsch:

By all accounts, Moore was neither brash nor in-your-face like Grove. Nor was he charismatic and high-energy like Noyce. The “law” that bears his name was not self-proclaimed, but popularized by a Caltech professor in the mid-1970s. As one measure of his modesty, Moore once confessed to biographer Leslie Berlin that he was “embarrassed to have it called Moore’s Law for a long time.”

Update (2023-04-04): EE Times (via Om Malik):

In tribute to his visionary mind and work, we are reviving below an interview he gave EE Times after receiving the EE Times Annual Creativity in Electronics (ACE) Award for his lifetime achievement at a ceremony in San Francisco on March 9, 2005. Moore sat down with EE Times editors to discuss the industry’s past, present and future.

Thank you, Gordon Moore, for inspiring generations of engineers and paving the way for the future of the semiconductor industry.


Morris Tanenbaum, RIP

James R. Hagerty:

Dr. Tanenbaum, a chemist who worked for Bell Telephone Laboratories, the research arm of American Telephone & Telegraph Co., saw a chance to dash back to work to test his latest ideas about how to make better semiconductor devices out of silicon.

He tried a new way of connecting an aluminum wire to a silicon chip. He was thrilled when it worked, providing a way to make highly efficient transistors and other electronic devices, an essential technology for the Information Age.

[…]

Dr. Tanenbaum’s pioneering work in the mid-1950s demonstrated that silicon was a better semiconductor material for transistors than germanium, the early favorite.

[…]

“Bell Laboratories, the world’s premier industrial laboratory, was destroyed [following the 1982 antitrust settlement], a major national and global tragedy,” he wrote later in an unpublished memoir written for his family.

Amanda Davis (Hacker News):

Tanenbaum later developed the first gas-diffused silicon transistor, which could amplify and switch signals above 100 megahertz at a switching speed 10 times that of previous silicon transistors.

Despite Tanenbaum’s early work on silicon transistors, AT&T did not support further research or advancement of the technology.

[…]

Tanenbaum instead worked on other new technologies in the decades that followed. In 1962 he was named assistant director of Bell Labs’ metallurgical department. He led the team there that created the first high-field superconducting magnets, which are now used in MRI machines and other medical imaging technologies. Later he helped develop optical fiber and digital telephone switching.

Apple M2 Ultra

Apple (MacRumors, Hacker News, Slashdot):

M2 Ultra is built using a second-generation 5-nanometer process and uses Apple’s groundbreaking UltraFusion technology to connect the die of two M2 Max chips, doubling the performance. M2 Ultra consists of 134 billion transistors — 20 billion more than M1 Ultra. Its unified memory architecture supports up to a breakthrough 192GB of memory capacity, which is 50 percent more than M1 Ultra, and features 800GB/s of memory bandwidth — twice that of M2 Max. M2 Ultra features a more powerful CPU that’s 20 percent faster than M1 Ultra, a larger GPU that’s up to 30 percent faster, and a Neural Engine that’s up to 40 percent faster. It also features a media engine with twice the capabilities of M2 Max for blazing ProRes acceleration.

Previously:

Update (2023-06-13): Hassan Mujtaba (via Hacker News):

The CPU managed to post a score of up to 2809 points in the single-core and 21,531 points in the multi-core tests. For comparison, the Intel Core i9-13900KS scores 3083 points while AMD’s Ryzen 9 7950X scores 2875 points. In multi-threaded benchmarks, the same chips score 21,665 and 19,342 points, respectively. So as you can see, the workstation-grade Apple M2 Ultra SoC isn’t faster than the mainstream CPU offerings from Intel and AMD.

[…]

If you compare the chip to something like an AMD Threadripper and Intel Xeon W chip, then those would absolutely crush the M2 Ultra in the multi-threaded tests but the single-threaded lead will be on Apple’s M2 Ultra[…]

[…]

As such, in OpenCL, the M2 Ultra SoC ends up 50% slower than NVIDIA’s RTX 4080 and that’s not even the flagship GPU.

Joe Rossignol:

As expected, these scores confirm that the M2 Ultra chip offers up to 20% faster CPU performance compared to the M1 Ultra chip, as Apple advertised.

JVM Compares Strings Using the pcmpestri x86 Instruction

Jackson Davis (2016, tweet, Hacker News):

String.compareTo is one of a few methods that is important enough to also get a special hand-rolled assembly version.

[…]

Introduced in SSE4.2, pcmpestri is a member of the pcmpxstrx family of vectorized string comparison instructions. With a control byte to specify options for their complex functionality, they are complicated enough to get their own subsection in the x86 ISR. […] Now that’s really putting the C in CISC!

[…]

If this wasn’t complicated enough for you, have a quick gander at the indexOfimplementations (there are 2, depending on the size of the matching string), which use control byte 0x0d, which does “equal ordered” (aka substring) matching.

It sounds like it only compares the Unicode code points, so that equivalent precomposed and decomposed strings are not considered equal.
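
For the curious, here’s roughly what the stub does for one 16-byte chunk, via the C intrinsic. Control byte 0x0d is _SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ORDERED: unsigned 16-bit Java chars with “equal ordered” (substring) matching. This is my sketch; the real assembly loops across both strings.

    // Build: cc -O2 -msse4.2 pcmp.c (x86 only)
    #include <nmmintrin.h>
    #include <stdio.h>

    int main(void) {
        const unsigned short haystack[8] = {'h','e','l','l','o',' ','w','o'};
        const unsigned short needle[8]   = {'l','o'};

        __m128i h = _mm_loadu_si128((const __m128i *)haystack);
        __m128i n = _mm_loadu_si128((const __m128i *)needle);

        // "estri" = explicit lengths (in elements) + result as an index.
        // Returns where the substring starts, or 8 if not in this chunk.
        int idx = _mm_cmpestri(n, 2, h, 8,
                               _SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ORDERED);
        printf("substring starts at element %d\n", idx);  // prints 3
        return 0;
    }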

pcwalton:

One thing I learned about pcmpxstrx is that it’s surprisingly slow: latency of 10-11 cycles and reciprocal throughput of 3-5 cycles on Haswell according to Agner’s tables, depending on the precise instruction variant. The instructions are also limited in the ALU ports they can use. Since AVX2 has made SIMD on x86 fairly flexible, it can sometimes not be worth using the string comparison instructions if simpler instructions suffice: even a slightly longer sequence of simpler SIMD instructions sometimes beats a single string compare.

Previously:

Apple M3, M3 Pro, and M3 Max

Apple (Hacker News):

These are the first personal computer chips built using the industry-leading 3-nanometer process technology, allowing more transistors to be packed into a smaller space and improving speed and efficiency.

[…]

The M3 family of chips features a next-generation GPU that represents the biggest leap forward in graphics architecture ever for Apple silicon. The GPU is faster and more efficient, and introduces a new technology called Dynamic Caching, while bringing new rendering features like hardware-accelerated ray tracing and mesh shading to Mac for the first time. Rendering speeds are now up to 2.5x faster than on the M1 family of chips. The CPU performance cores and efficiency cores are 30 percent and 50 percent faster than those in M1, respectively, and the Neural Engine is 60 percent faster than the Neural Engine in the M1 family of chips. And, a new media engine now includes support for AV1 decode, providing more efficient and high-quality video experiences from streaming services.

[…]

Additionally, support for up to 128GB of memory unlocks workflows previously not possible on a laptop, such as AI developers working with even larger transformer models with billions of parameters.

Tim Hardwick:

However, looking at Apple’s own hardware specifications, the M3 Pro system on a chip (SoC) features 150GB/s memory bandwidth, compared to 200GB/s on the earlier M1 Pro and M2 Pro. As for the M3 Max, Apple says it is capable of “up to 400GB/s.”

[…]

Notably, Apple has also changed the core ratios of the higher-tier M3 Pro chip compared to its direct predecessor. The M3 Pro with 12-core CPU has 6 performance cores (versus 8 performance cores on the 12-core M2 Pro) and 6 efficiency cores (versus 4 efficiency cores on the 12-core M2 Pro), while the GPU has 18 cores (versus 19 on the equivalent M2 Pro chip).

[…]

According to Apple, the M3 Neural Engine is capable of 18 TOPS, whereas the A17 Pro Neural Engine is capable of 35 TOPS.

[…]

Taken together, it’s presently unclear what real-world difference these changes make to M3 performance when pitted against Apple’s equivalent precursor chips in various usage scenarios[…]

Phil Dennis-Jordan:

So the M3 Pro is basically a 50% scaled-up M3: unlike the M2 Pro it doesn’t have double the memory channels, “only” half again.

Jeff C.:

It seems that they’re doing a bit more this generation to differentiate between Pro and Max.

Previously, the Pro and Max had the same number of CPU cores, so I never had any interest in the Max once I realized that my work fit into the Pro’s RAM ceiling. Now, the Pro’s CPU core advantage over the base chip has been cut in half. To get double the cores of the M3 you need the M3 Max.

Om Malik:

The new M3 chips are coming at an opportune time — Apple’s rivals, Qualcomm, Nvidia, AMD, and Intel, have been making noises about catching up with Apple. […] Qualcomm recently announced the Snapdragon X, a PC chip that it says is better than the M2 processor. Nvidia, too, is working on its own chip, as is AMD. All three companies are using Arm’s technology. Intel, on the other hand, is moving forward with its own technologies.

[…]

How caching is implemented varies based on the intended use — whether it be for gaming, professional graphics, or data center applications. NVIDIA, for example, employs various forms of cache, including L1/L2 caches and shared memory, which are dynamically managed to optimize performance and efficiency. AMD uses large L3 caches (“Infinity Cache”) to boost bandwidth and reduce latency — an approach beneficial for gaming. Intel’s Xe graphics architecture focuses on smart caching, balancing power efficiency and performance.

[…]

Apple has a substantial opportunity to integrate generative AI into its core platform, mainly because of its chip and hardware-level integration.

Previously:

iLeakage: Browser-Based Timerless Speculative Execution Attacks on Apple Devices

Jason Kim et al. (Hacker News):

We present iLeakage, a transient execution side channel targeting the Safari web browser present on Macs, iPads and iPhones. iLeakage shows that the Spectre attack is still relevant and exploitable, even after nearly 6 years of effort to mitigate it since its discovery. We show how an attacker can induce Safari to render an arbitrary webpage, subsequently recovering sensitive information present within it using speculative execution. In particular, we demonstrate how Safari allows a malicious webpage to recover secrets from popular high-value targets, such as Gmail inbox content. Finally, we demonstrate the recovery of passwords, in case these are autofilled by credential managers.

[…]

Code running in one web browser tab should be isolated and not be able to infer anything about other tabs that a user has open. However, with iLeakage, malicious JavaScript and WebAssembly can read the content of a target webpage when a target visits and clicks on an attacker's webpage. This content includes personal information, passwords, or credit card information.

[…]

At the time of public release, Apple has implemented a mitigation for iLeakage in Safari. However, this mitigation is not enabled by default, and enabling it is possible only on macOS [in Safari’s Debug menu]. Furthermore, it is marked as unstable.

[…]

We disclosed our results to Apple on September 12, 2022 (408 days before public release).

It’s still possible in Lockdown Mode, but slower.

Dan Goodin:

iLeakage represents several breakthroughs. First is its ability to defeat these defenses with Safari running on A- and M-series chips by exploiting a type confusion vulnerability. Secondly, it's a variant that doesn’t rely on timing but rather on what’s known as a race condition. A third key ingredient is the unique ability of WebKit to consolidate websites from different domains into the same renderer process using the common JavaScript method window.open.

So Chrome and Firefox are not vulnerable, but of course Apple doesn’t allow their browser engines on iOS.
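For readers who want the shape of the underlying technique: iLeakage is a Spectre-style attack, and while its real gadgets are JavaScript and WebAssembly running inside Safari (with a race condition standing in for a timer), the classic Spectre v1 pattern is easiest to see in C. This is a textbook sketch of the bug class, not code from the paper.

    #include <stdint.h>
    #include <stddef.h>

    uint8_t array1[16];
    size_t  array1_size = 16;
    uint8_t array2[256 * 4096];  /* probe array: one cache line per
                                    possible byte value */

    /* Classic Spectre v1 gadget. If the branch predictor has been
       trained to expect x < array1_size, the CPU speculatively runs
       the body even for an out-of-bounds x. The architectural result
       is discarded, but the dependent load leaves a cache footprint
       indexed by the secret byte. */
    void victim(size_t x)
    {
        if (x < array1_size) {
            uint8_t secret = array1[x];  /* speculative out-of-bounds read */
            volatile uint8_t t = array2[secret * 4096];  /* encode into cache */
            (void)t;
        }
    }

After training the predictor with in-bounds values, the attacker calls victim() with an out-of-bounds x and then checks which line of array2 is hot; the hot index is the secret byte. iLeakage’s contribution is pulling this off from a web page without the high-resolution timer that such probing normally requires.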

Previously:

Dave Cutler Interview


Dave Plummer (via Hacker News):

Dave Cutler is a seminal figure in computer science, renowned for his contributions to operating systems. Born in 1942, he played pivotal roles in the development of several OSes, most notably VMS for Digital Equipment Corporation (DEC) and Windows NT for Microsoft. Cutler’s design principles emphasize performance, reliability, and scalability. His work on Windows NT laid the foundation for many subsequent Windows versions, solidifying its place in enterprise and personal computing. A stickler for detail and a rigorous engineer, Cutler’s influence is evident in modern OS design and architecture.

Cutler is quick-witted and has an impressive recall of details. It’s hard to believe he’s 81, except that his stories go back to punched cards and 16-bit minicomputers.

Previously:

Operation Triangulation Details


Dan Goodin (Hacker News):

Researchers on Wednesday presented intriguing new findings surrounding an attack that over four years backdoored dozens if not thousands of iPhones, many of which belonged to employees of Moscow-based security firm Kaspersky. Chief among the discoveries: the unknown attackers were able to achieve an unprecedented level of access by exploiting a vulnerability in an undocumented hardware feature that few, if any, outside of Apple and chip suppliers such as ARM Holdings knew of.

[…]

The mass backdooring campaign, which according to Russian officials also infected the iPhones of thousands of people working inside diplomatic missions and embassies in Russia, came to light in June. Over a span of at least four years, Kaspersky said, the infections were delivered in iMessage texts that installed malware through a complex exploit chain without requiring the receiver to take any action.

[…]

With that, the devices were infected with full-featured spyware that, among other things, transmitted microphone recordings, photos, geolocation, and other sensitive data to attacker-controlled servers. Although infections didn’t survive a reboot, the unknown attackers kept their campaign alive simply by sending devices a new malicious iMessage text shortly after devices were restarted.

Boris Larin (video, Hacker News):

This presentation was also the first time we had publicly disclosed the details of all exploits and vulnerabilities that were used in the attack. We discover and analyze new exploits and attacks using these on a daily basis, and we have discovered and reported more than thirty in-the-wild zero-days in Adobe, Apple, Google, and Microsoft products, but this is definitely the most sophisticated attack chain we have ever seen.

[…]

Various peripheral devices available in the SoC may provide special hardware registers that can be used by the CPU to operate these devices. For this to work, these hardware registers are mapped to the memory accessible by the CPU and are known as “memory-mapped I/O (MMIO)”.

[…]

I discovered that most of the MMIOs used by the attackers to bypass the hardware-based kernel memory protection do not belong to any MMIO ranges defined in the device tree. The exploit targets Apple A12–A16 Bionic SoCs, using unknown blocks of MMIO registers located at the following addresses: 0x206040000, 0x206140000, and 0x206150000.

[…]

This is no ordinary vulnerability, and we have many unanswered questions. We do not know how the attackers learned to use this unknown hardware feature or what its original purpose was. Neither do we know whether it was developed by Apple or is a third-party component like ARM CoreSight.
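To make the MMIO point concrete: from the CPU’s perspective, a device register is just a physical address that must be accessed with volatile loads and stores. Here is a rough C sketch of the access pattern, using one of the register blocks Larin lists; it assumes the physical range has already been mapped (real kernel code would map it first), and the offsets and meanings are unknown, which is exactly the point.

    #include <stdint.h>

    /* One of the undocumented register blocks from the exploit
       (0x206040000); any particular offset or field layout here
       would be purely hypothetical. */
    #define MYSTERY_MMIO_BASE 0x206040000ULL

    /* volatile forces a real load/store on every access, so the
       operation reaches the device instead of being cached or
       optimized away. */
    static inline uint64_t mmio_read64(uint64_t off)
    {
        return *(volatile uint64_t *)(uintptr_t)(MYSTERY_MMIO_BASE + off);
    }

    static inline void mmio_write64(uint64_t off, uint64_t val)
    {
        *(volatile uint64_t *)(uintptr_t)(MYSTERY_MMIO_BASE + off) = val;
    }

Because these ranges weren’t declared in the device tree, nothing in the kernel’s view of the hardware marked them as sensitive, which helps explain how writes to them could be used to bypass hardware-based kernel memory protection.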

Bill Toulas:

The four flaws that constitute the highly sophisticated exploit chain, which worked on all iOS versions up to iOS 16.2, are:

  • CVE-2023-41990: A vulnerability in the ADJUST TrueType font instruction allowing remote code execution through a malicious iMessage attachment.
  • CVE-2023-32434: An integer overflow issue in XNU’s memory mapping syscalls, granting attackers extensive read/write access to the device’s physical memory.
  • CVE-2023-32435: Used in the Safari exploit to execute shellcode as part of the multi-stage attack.
  • CVE-2023-38606: A vulnerability using hardware MMIO registers to bypass the Page Protection Layer (PPL), overriding hardware-based security protections.
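Of these, CVE-2023-32434 belongs to the most familiar bug class. The public write-ups don’t show the affected XNU code, so the following C sketch is generic; the function names and shapes are invented to illustrate how an integer overflow defeats a mapping bounds check.

    #include <stdint.h>
    #include <stdbool.h>

    /* BAD: addr + size can wrap past UINT64_MAX, so a huge size
       passes the check and a later mapping covers memory the caller
       should never reach. */
    bool range_ok(uint64_t addr, uint64_t size, uint64_t limit)
    {
        return addr + size <= limit;
    }

    /* Rule out wraparound before comparing. */
    bool range_ok_fixed(uint64_t addr, uint64_t size, uint64_t limit)
    {
        return size <= limit && addr <= limit - size;
    }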

Nick Heer:

As you might recall, Russian intelligence officials claimed Apple assisted the NSA to build this malware — a claim Apple has denied and for which, it should be noted, no proof has been provided, either of Apple’s involvement or of the NSA’s. It does not appear there is any new evidence which would implicate Apple. But it is notable that the attack relied on an Apple-specific TrueType specification and bypassed previously undisclosed hardware memory protections. To be clear, neither of those things increases the likelihood of Apple’s alleged involvement in my mind. It does show how disused or seemingly irrelevant functions remain vulnerable and can be used by sophisticated and likely state-affiliated attackers.

Previously:

Update (2024-01-05): See also: Bruce Schneier.





