After the first push to GitHub, there were bug fixes (thankfully none too embarrassing) and pull requests; it all seemed to be going well. The start of support for AVX2 landed in ispc in December 2011; it looks like it was enabled in January 2012, but support for AVX2’s gather and FMA didn’t land until that summer. (I think that may have been waiting for LLVM support for those, but am not sure.)
In the summer of 2012, Jean-Luc Duprat started work on ispc support for Knights Corner (KNC), an HPC-focused architecture based on Larrabee that was the first product in the Xeon Phi series. Jean-Luc, a former graphics person who well understood SPMD, had become a KNC architect and wanted to have the ispc functionality there. Lacking an LLVM backend for KNC, he implemented a clever approach based on using LLVM’s C++ backend to emit C++ intrinsics code. Given the right header file, that could be compiled the rest of the way to assembly. A hack, but in the best kind of way.1
C++ and writing it up
Bill Mark got interested in working through the details of what it would take to propose an addition to the C++ standard for SPMD computation; he’s an incredible system designer and was great at thinking deeply about the details. Over many months, we had many long conversations about the language design and its relation to C++; in the end, he came up with a fairly comprehensive design for C++ extensions which he called “Sierra”. Figuring out the right design for pointers in ispc came out of those discussions; it turned out to be a bit subtle.
An intern implemented a prototype of those ideas in clang, with good initial results; clang’s clean design made that relatively straightforward. It was really neat to see things like lambdas and templates just working with SPMD on SIMD code. Many of the ideas from Bill’s design later appeared in this paper.
Bill and I wrote a paper about ispc in 2012. I think it well captures the system’s design and implementation considerations and it has thorough discussion of previous SPMD languages that turned out to have a lot in common with ispc. We published it at InPar, a new conference on parallel computation that year.
InPar was co-located with NVIDIA’s GTC, which in turn meant there was a heavy GPU focus. And by “heavy GPU focus”, I mean our paper was the only one about CPUs. Yet, with strong audience support, we won the best paper award. Our prize was a top-of-the-line NVIDIA GPU.
Talking to the academics
Among the ways Geoff Lowney was a great help was in getting me set up to give a few external talks on ispc to academic researchers. One of these led to a two-day visit to Illinois to give a talk at UIUC.
I spent the morning of the first day with a bunch of people in Intel’s Champaign-Urbana office, which was fantastic—smart, open-minded and interesting folks. I then got to go have lunch with David Kuck, which was great, too. It turns out that he knows a thing or two about parallel programming.
There was one slight hitch: apparently there was an issue with the chicken in the chicken salad at lunch; food poisoning ensued, and I spent the rest of the afternoon and evening in my hotel room, not in a good state, and overall freaking out about how things would go at the university the next day. Successfully making it through an hour plus of standing in front of an audience while also speaking cogently seemed quite iffy.
Even when not ill, I always worried about talking to compiler researchers about ispc; compilers is not my area and I worried about my incomplete knowledge of the previous work. I imagined my explaining the idea to a professor and then their saying “Oh, that’s the Hazenburger transformation, first described in 1975. My undergrad compiler class implemented it for an assignment last week. Is there anything else to what you’ve done?”
Um, no—that’s it. (By now I’ve gotten pretty comfortable that there isn’t a Hazenburger transformation after all.)
I’d been a little extra nervous about the UIUC talk because Vikram Adve was on the faculty there and would be in attendance. Not only was he a renowned compiler researcher, he’d been Chris Lattner’s Ph.D. adviser; LLVM got off the ground at UIUC. So there was even more potential for public embarrassment in my imagined worst case, now with the added worry about whether I’d be fully recovered from the food poisoning. Right before the talk, I scouted out the nearest bathroom, just so I’d know where to run in an emergency.
To my relief, the talk went well. Vikram was really kind, and we had a nice chat afterward; he seemed to find the ideas intriguing. The talk was recorded, but the link seems to be broken. It’s probably just as well; I can avoid the awkwardness of watching myself on video. The slides are still online; they give a flavor of where things stood at that point and what the general message was.
A few weeks later, giving the talk at a parallel computing lab at another university didn’t go as well. It was a bad omen when the faculty member who was supposed to introduce me didn’t show up until 20 minutes after the talk was supposed to start. After standing around awkwardly for 10 minutes waiting for someone to kick this thing off, I finally just introduced myself and started the talk.
During the Q&A after the talk, one of the grad students was insistent that a 180x speedup on a 40-core machine that I reported in my results was purely due to multi-threading and how could I be sure that SIMD had anything to do with it? Also, it turns out that there’s not a single interesting workload that isn’t massively parallel and runs well on GPUs, and therefore there wasn’t anything interesting about making things run fast on CPUs.
It was a bit of a relief when the faculty member who had invited me to give the talk told me that he hadn’t gotten around to setting up any meetings with the researchers in the lab after the talk, as had been part of the original invitation.
It’s funny how it all ended.
For a long time, I’d strenuously avoided building a group to work on ispc; plenty of other people engaged, pitched in, and made critical contributions along the way: Tim Foley, Bill Mark, Jean-Luc, and many others. They were all in different organizations, contributing of their own volition, to the extent of the time they could and wanted to make available.
Not trying to formalize things beyond that was a defensive maneuver. An organized group of people working on ispc would have presented a better target for attack: if I got headcount and hired people to build up a group devoted to ispc, we might work productively for a while. In time, though, the jerks would likely apply their well-honed maneuver of persuading management that those people could be better used working on something else that was more important. If successful, then poof, everyone’s sent away to join another group and the project disintegrates—their actual goal.
With it being just me, there wasn’t much of a target.
In the fall of 2012, I went ahead and asked Geoff Lowney for just one person’s headcount to help me with ispc. Not too much bigger a target, and it was a good time to start work on serious support for AVX-512; there was plenty to do on that front. He readily went off to make it happen. A few days later, as he told me it was no problem, I felt… terror.
Especially after ispc was open sourced, I’d been able to be fairly carefree: the compiler was out there in the world, it was working well, and people liked it. I could keep on working on it basically on a day by day basis. If things got weird at Intel—politics, some bad reorg, whatever—I knew I could just leave without leaving behind much unfinished business. I’d never planned to stay at Intel for the rest of my career, so I figured I’d stick around as long as it was more fun than not, and leave when the right time came.
But bringing someone on to join the project? Then I’d be responsible for them, having to do my part to shield them from the politics. Worse, I’d no longer have the option to leave Intel anytime soon—it wouldn’t have been fair to that person, especially since most likely they’d be reorged away into something else if I left. I realized that adding someone to the effort was effectively signing myself on to stay for at least another year or two.
Given all of the previous ups and downs, I wasn’t ready to commit to that. Thinking about it more, it seemed like it was probably a good time to move on; ispc was in good shape and nothing major was missing. Continuing to turn the crank didn’t have a lot of appeal.
So I quit, for real that time. Geoff was a bit surprised when I explained why—that his giving me the headcount I had asked for in the first place was what led me to see it was the right time to leave—but he was impressively cool about it. My last commit with an Intel email address was September 14, 2012.
Next up, some big systems written in ispc, design retrospectives, and a splash of ARM-based excitement.
Relatedly, Ingo Wald wrote an SPMD on SIMD language prototype, IVL, which converted directly from the AST to C++ intrinsics code.