Servethehome[1] does a somewhat better job of describing what Maverick-2 is and why it makes sense.
[1]https://www.servethehome.com/nextsilicon-maverick-2-brings-d...
That's a fairly specialized chip and requires a bunch of custom software. The only way it can run apps unmodified is if the math libraries have been customized for this chip. If the performance is there, people will buy it.
For a minute I thought maybe it was RISC-V with a big vector unit, but it's way different from that.
The article says they are also developing a RISC-V CPU.
The quote at the end of the posted Reuters article (not the one you’re responding to) says that it doesn’t require extensive code modifications. So is the “custom software” standard for the target customers of NextSilicon?
Companies often downplay the amount of software modification needed to benefit from their hardware platform's strengths, because quite often platforms that cannot run software out of the box lose out to those that can.
In the past, by the time special chips were completed and mature, the developers of "mainstream" CPUs had typically caught up in speed, which is why we do not see any "transputers" (e.g. the Inmos T800), LISP machines (Symbolics XL1200, TI Explorer II), or other odd architectures like the Connection Machine CM-2 around anymore.
For example, when Richard Feynman was hired to work on the Connection Machine, he had to write a parallel version of BASIC before he could write any programs for the computer they were selling: https://longnow.org/ideas/richard-feynman-and-the-connection...
This may also explain failures like that of the Bristol-based CPU startup Graphcore, which was acquired by SoftBank, but for less money than the investors had put in: https://sifted.eu/articles/graphcore-cofounder-exits-company...
XMOS (the spiritual successor to Inmos) is still kicking around, though it’s not without its challenges, for the reasons you mention.
>> says that it doesn’t require extensive code modifications
If they provide a compiler port and update things like BLAS to support their hardware, then higher-level applications should not require much, if any, code modification.
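As a minimal sketch of that point (generic CBLAS; nothing NextSilicon-specific is assumed here): an application written purely against the standard BLAS interface only needs to be recompiled and relinked against a vendor-tuned library to pick up new hardware, with no source changes.

```cpp
// Generic CBLAS usage: the application never names the hardware.
// Link against whichever BLAS implementation the vendor provides.
#include <cblas.h>
#include <vector>

int main() {
    const int n = 512;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);

    // C = 1.0 * A * B + 0.0 * C, dispatched to whatever BLAS is linked in.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A.data(), n,
                B.data(), n,
                0.0, C.data(), n);
    return 0;
}
```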
It's a bit more complicated: you need to use their compiler (an LLVM fork with clang and Fortran front ends). That in itself is not that special, as most accelerator toolchains (icc, nvcc, aoc) already require this.
Modifications are likely on the level of: does this clang support my required C++ version? Actual work is only required when you want to bring something else, like Rust (AFAIK not supported).
However, to analyze the efficiency of the code and how it is interpreted by the card, you need their special toolchain. Debugging also becomes less convenient.
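For the "does this clang support my required C++ version" question, a quick probe like the one below answers it early. This is generic C++ and not tied to NextSilicon's toolchain; the compiler name and the specific feature-test macros are just placeholders for whatever your code actually relies on.

```cpp
// Hypothetical portability probe. Build with the vendor's clang, e.g.
//   vendor-clang++ -std=c++20 probe.cpp
// A clean compile means the standard version and the listed library
// features are available; the #error lines fire otherwise.
#if __has_include(<version>)
#  include <version>
#endif

#if __cplusplus < 202002L
#error "toolchain does not provide C++20"
#endif
#if !defined(__cpp_lib_span)
#error "standard library lacks <span> (C++20)"
#endif

int main() { return 0; }
```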
I've also found their "Technology Launch" video[1], which goes somewhat deeper into the details (they also show code examples).
[1] https://www.youtube.com/watch?v=krpunC3itSM
They've got a "Mill Core" in there - is the design related to the Mill Computing design?
Yeah, it's an unfortunate overlap. The Mill-Core, in NextSilicon terminology, is the software-defined "configuration" of the chip, so to speak: it represents the swaths of the application deemed worthy of acceleration, as expressed on the custom HW.
So really the Mill-Core is, in a way, the expression of the customer's code.
They are completely different designs, but the name is inspired by the same source: the Mill component in Charles Babbage's Analytical Engine.
https://archive.is/6j2p4
I can't access the page directly, because my browser doesn't leak enough identifying information to convince Reuters I'm not a bot, but an actual bot is perfectly capable of accessing the page.
Same, but I can’t access archive.is either because of the VPN.
Odd, that doesn't load for me, but https://archive.ph/6j2p4 does.
Archive.is is broken if you use Cloudflare DNS.
The other company I can think of focusing on FP64 is Fujitsu with its A64FX processor. This is an ARM64 chip with really meaty SIMD that gets about 3 TFLOPS of FP64.
I guess it is hard to compare chip for chip, but the question is: if you are building a supercomputer (and we ignore pressure to buy sovereign), which is the better bang for the buck on representative workloads?
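As a rough sanity check on that figure (my back-of-the-envelope arithmetic, assuming the 48-core, dual 512-bit SVE configuration, not Fujitsu's spec sheet): 48 compute cores × two 512-bit SVE pipes × 8 FP64 lanes × 2 FLOPs per FMA is 32 FLOPs per core per cycle, so at roughly 2.0 to 2.2 GHz that comes out around 3.0 to 3.4 TFLOPS of FP64.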
If Fujitsu only releases one processor every 8 years, they're going to be behind most of the time.
All processors are inherently behind. First the research comes out or standards are ratified; much later, silicon is fabbed. For example, PCIe Gen 6 was ratified years ago, but I haven't seen anything that uses it yet. Maybe you could argue that their silicon is behind others', but it's all about what their market is and what their customers are demanding.
Curious if the architecture is similar to what is called “systolic” as in the Anton series of supercomputers: https://en.wikipedia.org/wiki/Anton_(computer)
I was an architect on the Anton 2 and 3 machines - the systolic arrays that computed pairwise interactions were a significant component of the chips, but there were also an enormous number of fairly normal looking general-purpose (32-bit / 4-way SIMD) processor cores that we just programmed with C++.
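For readers unfamiliar with the workload: the pairwise-interaction part that the systolic arrays handled is essentially an all-pairs force kernel. Here is a generic scalar sketch in C++, purely illustrative and with a placeholder potential; this is not Anton's code.

```cpp
// Generic all-pairs interaction kernel (illustrative only). Each pair
// contributes a force that depends only on the pair's separation, which is
// what makes this map well onto an array streaming particles past each other.
#include <cmath>
#include <cstddef>
#include <vector>

struct Particle { double x, y, z, fx, fy, fz; };

void all_pairs_forces(std::vector<Particle>& p) {
    for (std::size_t i = 0; i < p.size(); ++i) {
        for (std::size_t j = i + 1; j < p.size(); ++j) {
            const double dx = p[i].x - p[j].x;
            const double dy = p[i].y - p[j].y;
            const double dz = p[i].z - p[j].z;
            const double r2 = dx * dx + dy * dy + dz * dz;
            // Placeholder inverse-square potential; real MD kernels use
            // Lennard-Jones plus electrostatics, cutoffs, exclusions, etc.
            const double s = 1.0 / (r2 * std::sqrt(r2));
            p[i].fx += s * dx; p[i].fy += s * dy; p[i].fz += s * dz;
            p[j].fx -= s * dx; p[j].fy -= s * dy; p[j].fz -= s * dz;
        }
    }
}
```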
I spent a lot of time on systolic arrays to compute cryptocurrency PoW (BLAKE2 specifically). It’s an interesting problem and I learned a lot, but made no progress. I’ve often wondered if anyone else has done the same.
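For context on why that mapping is hard: BLAKE2b's core mixing step is a short chain of adds, XORs, and fixed rotations in which every operation depends on the previous one, so a single hash exposes almost no spatial parallelism to lay out across an array; throughput has to come from hashing many independent candidates side by side. Below is my sketch of the standard G function from RFC 7693 to show the dependency chain.

```cpp
#include <cstdint>

// Standard BLAKE2b G mixing function (RFC 7693). Every line depends on the
// result of the line before it, so within one hash there is little to spread
// spatially across a systolic array.
static inline uint64_t rotr64(uint64_t v, unsigned n) {
    return (v >> n) | (v << (64 - n));
}

static inline void G(uint64_t& a, uint64_t& b, uint64_t& c, uint64_t& d,
                     uint64_t x, uint64_t y) {
    a = a + b + x;  d = rotr64(d ^ a, 32);
    c = c + d;      b = rotr64(b ^ c, 24);
    a = a + b + y;  d = rotr64(d ^ a, 16);
    c = c + d;      b = rotr64(b ^ c, 63);
}
```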
You should check out AMD's NPU architecture.
Not really. I work for NextSilicon. It's a data-flow oriented design. We will eventually have more details available that gradually explain this.
Even if the hardware is really good, the software should be even better if they want to succeed.
Support for operating systems, compilers, programming languages, etc.
This is why a Raspberry Pi is still so popular even though there are a lot of cheaper alternatives with theoretically better performance. The software support is often just not as good.
Their customers are building supercomputers?
If you want your customers to spend supercomputing money, you need to have a way for those customers to explore and learn to leverage your systems without committing a massive spend.
ARM, x86, and CUDA-capable stuff is available off the shelf at Best Buy. This means researchers don't need massive grants or tremendous corporate investment to build proofs of concepts, and it means they can develop in their offices software that can run on bigger iron.
IBM's POWER series is an example of what happens when you don't have this. Minimum spend for the entry-level hardware is orders of magnitude higher than the competition, which means, practically speaking, you're all-in or not at all.
CUDA is also a good example of bringing your product to the users. AMD spent years locking ROCm behind weird market-segmentation games, and even today the 'supported' list in the ROCm documentation only shows a handful of ultra-recent cards. CUDA, meanwhile, happily ran on your ten-year-old laptop, even if it didn't run great.
People need to be able to discover what makes your hardware worth buying.
The implication wasn't to use the Raspberry Pi toolchain, just that toolchains are required and are a critical part of developing for new hardware. The Intel/AMD toolchains they will be competing with are even more mature than the rpi's. And toolchain availability and ease of use make a huge difference whether you are developing for supercomputers or embedded systems. From the article:
"It uses technology called RISC-V, an open computing standard that competes with Arm Ltd and is increasingly being used by chip giants such as Nvidia and Broadcom."
So the fact that rpi tooling is better than the imitators', and that it has maintained a significant market-share lead, is relevant. Market share isn't just about performance and price. It's also about ease of use and the network effects that come with popularity.
I'm personally boycotting Israeli companies for obvious reasons.
I find it helpful to read a saxpy and GEMM kernel for a new accelerator like this - do they have an example?
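For reference, here is the plain scalar baseline in C++, i.e. the kernel a vendor example would presumably show being offloaded; I haven't seen a NextSilicon version of it.

```cpp
#include <cstddef>

// Plain scalar saxpy: y = a*x + y. The accelerator port of this kernel
// (launch/offload syntax, memory placement, pragmas) usually tells you the
// most about how a new architecture wants to be programmed.
void saxpy(std::size_t n, float a, const float* x, float* y) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```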
If there really is enough market demand for this kind of processor, it seems like someone like NEC, which still makes vector processors, would be better poised than a startup rolling its own RISC-V.
I work at NS. The RISC-V CPU was the "one more thing" aspect of the "reveal".
The main product/architecture discussed has nothing to do with vector processors or RISC-V.
It's a new, fundamentally different data-flow processor.
Hopefully we will get better at explaining what we do and why people may want to care.
So, a systolic array[1] spiced up with a pinch of control flow and a side of compiler cleverness? At least that's the impression I get from the servethehome article linked upthread. I wasn't able to find technical details beyond better-than-sliced-bread marketing in 3 minutes of poking at your website.
[1]: https://en.wikipedia.org/wiki/Systolic_array
I can see why systolic arrays come to mind, but this is different. While there are indeed many ALUs connected to each other in both a systolic array and a data-flow chip, data-flow is usually more flexible (at the cost of complexity), and the ALUs can be thought of as residing on some shared fabric.
Systolic arrays often (always?) have a predefined communication pattern and are often used in problems where the data that passes through them is also retained in some shape or form (a toy sketch of that fixed-pattern style follows below).
For NextSilicon, the ALUs are reconfigured and rewired to express the application (or parts of it) on the parallel data-flow accelerator.
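To make the "predefined communication pattern" point concrete, here is a toy software model of a 1-D systolic-style FIR filter, in which each processing element holds one fixed coefficient and only exchanges data with its immediate neighbour. This is my illustration of the classical systolic idea, not a description of NextSilicon's hardware.

```cpp
#include <cstddef>
#include <vector>

// Toy 1-D systolic-style FIR: each PE holds one fixed coefficient and one
// delay register, and only talks to its immediate neighbour (samples shift
// right, partial sums ripple right). The wiring is decided at design time;
// a data-flow fabric instead rewires its ALUs to match the loaded application.
// (A real systolic design would also register the partial sums to pipeline
// the adder chain; this is just an illustration.)
std::vector<double> fir_systolic_model(const std::vector<double>& x,
                                       const std::vector<double>& coeff) {
    if (coeff.empty()) return {};
    std::vector<double> delay(coeff.size(), 0.0);
    std::vector<double> y;
    y.reserve(x.size());

    for (double in : x) {
        // Each PE takes the sample held by its left neighbour.
        for (std::size_t i = delay.size(); i-- > 1;)
            delay[i] = delay[i - 1];
        delay[0] = in;

        // Partial sum rippling PE-to-PE from left to right.
        double acc = 0.0;
        for (std::size_t i = 0; i < delay.size(); ++i)
            acc += coeff[i] * delay[i];
        y.push_back(acc);
    }
    return y;
}
```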
Are the GreenArray chips also systolic arrays?
My understanding is no, if I understand what people mean by systolic arrays.
GreenArray processors[1] are complete computers with their own memory and running their own software. The GA144 chip has 144 independently programmable computers with 64 words of memory each. You program each of them, including external I/O and routing between them, and then you run the chip as a cluster of computers.
[1] https://greenarraychips.com
Reminds me a bit of the Parallax Propeller chip.
Text on the front page of the NS website* leads me to think you have a fancy compiler: "Intelligent software-defined hardware acceleration". Sounds like Cerebras to my non-expert ears.
* https://www.nextsilicon.com
No real overlap with Cerebras. Have tons of respect for what they do and achieve, but unrelated arch / approach / target-customers.
NEC doesn't really make vector processors anymore. My company installed a new supercomputer built by NEC, and the hardware itself is actually Gigabyte servers running AMD Instinct MI300A, with NEC providing the installation, support, and other services.
https://www.nec.com/en/press/202411/global_20241113_02.html
I have designed software for a lot of exotic compute silicon, including systems that could be described in similar terms to this one. My useless superpower is that I am good at designing excellent data structures and algorithms for almost any plausible computing architecture.
From a cursory read-through, it isn’t clear where the high-leverage point is in this silicon. What is the thing, at a fundamental level, that it does better than any other silicon? It seems pretty vague. I’m not saying it doesn’t have one, just that it isn’t obvious from the media slop.
What’s the specific workload where I can beat any other silicon at the same task if I write the software to fit the silicon?
Sounds like an idea that would really benefit from a JIT-like approach to basically all software.
You can indeed, and should, assume there is a heavy JIT component to it. At the same time, it is important to note that this is geared toward already highly parallel code.
In other words, while the JIT can be applied to all code in principle, the nature of accelerator HW is that it pays off where embarrassingly parallel workloads are present.
Having said that, NextSilicon != GPU, so it's a different approach to accelerating said parallel code.
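To make "already highly parallel code" concrete, the kind of region such a runtime would presumably single out looks like the loop below. This is a generic shared-memory example of my own, not how NextSilicon's toolchain is actually programmed.

```cpp
#include <cstddef>
#include <vector>

// An embarrassingly parallel loop: every iteration is independent, so a
// runtime that profiles the binary can legitimately pick this region out and
// spread it across many ALUs. Serial, branchy control code gains little.
void scale_and_offset(std::vector<double>& v, double a, double b) {
    #pragma omp parallel for  // plain shared-memory parallelism; no vendor API assumed
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = a * v[i] + b;
}
```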
I definitely expect this to be a big hit.
In a way, this is not new; it's pretty much what Annapurna did: they took Arm and got serious with it, creating the first high-performance Arm CPUs. Then they got acqui-hired by Amazon and the rest is history ;)
I don’t want my electronics to contribute to genocide and apartheid and possibly the next pager exploding terror attack. No thanks.
It's not yours; you don't have to buy it.
I'd be fascinated to know who your "good guys" list is.
Stop using Apple, or Google, or Amazon, or Intel, or Broadcom, or Nvidia then. All have vast hardware development activities in that one country you don't like.
How dare you have a moral objection to buying from a state accused of genocide. Please stick to completely organic complaints about comedy festivals and soccer tournaments.