More Roles Coming Soon

New roles are added daily. In the meantime, if you see a way to contribute to building high-performance CPUs and work with world-class teams, please send your resume to careers@nuvacore.ai.

Workload Analysis and Tracing (Lead & ICs)

Full-time · US / Canada / Europe / India or Hybrid · ICs / Lead
Nuvacore is building ground-up CPU silicon for next-generation compute workloads. As our Workload Analysis and Tracing Lead, you are the bridge between real-world software and the hardware decisions that define our microarchitecture — characterizing workloads, collecting and analyzing traces, and translating findings into actionable architectural insights that directly shape the design of future Nuvacore CPUs.

THE ROLE

  • Own workload characterization methodology end-to-end: selecting, porting, and instrumenting industry-standard benchmarks such as SPEC CPU and Geekbench, along with representative real-world workloads (client, server, and emerging compute), across simulation, emulation, and silicon environments.
  • Design and implement instruction tracing infrastructure using binary instrumentation and emulation frameworks (DynamoRIO, QEMU) to generate traces consumed by the CPU performance simulator.
  • Own workload characterization using PMU-based tools (Linux perf, VTune) — collecting hardware counter data to identify and quantify microarchitectural bottlenecks across representative workloads.
  • Investigate performance bottlenecks at the CPU pipeline level — front-end pressure, branch misprediction, cache miss behavior, memory-level parallelism — using PMU data collected on hardware and performance simulation runs, and translate findings into concrete microarchitectural recommendations.
  • Work with software teams to understand and optimize Linux stack behavior on Nuvacore silicon (OS scheduling, kernel paths, libraries and frameworks, compiler-generated code), and construct targeted microbenchmarks to isolate and reproduce bottlenecks.
  • Build, maintain, and extend analysis tooling and automation pipelines; ability to navigate and modify large, complex codebases (compilers, OS, simulators) is essential.
  • Mentor engineers across the performance team; provide technical leadership on tracing methodology, workload selection, and analysis infrastructure.

WHAT YOU'LL OWN

  • Workload library: End-to-end characterization suite — benchmarks, real-world workloads, and microbenchmarks — running across sim, emulation, and silicon.
  • Tracing infrastructure: Instruction trace pipeline (DynamoRIO, QEMU) feeding the CPU performance simulator; PMU characterization via Linux perf and VTune.
  • Bottleneck analysis: Pipeline-level root-cause methodology using PMU data and simulation runs, covering the front-end, execution, and memory hierarchy, and feeding directly into architecture decisions.
  • SW & compiler insights: Code-level analysis of compiler output, Linux stack, libraries and frameworks to identify optimization opportunities for Nuvacore designs.

REQUIREMENTS — MUST HAVE

  • MS in Computer Architecture, Computer Engineering, Computer Science, or related field (PhD preferred).
  • 20+ years (Lead) or 8+ years (IC) in CPU performance analysis, workload characterization, or microarchitecture engineering.
  • Deep, hands-on experience with instruction tracing using binary instrumentation and emulation frameworks (DynamoRIO, QEMU) to generate traces for consumption by CPU performance simulators.
  • Deep, hands-on experience with workload characterization using PMU-based tools (Linux perf, VTune) — top-down microarchitecture analysis (TMA), hardware counter collection, and bottleneck quantification.
  • Strong understanding of the CPU pipeline: front-end (fetch, decode, branch prediction), out-of-order execution, and memory hierarchy (caches, TLBs, prefetchers).
  • Solid understanding of the Linux software stack — OS internals, kernel paths, scheduling, libraries and frameworks — and its interaction with CPU microarchitecture.
  • Strong scripting skills in Python and Perl; ability to navigate, instrument, and modify large codebases (compilers, simulators, OS).
  • Proven ability to construct microbenchmarks to isolate, reproduce, and root-cause microarchitectural performance bottlenecks.
  • Hands-on experience with industry-standard benchmarks such as SPEC CPU, Geekbench, Speedometer, Cinebench, SPECjbb, and DCPerf.
  • Experience with the RISC-V, ARM64, or x86 ISA; ability to read and analyze compiler-generated assembly.

REQUIREMENTS — NICE TO HAVE

  • Experience integrating workload traces into cycle-accurate CPU performance simulators.
  • Familiarity with compiler optimization techniques (vectorization, inlining, loop transformations) and their microarchitectural impact.
  • Experience developing data visualization or analysis tooling to communicate performance insights to cross-functional teams.
  • Prior principal-level role at a CPU or SoC design organization.


Apply Now

OTHER POSITIONS

CPU Software Validation Engineering Lead

CPU Telemetry & Observability Development Lead

CPU Firmware Development Lead

CPU Operating System Development Lead

Head of Software Engineering – CPU & Platform Software

CPU Micro-Architect / RTL (Lead & IC Engineers)

Competitive Power Performance (Lead & ICs)

SoC Performance Modeling (Lead & ICs)

CPU Performance Modeling (Lead & ICs)

CPU Design Verification (Lead & IC Engineers)
