More Roles Coming Soon

We have new roles being added daily. In the meantime, if you see a way to contribute to building high-performance CPUs and work with world-class teams, please send us your resume.
careers@nuvacore.ai

Workload Analysis and Tracing (Lead & ICs)

Full-time · US / Canada / Europe / India or Hybrid · ICs / Lead
Nuvacore is building ground-up CPU silicon for next-generation compute workloads. As our Workload Analysis and Tracing Lead, you are the bridge between real-world software and the hardware decisions that define our microarchitecture — characterizing workloads, collecting and analyzing traces, and translating findings into actionable architectural insights that directly shape the design of future Nuvacore CPUs.

THE ROLE

  • Own workload characterization methodology end-to-end: selecting, porting, and instrumenting industry-standard benchmarks such as SPEC CPU and Geekbench, along with representative real-world workloads (client, server, and emerging compute), across simulation, emulation, and silicon environments.
  • Design and implement instruction tracing infrastructure using binary instrumentation and emulation frameworks (DynamoRIO, QEMU) to generate traces consumed by the CPU performance simulator.
  • Own workload characterization using PMU-based tools (Linux perf, VTune) — collecting hardware counter data to identify and quantify microarchitectural bottlenecks across representative workloads.
  • Investigate performance bottlenecks at the CPU pipeline level — front-end pressure, branch misprediction, cache miss behavior, memory-level parallelism — using PMU data collected on hardware and performance simulation runs, and translate findings into concrete microarchitectural recommendations.
  • Work with software teams to understand and optimize the Linux stack behavior on Nuvacore silicon — OS scheduling, kernel paths, libraries and frameworks, compiler-generated code — and construct targeted microbenchmarks to isolate and reproduce bottlenecks.
  • Build, maintain, and extend analysis tooling and automation pipelines; the ability to navigate and modify large, complex codebases (compilers, OS, simulators) is essential.
  • Mentor engineers across the performance team; provide technical leadership on tracing methodology, workload selection, and analysis infrastructure.

WHAT YOU'LL OWN

  • Workload library: End-to-end characterization suite — benchmarks, real-world workloads, and microbenchmarks — running across sim, emulation, and silicon.
  • Tracing infrastructure: Instruction trace pipeline (DynamoRIO, QEMU) feeding the CPU performance simulator; PMU characterization via Linux perf and VTune.
  • Bottleneck analysis: Pipeline-level root-cause methodology (front-end, execution, memory hierarchy) using PMU data and simulation runs, feeding directly into architectural decisions.
  • SW & compiler insights: Code-level analysis of compiler output, Linux stack, libraries and frameworks to identify optimization opportunities for Nuvacore designs.

REQUIREMENTS — MUST HAVE

  • MS in Computer Architecture, Computer Engineering, Computer Science, or related field (PhD preferred).
  • 20+ years (Lead) or 8+ years (IC) in CPU performance analysis, workload characterization, or microarchitecture engineering.
  • Deep, hands-on experience with instruction tracing using binary instrumentation and emulation frameworks (DynamoRIO, QEMU) to generate traces for consumption by CPU performance simulators.
  • Deep, hands-on experience with workload characterization using PMU-based tools (Linux perf, VTune) — top-down microarchitecture analysis (TMA), hardware counter collection, and bottleneck quantification.
  • Strong understanding of the CPU pipeline: front-end (fetch, decode, branch prediction), out-of-order execution, and memory hierarchy (caches, TLBs, prefetchers).
  • Solid understanding of the Linux software stack — OS internals, kernel paths, scheduling, libraries and frameworks — and its interaction with CPU microarchitecture.
  • Strong scripting skills in Python and Perl; ability to navigate, instrument, and modify large codebases (compilers, simulators, OS).
  • Proven ability to construct microbenchmarks to isolate, reproduce, and root-cause microarchitectural performance bottlenecks.
  • Hands-on experience with industry-standard benchmarks such as SPEC CPU, Geekbench, Speedometer, Cinebench, SPECjbb, and DCPerf.
  • Experience with the RISC-V, ARM64, or x86 ISA; ability to read and analyze compiler-generated assembly.

REQUIREMENTS — NICE TO HAVE

  • Experience integrating workload traces into cycle-accurate CPU performance simulators.
  • Familiarity with compiler optimization techniques (vectorization, inlining, loop transformations) and their microarchitectural impact.
  • Experience developing data visualization or analysis tooling to communicate performance insights to cross-functional teams.
  • Prior principal-level role at a CPU or SoC design organization.


Apply Now

OTHER POSITIONS

Thermal Engineer

Signal Integrity and Power Integrity Engineer

PCB Design Layout Engineer

Mechanical Stress Analysis Engineer

IC Package Engineer

CPU Physical Design (Lead & IC Engineers)

CPU Software Validation Engineering Lead

CPU Telemetry & Observability Development Lead

CPU Firmware Development Lead

CPU Operating System Development Lead
