2. Why not use a TLB of bigger size? Currently the TLB has 1<<8 entries. The
TLB lookup is 10 x86 instructions, but every miss needs ~450 instructions; I
measured this using Intel PIN. So even if the miss rate is low (say 3%), the
overall time spent in cpu_x86_handle_mmu_fault is still significant.
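
To put rough numbers on that claim, here is a back-of-the-envelope sketch in Python. The 10 and ~450 instruction figures are the ones quoted above. The cost model is my assumption: every access pays the lookup, and a miss pays the ~450 on top. The function names are made up for illustration.

```python
# Figures quoted in the message; the additive cost model is an assumption.
HIT_INSNS = 10     # host instructions for the inline TLB lookup
MISS_INSNS = 450   # approximate extra host instructions when the lookup misses


def expected_insns_per_access(miss_rate):
    """Average host instructions per guest memory access under this model."""
    return HIT_INSNS + miss_rate * MISS_INSNS


def miss_share(miss_rate):
    """Fraction of the average cost that is spent handling misses."""
    return (miss_rate * MISS_INSNS) / expected_insns_per_access(miss_rate)
```

With a 3% miss rate this gives about 23.5 instructions per access, of which roughly 57% is miss handling, so under these assumptions the miss path costs more in total than the lookups themselves.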
I'd be interested to experiment with different TLB sizes, to see what effect
that has on performance. But I suspect that the lack of TLB contexts means that we
wind up flushing the TLB more often than real hardware does, and therefore a
larger TLB merely takes longer to flush.
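
A toy model of that suspicion, in Python. It is a sketch under stated assumptions, not QEMU's actual code: the TLB is direct-mapped, a flush clears every entry and costs one unit of work per entry, and the guest cycles through a fixed set of pages. `SoftTLB`, `run` and all parameters are hypothetical names.

```python
PAGE_BITS = 12  # assumed 4 KiB guest pages


class SoftTLB:
    """Direct-mapped software TLB that is flushed in full (toy model)."""

    def __init__(self, bits):
        self.size = 1 << bits
        self.tags = [None] * self.size
        self.misses = 0
        self.flush_work = 0  # entries cleared so far, a proxy for flush time

    def access(self, vaddr):
        page = vaddr >> PAGE_BITS
        idx = page & (self.size - 1)
        if self.tags[idx] != page:
            # Miss: refill the slot (the slow path in a real emulator).
            self.misses += 1
            self.tags[idx] = page

    def flush(self):
        # No contexts/ASIDs in this model, so every entry must be dropped.
        self.tags = [None] * self.size
        self.flush_work += self.size


def run(bits, pages_touched, accesses, flush_every):
    """Cycle through pages_touched pages, flushing every flush_every accesses."""
    tlb = SoftTLB(bits)
    for i in range(accesses):
        if i > 0 and i % flush_every == 0:
            tlb.flush()
        tlb.access((i % pages_touched) << PAGE_BITS)
    return tlb.misses, tlb.flush_work
```

For example, `run(8, 64, 6400, 640)` and `run(12, 64, 6400, 640)` report the same number of misses, because the 64-page working set already fits in 1<<8 entries, but the 1<<12 table clears 16 times as many entries per flush. Whether real workloads behave like this is exactly what would need measuring.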