\chapter{Functional Verification and Performance Evaluation}\label{sec:eval}

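This work designs a microarchitecture that can defend against Spectre attacks. This chapter verifies the microarchitecture functionally, showing that it satisfies the security properties required by the threat model, and evaluates its performance.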

\section{Evaluation Environment}

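This work modifies the Gem5 simulator and evaluates five configurations:
Baseline, Fence, Fence+DIFT, IS, and IS+DIFT. Baseline denotes a processor
model that is vulnerable to Spectre attacks. Fence and IS protect every
speculatively executed load: Fence delays the load, while IS executes it with
the InvisiSpec scheme. Fence+DIFT and IS+DIFT use dynamic information flow
tracking (DIFT) to identify the loads that may leak secret data, and apply the
corresponding safe execution scheme only to those loads.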

Table~\ref{tab:Gem5_conf} lists the basic parameters shared by all processor configurations.

\begin{table}
\centering
\caption{Basic configuration of the simulated processor}
\label{tab:Gem5_conf}
\begin{tabular}{|c|c|}
\hline 
Parameter & Configuration \tabularnewline
\hline
\hline
Instruction set & x86\_64\tabularnewline
\hline
\multirow{9}{*}{Functional units} & IntAlu\tabularnewline
\cline{2-2} 
 & IntMult + IntAlu\tabularnewline
\cline{2-2} 
 & FloatAdd + FloatCmp + FloatCvt\tabularnewline
\cline{2-2} 
 & FloatMult + FloatMultAcc + FloatMisc + FloatDiv + FloatSqrt\tabularnewline
\cline{2-2} 
 & MemRead + FloatMemRead\tabularnewline
\cline{2-2} 
 & SIMD\tabularnewline
\cline{2-2} 
 & MemWrite + FloatMemWrite\tabularnewline
\cline{2-2} 
 & MemRead + MemWrite + FloatMemRead + FloatMemWrite\tabularnewline
\cline{2-2} 
 & IprAccess\tabularnewline
\hline
Commit width & 8\tabularnewline
\hline
Reorder buffer entries & 192\tabularnewline
\hline
Load queue entries & 32\tabularnewline
\hline
Store queue entries & 32\tabularnewline
\hline
Issue queue entries & 64\tabularnewline
\hline
Branch predictor & TournamentBP, 4096-entry BTB, 16-entry RAS\tabularnewline
\hline
L1 I-Cache & 32\,KB, 4-way set-associative, 1-cycle latency\tabularnewline
\hline
L1 D-Cache & 64\,KB, 8-way set-associative, 1-cycle latency\tabularnewline
\hline
L2 Cache & 2\,MB, 16-way set-associative, 8-cycle latency\tabularnewline
\hline 
\end{tabular}
\end{table}

\section{Evaluation Metrics}

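This work evaluates the security and the performance of each microarchitecture
design. Security is judged by whether the design is vulnerable to Spectre
attacks, which is tested with a purpose-built test program. For performance,
each design runs the benchmark programs, and the ratio of its run time to the
run time of Baseline is taken as the relative run time. The performance
overhead is the increase in run time relative to Baseline. Average performance
is the geometric mean of the relative run times over all benchmarks.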
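
The definitions above can be written compactly as follows. The symbols
$T_c(b)$, $R_c(b)$, $O_c(b)$, and $\bar{R}_c$ are notation introduced here for
illustration only and are not used elsewhere in this work; $c$ ranges over the
evaluated configurations and $b$ over the $n$ benchmarks.
\[
  R_c(b) = \frac{T_c(b)}{T_{\mathrm{Baseline}}(b)}, \qquad
  O_c(b) = R_c(b) - 1, \qquad
  \bar{R}_c = \Bigl( \prod_{b=1}^{n} R_c(b) \Bigr)^{1/n}.
\]
For instance, a relative run time of $1.20$ corresponds to an overhead of
$20\%$. Assuming the average overhead is derived from the geometric mean, it
would be read off as $\bar{R}_c - 1$.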

\section{Functional Verification}

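This work constructs a test program that checks the security of each processor
configuration, in order to verify that the implemented schemes defend against
Spectre attacks.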

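Because the Ruby memory model in Gem5 does not support the clflush
instruction, the test program mounts the attack using Evict+Reload. Since the
last-level cache of the configured system is 2\,MB, the program accesses a
2\,MB memory region to evict the previous contents of the cache. The code is
given in Appendix~\ref{lst:poc_for_gem5}.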
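
As a rough sizing sketch, assume a 64-byte cache line; the line size is not
stated in Table~\ref{tab:Gem5_conf} and is an assumption made here. Sweeping a
2\,MB region then touches
\[
  \frac{2\,\mathrm{MB}}{64\,\mathrm{B}} = \frac{2^{21}}{2^{6}} = 2^{15} = 32768
\]
distinct cache lines, which equals the capacity of the 16-way L2 cache:
$32768 / 16 = 2048$ sets of 16 lines each. Whether a single sweep displaces
every resident line also depends on the replacement policy and the address
mapping, neither of which is specified here.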

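In this verification program, the attacker aims to exploit the speculative
execution of the function victim, which is run by the victim, to leak the
value of X, which victim cannot access during normal execution. The attacker
first trains the branch predictor for victim and evicts the previous contents
of the cache; we then set X to 123. Next, the attacker makes victim execute an
access to X, and finally scans array2 to check whether any of its elements
hits in the cache.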

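Under the Baseline configuration, running this program shows that
\verb|array2[123 * 64]| hits in the cache while every other position of array2
misses, so the attacker can recover the value of X, 123, through a Spectre
attack. Under the other configurations, every position of array2 misses in the
cache, so the attacker cannot derive the value of X. This shows that all of
these configurations defend against Spectre attacks.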

\section{Performance Evaluation and Analysis with SPEC CPU2006}

This work evaluates 21 SPEC CPU2006 benchmarks using the ref input set. All
benchmarks are compiled with GCC 8.3.0 at optimization level -O2 and
statically linked against Glibc 2.24.

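Because running SPEC CPU2006 in full on the Gem5 simulator takes too long,
only a portion of each benchmark is simulated. The program is first warmed up
by running $10^{10}$ instructions on Gem5's AtomicSimpleCPU, and then $10^{9}$
instructions are run on the processor configuration under evaluation to obtain
the results.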

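%% Table~\ref{tab:spec2006} lists the number of instructions and operations that each
%% SPEC CPU2006 benchmark runs in Baseline mode. Under the other processor configurations,
%% the number of instructions actually run may increase or decrease, and the operation count changes accordingly.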

%% \begin{table}
%% \begin{tabular}{|c|c|c|c|}
%% \hline 
%% Benchmark & Type & Instructions & Operations\tabularnewline
%% \hline 
%% \hline 
%% 401.bzip2 & int & 1000000015 & 1607899217\tabularnewline
%% \hline 
%% 429.mcf & int & 1000000012 & 1308544223\tabularnewline
%% \hline 
%% 445.gobmk & int & 1000000014 & 1932674840\tabularnewline
%% \hline 
%% 456.hmmer & int & 1000000013 & 1980694497\tabularnewline
%% \hline 
%% 458.sjeng & int & 1000000010 & 1827828221\tabularnewline
%% \hline 
%% 462.libquantum & int & 1000000011 & 1714852019\tabularnewline
%% \hline 
%% 464.h264ref & int & 1000000017 & 1545541668\tabularnewline
%% \hline 
%% 471.omnetpp & int & 1000000011 & 1971636254\tabularnewline
%% \hline 
%% 473.astar & int & 1000000010 & 1702905457\tabularnewline
%% \hline 
%% 410.bwaves & fp & 1000000011 & 1839010239\tabularnewline
%% \hline 
%% 433.milc & fp & 1000000018 & 1285963875\tabularnewline
%% \hline 
%% 434.zeusmp & fp & 1000000014 & 1524235186\tabularnewline
%% \hline 
%% 435.gromacs & fp & 1000000013 & 1630938771\tabularnewline
%% \hline 
%% 436.cactusADM & fp & 1000000014 & 1320986203\tabularnewline
%% \hline 
%% 437.leslie3d & fp & 1000000010 & 1415843900\tabularnewline
%% \hline 
%% 444.namd & fp & 1000000009 & 1333316425\tabularnewline
%% \hline 
%% 450.soplex & fp & 1000000013 & 1664925057\tabularnewline
%% \hline 
%% 454.calculix & fp & 1000000010 & 1691328153\tabularnewline
%% \hline 
%% 459.GemsFDTD & fp & 1000000014 & 1320385828\tabularnewline
%% \hline 
%% 470.lbm & fp & 1000000011 & 1191402714\tabularnewline
%% \hline 
%% 482.sphinx3 & fp & 1000000016 & 1618883137\tabularnewline
%% \hline 
%% \end{tabular}
%% \caption{Instruction and operation counts of the SPEC CPU2006 benchmarks}
%% \label{tab:spec2006}
%% \end{table}

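Figure~\ref{fig:is_spec06_result} shows the run time of each processor
configuration on SPEC CPU2006 relative to Baseline. cactusADM and lbm are
omitted from the figure because their relative run times are close to 1 under
every configuration.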

\begin{figure}[htbp]
  \centering
  \includegraphics[width=\textwidth]{result.eps}
  \caption{Relative run time of each processor configuration on SPEC CPU2006}
  \label{fig:is_spec06_result}
\end{figure}

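The results show that using DIFT to identify the loads that may leak data and
delaying only those loads (Fence+DIFT) incurs an average performance overhead
of 15\%, which is lower than the average overhead of InvisiSpec (IS).
Executing these loads with the InvisiSpec scheme instead (IS+DIFT) further
reduces the overhead to 8.5\%. Under IS+DIFT, 12 benchmarks have an overhead
below 3\%, and 6 have an overhead between 10\% and 20\%. The largest overhead
occurs in omnetpp, where every secure scheme incurs an overhead of 94\% or
more.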

\begin{figure}[htbp]
  \centering
  \includegraphics[width=\textwidth]{specload_ratio.eps}
  \caption{Ratio of SpecLoads to all operations in SPEC CPU2006}
  \label{fig:specload}
\end{figure}

%% \begin{table}[htbp]
%% \begin{tabular}{|c|c|c|c|}
%% \hline
%% Benchmark  &  IS  &  IS+DIFT  &  SpecLoad reduction\tabularnewline
%% \hline
%% \hline
%% astar & 34.80\% & 19.79\% & 43.13\%\tabularnewline
%% \hline
%% bwaves & 10.75\% & 0.23\% & 97.89\%\tabularnewline
%% \hline
%% bzip2 & 12.12\% & 5.49\% & 54.74\%\tabularnewline
%% \hline
%% cactusADM & 0.51\% & 0.13\% & 74.51\%\tabularnewline
%% \hline
%% calculix & 9.89\% & 3.06\% & 69.04\%\tabularnewline
%% \hline
%% GemsFDTD & 6.29\% & 0.13\% & 97.91\%\tabularnewline
%% \hline
%% gobmk & 12.61\% & 2.60\% & 79.38\%\tabularnewline
%% \hline
%% gromacs & 0.76\% & 0.06\% & 92.74\%\tabularnewline
%% \hline
%% h264ref & 5.43\% & 0.94\% & 82.68\%\tabularnewline
%% \hline
%% hmmer & 9.52\% & 4.82\% & 49.35\%\tabularnewline
%% \hline
%% lbm & 1.39\% & 0.00\% & 100.00\%\tabularnewline
%% \hline
%% leslie3d & 5.98\% & 0.11\% & 98.23\%\tabularnewline
%% \hline
%% libquantum & 1.99\% & 0.00\% & 99.99\%\tabularnewline
%% \hline
%% mcf & 10.56\% & 2.11\% & 79.99\%\tabularnewline
%% \hline
%% milc & 3.22\% & 0.10\% & 97.04\%\tabularnewline
%% \hline
%% namd & 2.67\% & 0.84\% & 68.73\%\tabularnewline
%% \hline
%% omnetpp & 5.94\% & 1.94\% & 67.37\%\tabularnewline
%% \hline
%% sjeng & 13.19\% & 2.72\% & 79.35\%\tabularnewline
%% \hline
%% soplex & 9.68\% & 4.63\% & 52.22\%\tabularnewline
%% \hline
%% sphinx3 & 6.63\% & 1.41\% & 78.68\%\tabularnewline
%% \hline
%% zeusmp & 6.70\% & 0.03\% & 99.58\%\tabularnewline
%% \hline
%% \end{tabular}
%% \centering
%% \caption{Ratio of SpecLoads to all operations in SPEC CPU2006}
%% \label{tab:specload}
%% \end{table}
%% 
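To observe the effect of the detection mechanism based on information flow
tracking, we count the SpecLoad requests issued under IS and IS+DIFT.
Figure~\ref{fig:specload} shows, for both schemes, the ratio of the number of
SpecLoad requests to the total operation count of the program (the sim\_ops
statistic reported by Gem5).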
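
One way to write the plotted ratio, and the SpecLoad reduction that follows
from it, is given next. The symbols $N^{\mathrm{SL}}_c(b)$,
$N^{\mathrm{ops}}_c(b)$, $r_c(b)$, and $\Delta(b)$ are notation introduced
here for illustration only: for benchmark $b$ under configuration $c$, they
denote the SpecLoad request count, the sim\_ops count, their ratio, and the
relative reduction achieved by DIFT.
\[
  r_c(b) = \frac{N^{\mathrm{SL}}_c(b)}{N^{\mathrm{ops}}_c(b)}, \qquad
  \Delta(b) = 1 - \frac{r_{\mathrm{IS+DIFT}}(b)}{r_{\mathrm{IS}}(b)}.
\]
Under the assumption that the total operation count is nearly the same in both
configurations, $\Delta(b)$ approximates the fraction of SpecLoad requests
that DIFT filters out. Defining the reduction directly on the request counts
is an equally plausible reading.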

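In almost all benchmarks, DIFT filters out more than half of the load
operations that would otherwise be treated as unsafe. In bwaves, GemsFDTD,
libquantum, and milc, fewer than 3\% of the speculatively executed loads need
to be executed in a safe manner, so Fence+DIFT alone achieves performance
almost equal to Baseline.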

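In omnetpp, which has the largest overhead under IS+DIFT, the number of
SpecLoads drops by only 67\%. In calculix and astar, whose overheads are close
to 20\%, the reduction is also below 70\%. For calculix and astar, executing
the unsafe loads with the InvisiSpec strategy gives a clear performance
improvement over blocking them.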

%\Todo: Analysis of the evaluation results
\section{Summary}

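This chapter first described the evaluation environment and listed the basic
parameters of each processor configuration. It then defined the evaluation
metrics, covering both the security test and the performance measurements.
Finally, it presented the method and results of the security test, together
with the results and analysis of the SPEC CPU2006 performance evaluation.
Functional verification shows that both the DIFT-based detection method and
the InvisiSpec load execution strategy used in this work prevent Spectre
attacks from leaking data in memory. Combining the two methods defends against
Spectre attacks at a performance overhead of 8.5\%.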