\documentclass[10pt]{article}
\usepackage{latexsym}
\usepackage{amssymb}
\usepackage{epsfig}
\usepackage{fullpage}
\usepackage{enumerate}
\usepackage{xspace}
\usepackage{todonotes}
\usepackage{listings}
\usepackage{url}
\usepackage[ruled,linesnumbered]{algorithm2e} % Enables the writing of pseudo code.
\usepackage{float}% http://ctan.org/pkg/float
\newcommand{\true}{true}
\newcommand{\false}{false}
\pagestyle{plain}
\bibliographystyle{plain}
\title{192.127 Seminar in Software Engineering (Smart Contracts) \\
SWC-124: Write to Arbitrary Storage Location}
\author{Exercises}
\date{WT 2023/24}
\author{\textbf{Ivanov, Ivaylo (11777707) \& Millauer, Peter (01350868)}}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{exercise}[theorem]{Exercise}
\renewcommand{\labelenumi}{(\alph{enumi})}
\usepackage{xcolor}
\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.92}
\lstdefinestyle{mystyle}{
    backgroundcolor=\color{backcolour},
    commentstyle=\color{codegreen},
    keywordstyle=\color{magenta},
    numberstyle=\tiny\color{codegray},
    stringstyle=\color{codepurple},
    basicstyle=\ttfamily\footnotesize,
    breakatwhitespace=false,
    breaklines=true,
    captionpos=b,
    keepspaces=true,
    numbers=left,
    numbersep=5pt,
    showspaces=false,
    showstringspaces=false,
    showtabs=false,
    tabsize=2
}
\begin{document}
\maketitle
\section{Weakness and consequences}
\subsection{Solidity storage layout}
Any contract's storage is an address space of $2^{256}$ slots, each holding a 32-byte (256-bit) value. In order to implement dynamically sized data structures like maps and arrays, Solidity places their entries at pseudo-random locations derived from a hash. Due to the vast 256-bit range of addresses, collisions are statistically extremely improbable and of little practical relevance in safely implemented contracts.
\medspace
In the case of a dynamic array at variable slot $p$, data is written to contiguous locations starting at $keccak(p)$. Slot $p$ itself holds the array length as a $uint256$ value. Even enormous arrays are unlikely to produce collisions due to the vast address space. However, an improperly managed array may store data at an unbounded user-controlled offset, thereby allowing arbitrary overwriting of data.
\medspace
For maps stored in variable slot $p$, the data for index $k$ can be found at $keccak(k . p)$, where $.$ is the concatenation operator. This is a statistically safe approach: $keccak$ is believed to be a cryptographically secure hash function, so the chance of intentionally finding a value $k$ such that $keccak(k . p) = storage\_address(x)$ for a known stored variable $x$ is about one in $2^{256}$.
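\medspace
The two addressing rules can be sketched in Solidity itself. The contract below is a hypothetical helper written for illustration only: the names \texttt{SlotCalculator}, \texttt{arrayElementSlot} and \texttt{mappingValueSlot} are assumptions, and it assumes $uint256$ array elements (one element per slot) and $uint256$ mapping keys.
\begin{lstlisting}[language=Octave,style=mystyle]
pragma solidity 0.4.25;

// Hypothetical helper (illustration only).
contract SlotCalculator {
    // Slot of element i of a dynamic uint256[] declared at slot p.
    function arrayElementSlot(uint256 p, uint256 i)
        public pure returns (uint256)
    {
        // Arithmetic in 0.4.x wraps modulo 2^256, as slot arithmetic does.
        return uint256(keccak256(abi.encode(p))) + i;
    }

    // Slot of the value stored under key k in a mapping declared at slot p.
    function mappingValueSlot(uint256 k, uint256 p)
        public pure returns (uint256)
    {
        return uint256(keccak256(abi.encode(k, p)));
    }
}
\end{lstlisting}
Both functions are \texttt{pure}, so they can be evaluated off-chain without a transaction.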
\subsection{The Weakness}
Any unchecked array write is potentially dangerous, as the storage location of every variable is publicly known and an unconstrained array index can be chosen to target any of them. Given the known array storage location $p$ and a target variable $x$, the attacker computes the offset value $o$ such that $keccak(p) + o = storage\_address(x)$.
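\medspace
Because slot arithmetic wraps around modulo $2^{256}$, such an offset always exists. Solving the condition above for $o$ gives
\[
o = \bigl(storage\_address(x) - keccak(p)\bigr) \bmod 2^{256}.
\]
As an illustration, assume (as in the listings of this section) that $owner$ occupies slot $0$ and that the array is declared at slot $p = 1$ with one element per slot. Then $o = 2^{256} - keccak(1)$, and a write to index $o$ lands on $owner$.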
\medspace
A trivial example of such a vulnerable write operation is shown in Algorithm~\ref{alg:vuln-write}.
\lstset{style=mystyle}
\begin{algorithm}[H]
\begin{lstlisting}[language=Octave]
pragma solidity 0.4.25;

contract MyContract {
    address private owner; // storage slot 0
    uint[] private arr;    // length in storage slot 1

    constructor() public {
        arr = new uint[](0);
        owner = msg.sender;
    }

    // Both index and value are fully caller-controlled.
    function write(uint index, uint value) public {
        arr[index] = value;
    }
}
\end{lstlisting}
\caption{A completely unchecked array write}
\label{alg:vuln-write}
\end{algorithm}
\medspace
In the following example (Algorithm~\ref{alg:pop-incorrect}), the $pop$ function checks only that the array $length \geq 0$, a condition that always holds for an unsigned integer. Calling $pop$ on an empty array therefore makes the $length$ value underflow. Once this weakness is triggered, $update$ in Algorithm~\ref{alg:pop-incorrect} behaves just like $write$ did in Algorithm~\ref{alg:vuln-write}.
\medspace
\lstset{style=mystyle}
\begin{algorithm}[H]
\begin{lstlisting}[language=Octave]
pragma solidity 0.4.25;

contract MyContract {
    address private owner;
    uint[] private arr;

    constructor() public {
        arr = new uint[](0);
        owner = msg.sender;
    }

    function push(uint value) public {
        arr.push(value);
    }

    function pop() public {
        // Bug: always true for an unsigned length; should be > 0.
        require(arr.length >= 0);
        arr.length--;
    }

    function update(uint index, uint value) public {
        // Passes for almost any index once the length has underflowed.
        require(index < arr.length);
        arr[index] = value;
    }
}
\end{lstlisting}
\caption{An incorrectly managed array length}
\label{alg:pop-incorrect}
\end{algorithm}
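\medspace
The effect of the faulty check in $pop$ can be made explicit. Array lengths are unsigned 256-bit integers and, in the compiler version used here, arithmetic wraps around silently, so calling $pop$ on an empty array yields
\[
length = (0 - 1) \bmod 2^{256} = 2^{256} - 1.
\]
The guard $index < length$ in $update$ then holds for every $index \leq 2^{256} - 2$, so all but one storage slot can be reached through the array.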
Another weakness that allows arbitrary storage access is unchecked assembly code. Assembly is a powerful tool that lets developers get as close to the EVM as possible,
but it can also be very dangerous when not tested thoroughly. As per the documentation\footnote{\url{https://docs.soliditylang.org/en/latest/assembly.html}}: \textit{``this [inline assembly]
bypasses important safety features and checks of Solidity. You should only use it for tasks that need it, and only if you are confident with using it.''}
When given access to such low-level structures, a programmer can introduce not only weaknesses similar to the ones described previously, but also others, such as overwriting map locations,
contract variables, etc.
An example of such a weakness is given in Algorithm~\ref{alg:unchecked-assembly}.
\medspace
\lstset{style=mystyle}
\begin{algorithm}[H]
\begin{lstlisting}[language=Octave]
pragma solidity 0.4.25;

contract MyContract {
    address private owner;
    mapping(address => bool) public managers;

    constructor() public {
        owner = msg.sender;
        setNextManager(msg.sender);
    }

    function setNextManager(address next) internal {
        uint256 slot;
        assembly {
            // Base slot of the mapping (0.4.x syntax), not an entry in it.
            slot := managers_slot
            sstore(slot, next)
        }
        // The key is the constant 160 instead of next.
        bytes32 location = keccak256(abi.encode(160, uint256(slot)));
        assembly {
            sstore(location, 1)
        }
    }

    function registerUser(address user) public {
        require(msg.sender == owner);
        setNextManager(user);
    }

    function cashout() public {
        require(managers[msg.sender]);
        address manager = msg.sender;
        manager.transfer(address(this).balance);
    }
}
\end{lstlisting}
\caption{An unchecked assembly write to mapping}
\label{alg:unchecked-assembly}
\end{algorithm}
The contract has a managers mapping, which is intended to be used as a stack.
The developer has added the \texttt{setNextManager} function, which is supposed to push the latest user onto the top of the stack as a manager.
The issue is that the function is implemented in such a way that the stack never grows and the first element is always overwritten. This arises from the fact that the storage slot
of the managers mapping does not point to the storage address at the top of the stack, but to its base.
The function uses this slot address directly, without calculating any offset, and thereby overwrites the base of the stack. Using social engineering, an attacker can persuade the
owner to register them as a manager, which exploits the weakness directly: the owner gives up their own management rights.
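\medspace
For comparison, the following is a minimal sketch of an assembly write that targets the mapping entry of the given address rather than a fixed location. It assumes the same compiler version as Algorithm~\ref{alg:unchecked-assembly}; the names \texttt{ManagersSketch} and \texttt{setManager} are hypothetical and not part of the original contract. In practice, the plain assignment \texttt{managers[next] = true} achieves the same effect without assembly.
\begin{lstlisting}[language=Octave,style=mystyle]
pragma solidity 0.4.25;

// Hypothetical sketch (illustration only).
contract ManagersSketch {
    mapping(address => bool) public managers;

    function setManager(address next) internal {
        uint256 slot;
        assembly {
            // Base slot of the mapping (0.4.x syntax).
            slot := managers_slot
        }
        // The entry for key next lives at keccak(next . slot).
        bytes32 location = keccak256(abi.encode(next, slot));
        assembly {
            sstore(location, 1)
        }
    }
}
\end{lstlisting}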
\subsection{Consequences}
The consequences of exploiting an arbitrary storage access weakness can be of different types and severity.
An attacker may gain read-write access to private contract data, which should only be accessible to owners, maintainers etc.
They may also exploit the contract to circumvent authorization checks and drain the contract funds.
%TODO: can we expand this?
\section{Vulnerable contracts in literature}
collect vulnerable contracts used by different papers to motivate/illustrate the weakness
\section{Code properties and automatic detection}
Automatic detection tools can be broadly categorized into those employing static analysis and those that use fuzzing, i.e., the application of semi-random inputs. Notable static analysis tools include Securify \cite{securify} and teEther \cite{teether}, which both function in a similar manner:
\medspace
Initially, the given EVM bytecode is disassembled into a control-flow graph (CFG). In the second step, the tools identify potentially risky instructions. In the case of arbitrary writes, the instruction of note is $sstore(k,v)$ where both $k$ and $v$ are input-controlled. The tools differ in how they determine whether the values are input-controlled.
\medspace
In the case of Securify \cite{securify}, the CFG is translated into what the authors call ``semantic facts'', to which an elaborate set of so-called security patterns is applied. These patterns consist of building blocks in the form of predicates, which allows the tool to generate its output directly from the (transitively) matched patterns.
\medspace
teEther \cite{teether} employs a similar approach, but its authors instead opt to build a graph of dependent variables. If the graph arrives at an $sstore(k,v)$ instruction and a path can be found leading to user-controlled inputs, the tool infers a set of constraints, which are then used to automatically generate an exploit.
\medspace
The fuzz-driven approach to vulnerability detection is more abstract, as general-purpose fuzzing tools generally have no knowledge of the analysed program. With the tool SmartFuzzDriverGenerator \cite{fuzzdrivegen}, a multitude of such fuzzing libraries can be used. The problem, however, is that a fuzzer cannot interface with a smart contract out of the box. The ``glue'' between fuzzer and program is called a driver, hence the name ``driver generator''.
\medspace
SmartFuzzDriverGenerator aims to automatically generate such a driver by %TODO: I have no idea how it does this actually%
\medspace
The Smartian tool \cite{smartian} attempts to find a middle ground between static and dynamic analysis by first transforming the EVM bytecode into control-flow facts. Based on this information, a set of seed inputs is generated that are expected to have a high probability of yielding usable results. Should no exploit be found, the seed inputs are then mutated in order to achieve higher code coverage. %TODO: This is probably extremely imprecise and should be re-written%
\section{Exploit sketch}
\cite{doughoyte}
%TODO: just explain what this guy does: https://github.com/Arachnid/uscc/tree/master/submissions-2017/doughoyte%
\bibliography{exercise}
\end{document}