improve paper formatting, code coloring, add abstract+conclusion, add safe code example, fix typos

master
nitowa 6 months ago
parent
commit
6a660d3edd
2 changed files with 144 additions and 35 deletions
  1. BIN
      SemSEpaper/exercises.pdf
  2. 144
    35
      SemSEpaper/exercises.tex

BIN
SemSEpaper/exercises.pdf


+ 144
- 35
SemSEpaper/exercises.tex

@@ -46,6 +46,40 @@
46 46
 \definecolor{codegray}{rgb}{0.5,0.5,0.5}
47 47
 \definecolor{codepurple}{rgb}{0.58,0,0.82}
48 48
 \definecolor{backcolour}{rgb}{0.95,0.95,0.92}
49
+\definecolor{verylightgray}{rgb}{.97,.97,.97}
50
+
51
+\lstdefinelanguage{Solidity}{
52
+	keywords=[1]{anonymous, assembly, assert, balance, break, call, callcode, case, catch, class, constant, continue, constructor, contract, debugger, default, delegatecall, delete, do, else, emit, event, experimental, export, external, false, finally, for, function, gas, if, implements, import, in, indexed, instanceof, interface, internal, is, length, library, log0, log1, log2, log3, log4, memory, modifier, new, payable, pragma, private, protected, public, pure, push, require, return, returns, revert, selfdestruct, send, solidity, storage, struct, suicide, super, switch, then, this, throw, transfer, true, try, typeof, using, value, view, while, with, addmod, ecrecover, keccak256, mulmod, ripemd160, sha256, sha3}, % generic keywords including crypto operations
53
+	keywordstyle=[1]\color{blue}\bfseries,
54
+	keywords=[2]{address, bool, byte, bytes, bytes1, bytes2, bytes3, bytes4, bytes5, bytes6, bytes7, bytes8, bytes9, bytes10, bytes11, bytes12, bytes13, bytes14, bytes15, bytes16, bytes17, bytes18, bytes19, bytes20, bytes21, bytes22, bytes23, bytes24, bytes25, bytes26, bytes27, bytes28, bytes29, bytes30, bytes31, bytes32, enum, int, int8, int16, int24, int32, int40, int48, int56, int64, int72, int80, int88, int96, int104, int112, int120, int128, int136, int144, int152, int160, int168, int176, int184, int192, int200, int208, int216, int224, int232, int240, int248, int256, mapping, string, uint, uint8, uint16, uint24, uint32, uint40, uint48, uint56, uint64, uint72, uint80, uint88, uint96, uint104, uint112, uint120, uint128, uint136, uint144, uint152, uint160, uint168, uint176, uint184, uint192, uint200, uint208, uint216, uint224, uint232, uint240, uint248, uint256, var, void, ether, finney, szabo, wei, days, hours, minutes, seconds, weeks, years},	% types; money and time units
55
+	keywordstyle=[2]\color{teal}\bfseries,
56
+	keywords=[3]{block, blockhash, coinbase, difficulty, gaslimit, number, timestamp, msg, data, gas, sender, sig, value, now, tx, gasprice, origin},	% environment variables
57
+	keywordstyle=[3]\color{violet}\bfseries,
58
+	identifierstyle=\color{black},
59
+	sensitive=true,
60
+	comment=[l]{//},
61
+	morecomment=[s]{/*}{*/},
62
+	commentstyle=\color{gray}\ttfamily,
63
+	stringstyle=\color{red}\ttfamily,
64
+	morestring=[b]',
65
+	morestring=[b]"
66
+}
67
+
68
+\lstset{
69
+	language=Solidity,
70
+	backgroundcolor=\color{verylightgray},
71
+	extendedchars=true,
72
+	basicstyle=\footnotesize\ttfamily,
73
+	showstringspaces=false,
74
+	showspaces=false,
75
+	numbers=left,
76
+	numberstyle=\footnotesize,
77
+	numbersep=9pt,
78
+	tabsize=2,
79
+	breaklines=true,
80
+	showtabs=false,
81
+	captionpos=b
82
+}
49 83
 
50 84
 \lstdefinestyle{mystyle}{
51 85
 	backgroundcolor=\color{backcolour},
@@ -73,6 +107,10 @@
73 107
 
74 108
 \maketitle
75 109
 
110
+\begin{abstract}
111
+	This paper outlines different forms of the smart contract weakness SWC-124, commonly referred to as "Write to Arbitrary Storage Location". While it focuses on Ethereum's EVM and its higher-level language Solidity, we also briefly touch on other research that deals with the Hyperledger Fabric environment. We begin with a gentle introduction to the Solidity storage layout design that allows this weakness to occur, followed by common forms of exploit and their associated consequences. Finally, we outline the code characteristics that are detectable by automated tools and present an exploit sketch.
112
+\end{abstract}
113
+
76 114
 \section{Weakness and consequences}
77 115
 
78 116
 \subsection{Solidity storage layout}
@@ -95,9 +133,9 @@ Any unchecked array write is potentially dangerous, as the storage-location of a
95 133
 
96 134
 A trivial example of such a vulnerable write operation is shown in Algorithm~\ref{alg:vuln-write}.
97 135
 
98
-\lstset{style=mystyle}
136
+
99 137
 \begin{algorithm}[H]
100
-	\begin{lstlisting}[language=Octave]
138
+	\begin{lstlisting}[language=Solidity]
101 139
 	pragma solidity 0.4.25;
102 140
 
103 141
 	contract MyContract {
@@ -124,9 +162,8 @@ In the following example (Algorithm~\ref{alg:pop-incorrect}) the $pop$ function
124 162
 
125 163
 \medspace
126 164
 
127
-\lstset{style=mystyle}
128 165
 \begin{algorithm}[H]
129
-	\begin{lstlisting}[language=Octave]
166
+	\begin{lstlisting}[language=Solidity]
130 167
 	pragma solidity 0.4.25;
131 168
 
132 169
 	contract MyContract {
@@ -158,19 +195,18 @@ In the following example (Algorithm~\ref{alg:pop-incorrect}) the $pop$ function
158 195
   \label{alg:pop-incorrect}
159 196
 \end{algorithm}
160 197
 
161
-Another weakness that allows arbitrary storage access is unchecked assembly code. Assembly is a powerful tool that allows the developers to get as close to the EVM as they can,
162
-but it may also be very dangerous when not tested correctly. As per the documentation\footnote{\url{https://docs.soliditylang.org/en/latest/assembly.html}, accessed: Oct. 30th 2023}: \textit{"this [inline assembly]
198
+\medspace
199
+
200
+Another weakness that allows arbitrary storage access is unchecked assembly code. Assembly is a powerful tool that allows the developers to get as close to the EVM as they can, but it may also be very dangerous when not used correctly. As per the documentation\footnote{\url{https://docs.soliditylang.org/en/latest/assembly.html}, accessed: Oct. 30th 2023}: \textit{"this [inline assembly]
163 201
 bypasses important safety features and checks of Solidity. You should only use it for tasks that need it, and only if you are confident with using it."}
164
-When given access to such lowlevel structures, a programmer can built-in not only weaknesses similar to the ones described previously, but also others, such as overwriting map locations,
165
-contract variables etc.
202
+When given access to such low-level instructions, a programmer can construct not only weaknesses similar to the ones described previously, but also others, such as overwriting map locations, contract variables etc.
166 203
 
167 204
 An example for such a weakness is given in Algorithm~\ref{alg:unchecked-assembly}.
168 205
 
169 206
 \medspace
170 207
 
171
-\lstset{style=mystyle}
172 208
 \begin{algorithm}[H]
173
-	\begin{lstlisting}[language=Octave]
209
+	\begin{lstlisting}[language=Solidity]
174 210
 	pragma solidity 0.4.25;
175 211
 
176 212
 	contract MyContract {
@@ -211,12 +247,15 @@ An example for such a weakness is given in Algorithm~\ref{alg:unchecked-assembly
211 247
   \label{alg:unchecked-assembly}
212 248
 \end{algorithm}
213 249
 
214
-The contract has a manager mapping, which should be used as a stack.
250
+\medspace
251
+
252
+The contract has a manager mapping, which is intended to be used as a stack.
215 253
 The developer has added the \texttt{setNextManager} function, which should set the top of the stack to the latest user as a manager.
216
-The issue is that the function is implemented in such a way, that the stack would not grow, but the first element would always be overwritten - this arises from the fact that the memory slot
254
+The issue is that the function is implemented in such a way that the stack does not grow; instead, the first element is always overwritten. This arises from the fact that the storage slot
217 255
 of the managers mapping does not point to the storage location of the top of the stack, but instead to its base.
218 256
 The function then uses this slot address directly, without calculating any offset, overwriting the base of the stack. If social engineering is applied, an attacker can persuade the
219 257
 owner to set them as a manager, which would result in the weakness being exploited directly and the owner giving up their own management rights.
258
+
220 259
 \subsection{Consequences}
221 260
 
222 261
 The consequences of exploiting an arbitrary storage access weakness can be of different types and severity.
@@ -225,15 +264,53 @@ They may also exploit the contract to circumvent authorization checks and drain
225 264
 According to Li Duan et al.~\cite{multilayer}, an attacker may also be able to destroy the contract storage structure and thus cause
226 265
 unexpected program flow, abnormal function execution or contract freeze.
227 266
 
267
+\subsection{Similar yet safe code example}
268
+
269
+Dynamic arrays are not inherently dangerous as long as they are used properly. The following version of Algorithm~\ref{alg:pop-incorrect} checks the array length before decrementing it, and thereby prevents the integer underflow of the length value. This code example is not vulnerable to the techniques shown in this paper.
270
+
271
+\medspace
272
+
273
+\begin{algorithm}[H]
274
+	\begin{lstlisting}[language=Solidity]
275
+		pragma solidity 0.4.25;
276
+		
277
+		contract MyContract {
278
+			address private owner;
279
+			uint[] private arr;
280
+			
281
+			constructor() public {
282
+				arr = new uint[](0);
283
+				owner = msg.sender;
284
+			}
285
+			
286
+			function push(uint value) public {
287
+				arr.length++; // grow first: writing to arr[arr.length] would be out of bounds
288
+				arr[arr.length - 1] = value;
289
+			}
290
+			
291
+			function pop() public {
292
+				require(arr.length > 0);
293
+				arr.length--;
294
+			}
295
+			
296
+			function update(uint index, uint value) public {
297
+				require(index < arr.length);
298
+				arr[index] = value;
299
+			}
300
+		}
301
+	\end{lstlisting}
302
+	\caption{Correctly managed array length}
303
+	\label{alg:pop-correct}
304
+\end{algorithm}
305
+
228 306
 \section{Vulnerable contracts in literature}
229 307
 
230 308
 One example for vulnerable contracts, which is similar to Algorithm~\ref{alg:pop-incorrect}, is mentioned in the paper by Li Duan et al.~\cite{multilayer}:
231 309
 
232 310
 \medspace
233 311
 
234
-\lstset{style=mystyle}
235 312
 \begin{algorithm}[H]
236
-	\begin{lstlisting}[language=Octave]
313
+	\begin{lstlisting}[language=Solidity]
237 314
     function PopBonusCode() public {
238 315
       require(0 <= bonusCodes.length);
239 316
       bonusCodes.length--;
@@ -248,16 +325,14 @@ One example for vulnerable contracts, which is similar to Algorithm~\ref{alg:pop
248 325
   \label{alg:multilayer-example}
249 326
 \end{algorithm}
250 327
 
251
-We will not go into a detailed explanation, as we already did this in the previous section.
252
-
328
+\medspace
253 329
 
254
-A more sophisticated example is presented in the paper by Sukrit Kalra et al.~\cite{Kalra2018ZEUSAS}:
330
+We will not go into a detailed explanation, as we already did this in the previous section. A more sophisticated example is presented in the paper by Sukrit Kalra et al.~\cite{Kalra2018ZEUSAS}:
255 331
 
256 332
 \medspace
257 333
 
258
-\lstset{style=mystyle}
259 334
 \begin{algorithm}[H]
260
-	\begin{lstlisting}[language=Octave]
335
+	\begin{lstlisting}[language=Solidity]
261 336
     uint payout = balance/participants.length;
262 337
     for (var i = 0; i < participants.length; i++)
263 338
       participants[i].send(payout);
@@ -266,9 +341,13 @@ A more sophisticated example is presented in the paper by Sukrit Kalra et al.~\c
266 341
   \label{alg:zeus-example}
267 342
 \end{algorithm}
268 343
 
269
-The vulnerability here is an integer overflow - as the variable \texttt{i} is dinamically typed, it will get the smallest possible type that will be able to hold the value 0 - that being \texttt{uint8}, which is able to hold positive integers up to 255.
344
+\medspace
345
+
346
+The vulnerability here is an integer overflow: because the variable \texttt{i} is declared with \texttt{var}, its type is inferred from the initial value 0 as the smallest type able to hold it, namely \texttt{uint8}, which can only hold integers from 0 to 255.
270 347
 
271
-Because of this, if the length of the \texttt{participants} arrays is greater than 255, the integer overflows on the 256th iteration and instead of moving on to \texttt{participants[255]}, it reverts back to the first element in the array. As a result, the first 255 paricipants will split all the balance of the contract, whereas the rest will get nothing.
348
+\medspace
349
+
350
+Because of this, if the length of the \texttt{participants} array is greater than 255, the loop condition can never become false: the counter overflows after the iteration with \texttt{i = 255} and wraps back to the first element in the array instead of advancing to \texttt{participants[256]}. As a result, only the first 256 participants can ever be paid, whereas the rest get nothing.
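+
+As a worked illustration of the wrap-around (a sketch, assuming the unchecked modulo-$2^{8}$ arithmetic that Solidity versions before 0.8 apply to \texttt{uint8}), the counter takes the values $0, 1, \dots, 255$, and the increment following $i = 255$ yields
+\[
+	(255 + 1) \bmod 2^{8} = 0 .
+\]
+Hence $i < 256$ holds in every iteration, so entries at index 256 and above are never reached.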
272 351
 
273 352
 \section{Code properties and automatic detection}
274 353
 
@@ -288,22 +367,22 @@ teEther~\cite{teether} employs a similar approach, but instead the authors opt t
288 367
 
289 368
 \medspace
290 369
 
291
-The fuzz-driven approach to vulnerability detection is more abstract, as general-purpose fuzzing tools generally don't have knowledge of the analysed program. For the tool SmartFuzzDriverGenerator~\cite{fuzzdrivegen}, a multitude of these fuzzing libraries can be used. The problem at hand is, however, that the technique cannot interface with a smart contract out of the box. The "glue" between fuzzer and program is called a driver, hence the name of "driver-generator".
370
+The fuzz-driven approach to vulnerability detection is more abstract, as general-purpose fuzzing tools generally don't have knowledge of the analysed program. For the tool SmartFuzzDriverGenerator~\cite{fuzzdrivegen}, a multitude of these fuzzing libraries can be used, although its application is limited to the Hyperledger Fabric permissioned blockchain. The problem at hand is that a fuzzer cannot interface with a smart contract out of the box. The "glue" between fuzzer and program is called a driver, hence the name "driver-generator".
292 371
 
293 372
 \medspace
294 373
 
295
-SmartFuzzDriverGenerator aims to automatically generate such a driver by %TODO: I have no idea how it does this actually%
374
+SmartFuzzDriverGenerator aims to automatically generate such a driver by inferring the available APIs from the bytecode. There are multiple approaches to decide the order of available fuzzing steps, including a heuristic based on code complexity (i.e. nested conditions, loops, array operations, etc.), random sequences, and user-generated strategies.
296 375
 
297 376
 \medspace
298 377
 
299
-The Smartian tool~\cite{smartian} attempts to find a middle-ground between static and dynamic analysis by first transforming the EVM bytecode into control-flow facts. Based on this information, a set of seed-inputs is generated that are expected to have a high probability of yielding useable results. Should no exploit be found, the seed-inputs are then mutated in order to yield a higher code coverage. %TODO: This is probably extemely inprecise and should be re-written%
378
+The Smartian tool~\cite{smartian} attempts to find a middle-ground between static and dynamic analysis by first transforming the EVM bytecode into control-flow facts. Based on this information, a set of seed-inputs is generated that are expected to have a high probability of yielding useable results. Should no exploit be found, the seed-inputs are then mutated in order to yield a higher code coverage.
300 379
 
301 380
 \section{Exploit sketch}
302 381
 
303 382
 An exploit sketch for Algorithm~\ref{alg:pop-incorrect} and Algorithm~\ref{alg:multilayer-example} is available from Doughoyte~\cite{doughoyte}.
304 383
 
305 384
 \textbf{Checkpoint A}
306
-We assume that the following events have ocurred:
385
+We assume that the following events have occurred:
307 386
 \begin{enumerate}
308 387
   \item the contract MerdeToken\footnote{\url{https://github.com/Arachnid/uscc/blob/master/submissions-2017/doughoyte/MerdeToken.sol}, accessed: Oct. 30th 2023} has been created;
309 388
   \item the investor has set a withdrawal limit of 1 ether, which only they can change;
@@ -317,7 +396,7 @@ At this point, an example storage layout as per Doughoyte would be:
317 396
 
318 397
 \lstset{style=mystyle}
319 398
 \begin{algorithm}[H]
320
-	\begin{lstlisting}[language=Octave]
399
+	\begin{lstlisting}
321 400
     "storage": {
322 401
         // The address of the contract owner:
323 402
         "0000000000000000000000000000000000000000000000000000000000000000": "94b898c1a30adcff67208fd79b9e5a4d339f3cc6d2",
@@ -336,6 +415,8 @@ At this point, an example storage layout as per Doughoyte would be:
336 415
   \label{alg:exploit-checkpoint-a}
337 416
 \end{algorithm}
338 417
 
418
+\medspace
419
+
339 420
 \textbf{Checkpoint B}
340 421
 Afterwards, the malicious owner calls the vulnerable function \texttt{popBonusCode()} and the length of the array is set to the max value. The length only appears now because, prior to the underflow, it was zero and, to save space, was omitted from the storage:
341 422
 
@@ -343,7 +424,7 @@ Afterwards, the malicious owner calls the vulnerable function \texttt{popBonusCo
343 424
 
344 425
 \lstset{style=mystyle}
345 426
 \begin{algorithm}[H]
346
-	\begin{lstlisting}[language=Octave]
427
+	\begin{lstlisting}
347 428
     "storage": {
348 429
         "0000000000000000000000000000000000000000000000000000000000000000": "94b898c1a30adcff67208fd79b9e5a4d339f3cc6d2",
349 430
         "0000000000000000000000000000000000000000000000000000000000000001": "948bc7317ad44d6f34f0f0b6e3c8c7bf739ba666fa",
@@ -358,43 +439,62 @@ Afterwards, the malicious owner calls the vulnerable function \texttt{popBonusCo
358 439
   \label{alg:exploit-checkpoint-b}
359 440
 \end{algorithm}
360 441
 
442
+\medspace
443
+
361 444
 Increasing the length of the array to the maximum allowed by \texttt{uint256} was important, as this will now allow the owner to pass the requirement set in \texttt{modifyBonusCode} and still
362 445
 use the function for storage modification.
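+
+The effect of the underflow can be written out explicitly (a sketch, assuming the unchecked \texttt{uint256} arithmetic of Solidity versions before 0.8, and assuming that the requirement in \texttt{modifyBonusCode} is a bounds check of the form $\mathit{index} < \mathit{length}$): decrementing a length of zero gives
+\[
+	(0 - 1) \bmod 2^{256} = 2^{256} - 1 ,
+\]
+so the check $\mathit{index} < 2^{256} - 1$ passes for every index except $2^{256} - 1$ itself.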
363 446
 
447
+\medspace
448
+
364 449
 \textbf{Checkpoint C} The owner is then able to use \texttt{modifyBonusCode} to increase the fixed withdrawal limit to the max \texttt{uint256} value. Had the contract not had this vulnerability,
365 450
 this action would only have been possible through \texttt{setWithdrawLimit}, which is only available to the investor.
366 451
 
452
+\medspace
453
+
367 454
 In order to overwrite the withdrawal limit, the owner must calculate the hex value to use as a first argument (index) to the function.
368 455
 Since the array \texttt{bonusCodes} is defined in the sixth place in the contract storage, its length is stored in the fifth storage slot (counting from zero).
369
-The limit is defined at the fourth storage slot. Then, in order to manipulate the withdrawal limit, the owner must convert the address of the length to hexadecimal:\\
456
+
457
+\medspace
458
+
459
+The limit is defined at the fourth storage slot. To manipulate the withdrawal limit, the owner must first compute the storage location of the array's first element, which is the Keccak-256 hash of the slot that holds the array length:
460
+
461
+\medspace
462
+
370 463
 \lstset{style=mystyle}
371 464
 \begin{algorithm}[H]
372
-	\begin{lstlisting}[language=Octave]
373
-    > web3.sha3("0x0000000000000000000000000000000000000000000000000000000000000005", { encoding: 'hex' })
374
-    "0x036b6384b5eca791c62761152d0c79bb0604c104a5fb6f4eb0703f3154bb3db0"
465
+	\begin{lstlisting}
466
+    $ web3.sha3("0x0000000000000000000000000000000000000000000000000000000000000005", { encoding: 'hex' })
467
+    > "0x036b6384b5eca791c62761152d0c79bb0604c104a5fb6f4eb0703f3154bb3db0"
375 468
   \end{lstlisting}
376 469
 	\caption{Exploit - Convert length address to hex}
377 470
   \label{alg:exploit-convert-address}
378 471
 \end{algorithm}
379 472
 
380
-and then just calculate the array index that will wrap around using the formula $2^{256} - H + 4$, where $2^{256}$ is the max \texttt{uint256} value, H is the hex obtained from the previous command and 4 is the offset of the withdrawal limit storage slot from the base of the contract. This, converted to hex, will give the owner the address to use with \texttt{modifyBonusCode}. The Perl snippet below does that:\\
473
+\medspace
474
+
475
+and then just calculate the array index that will wrap around using the formula $2^{256} - H + 4$, where $2^{256}$ is the size of the storage address space (one more than the max \texttt{uint256} value), $H$ is the hash obtained from the previous command and 4 is the storage slot of the withdrawal limit. This, converted to hex, gives the owner the index to use with \texttt{modifyBonusCode}. The Perl snippet below does that:
476
+
477
+\medspace
478
+
381 479
 \lstset{style=mystyle}
382 480
 \begin{algorithm}[H]
383 481
 	\begin{lstlisting}[language=Octave]
384
-    \$ perl -Mbigint -E 'say ((2**256 - 0x036b6384b5eca791c62761152d0c79bb0604c104a5fb6f4eb0703f3154bb3db0 + 4)->as_hex)'
385
-    0xfc949c7b4a13586e39d89eead2f38644f9fb3efb5a0490b14f8fc0ceab44c254
482
+    $ perl -Mbigint -E 'say ((2**256 - 0x036b6384b5eca791c62761152d0c79bb0604c104a5fb6f4eb0703f3154bb3db0 + 4)->as_hex)'
483
+    > 0xfc949c7b4a13586e39d89eead2f38644f9fb3efb5a0490b14f8fc0ceab44c254
386 484
   \end{lstlisting}
387 485
 	\caption{Exploit - Convert limit offset to address}
388 486
   \label{alg:exploit-convert-offset}
389 487
 \end{algorithm}
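+
+The formula can be derived from the storage layout of dynamic arrays. As a sketch, assume that the array is declared at slot $p = 5$, that each element occupies one full slot, and write $H = \mathrm{keccak256}(p)$ for the hash computed above. Element $i$ is then stored at
+\[
+	\mathit{slot}(i) = (H + i) \bmod 2^{256} .
+\]
+Requiring $\mathit{slot}(i) = 4$, the slot of the withdrawal limit, and using $H > 4$ gives
+\[
+	i = (4 - H) \bmod 2^{256} = 2^{256} - H + 4 ,
+\]
+and indeed $H + i = 2^{256} + 4 \equiv 4 \pmod{2^{256}}$.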
390 488
 
489
+\medspace
490
+
391 491
 As a result, the storage now looks like this:
392 492
 
393 493
 \medspace
394 494
 
395 495
 \lstset{style=mystyle}
396 496
 \begin{algorithm}[H]
397
-	\begin{lstlisting}[language=Octave]
497
+	\begin{lstlisting}
398 498
     "storage": {
399 499
         "0000000000000000000000000000000000000000000000000000000000000000": "94b898c1a30adcff67208fd79b9e5a4d339f3cc6d2",
400 500
         "0000000000000000000000000000000000000000000000000000000000000001": "948bc7317ad44d6f34f0f0b6e3c8c7bf739ba666fa",
@@ -409,8 +509,17 @@ As a result, the memory now looks like this:
409 509
   \label{alg:exploit-checkpoint-c}
410 510
 \end{algorithm}
411 511
 
512
+\medspace
513
+
412 514
 \textbf{Checkpoint D} The owner can now call \texttt{withdraw()} with the full amount of ether in the contract and drain it. The investor has not increased the limit at any point.
413 515
 
516
+\section{Conclusion}
517
+
518
+We presented different forms of the common weakness SWC-124: Write to Arbitrary Storage Location and how they might be detected using automated tools. We have shown how a possible exploit may be constructed, and how this can lead to the complete compromise of a smart contract's storage and control flow. We have given multiple attackable and benign code examples to illustrate this weakness. We believe this weakness to be of particular practical relevance, as it is very easy to introduce by accident, and hard for a developer to spot without advanced knowledge of the underlying mechanisms that cause it.
519
+
520
+As for preventative measures, we would recommend that developers avoid interacting with low-level building blocks like an array's length value or inline assembly instructions if possible, and instead employ standard library functions whenever available.
521
+
522
+
414 523
 \bibliography{exercise.bib}
415 524
 
416 525
 \end{document}
