Category: Research

In Research, dive into our comprehensive studies and analyses of Ethereum and additional blockchain foundations. Uncover innovative findings and stand at the forefront of blockchain research. Explore our detailed investigations into complex challenges, contributing significantly to the expansion and development of the Ethereum ecosystem. Engage with our cutting-edge research that drives blockchain technology forward.

Transient Storage in the wild: An impact study on EIP-1153
With the recent introduction of transient storage in Ethereum, the landscape of state management within the Ethereum Virtual Machine (EVM) has evolved once again. This latest development has prompted us at Dedaub to take a fresh look at how data is stored and accessed in the EVM ecosystem, as well as analyze how the new transient storage is used in real-world applications.

It’s important to note that even though transient storage has properly been integrated into the EVM, the transient modifier is still not yet available in Solidity. Therefore, all usage of transient storage is directly from the TSTORE and TLOAD opcodes using inline assembly, meaning usage is not that widespread yet, and could also be at a higher risk of vulnerability.

📢 Actually as of 9th October 2024, with the introduction of solc 0.8.28, Transient storage is fully supported! This does not invalidate any of the content in this article, but please consider this article was written at Ethereum Block Number 20129223.

In this comprehensive blog post, we will explore the strengths and limitations of each storage type. We will discuss all their appropriate use cases, and examine how the introduction of transient storage fits into the broader ecosystem of EVM data management. If you do not need a refresher of how the EVM manages state, feel free to skip to the EIP-1153 impact analysis section.

EIP-1153 | Quick refresher of data storage and access

Storage

Storage in Ethereum refers to the persistent storage a contract holds. This storage is split into 32 byte slots, with each slot having its own address ranging from 0 to 2²⁵⁶ – 1. In total, that means a contract could store potentially up to 2²⁶¹ bytes.

Of course, the EVM doesn’t track all of the bytes simultaneously. Instead, it’s treated more like a map—if a specific storage slot needs to be used, it’s loaded similar to a map, where the key is its index and the value is the 32-bytes that are being stored or accessed.

Starting with slot 0, (Solidity) will try to store static size values as compactly as possible, only moving to the next slot when the value cannot fit into the remaining space. Structs and fixed size arrays will also always start on a new slot and any following items will also start a new slot, but their values are still tightly packed.

Here are the rules as stated by the Solidity docs:
- The first item in a storage slot is stored lower-order aligned, meaning the data is stored in big endian form.
- Value types use only as many bytes as are necessary to store them.
- If a value type does not fit the remaining part of a storage slot, it will be stored in the next storage slot.
- Structs and array data always start a new slot and their items are packed tightly according to these rules.
- Items following struct or array data always start a new storage slot.
However, for mappings and dynamically size arrays, there’s no guarantee on how much space they will take up, so they cannot be stored with the rest of the fixed size values.

For dynamic arrays, the slot they would have taken up is replaced by the length of the array. Then, the rest of the array is stored like a fixed size array starting from the slot keccak256(s), where s is the original slot the array would have taken up. Dynamic arrays of arrays recursively follow this pattern, meaning an arr[0][0] would be located at keccak256(keccak256(s)), and where s is the slot the original array is stored at.

For maps, the slot remains 0, and every key-value pair is stored at keccak256(pad(key) . s), where s is the original data slot of the mapping, .is concatenation, and the key is padded to 32 bytes if it’s a value type but not if it’s a string and byte array. This address stores the value for the corresponding key, following the same rules for other storage types.

As an example, let’s look at a sample contract Storage.sol and view its storage:
```
contract Storage {
    struct SomeData {
        uint128 x;
        uint128 y;
        bytes z;
    }

    bool[8] flags;
    uint160 time;
    
    string title;
    SomeData data;
    mapping(address => uint256) balances;
    mapping(address => SomeData) userDatas;
    
    // ...
}
```
The command forge inspect Storage storage --pretty from foundry can be used to view the internal layout:
```
| Name      | Type                                        | Slot | Offset | Bytes | Contract                |
|-----------|---------------------------------------------|------|--------|-------|-------------------------|
| flags     | bool[8]                                     | 0    | 0      | 32    | src/Storage.sol:Storage |
| time      | uint160                                     | 1    | 0      | 20    | src/Storage.sol:Storage |
| title     | string                                      | 2    | 0      | 32    | src/Storage.sol:Storage |
| data      | struct Storage.SomeData                     | 3    | 0      | 64    | src/Storage.sol:Storage |
| balances  | mapping(address => uint256)                 | 5    | 0      | 32    | src/Storage.sol:Storage |
| userDatas | mapping(address => struct Storage.SomeData) | 6    | 0      | 32    | src/Storage.sol:Storage |
```
All the defined values are stored starting from slot 0 in the order they are defined.
1. First, the flags array takes up the entire first slot. Each bool only takes 1 byte to store, meaning the entire array takes 8 bytes total.
2. The uint160 time is stored in the second slot. Even though it only takes 20 bytes to store, meaning it can fit in the remaining space of the first slot, it must start on the second slot since the first slot is storing an array.
3. The string title takes up the entire third slot, since it is a dynamic data type. The slot stores the length of the string, and the actual characters of the string should be stored starting at keccak256(2).
4. Next, the entire data struct takes up 2 slots. The first slot of the struct packs both the x and y uint128 values, since they each only take 16 bytes. Then, the second slot of the struct stores the dynamic bytes value.
5. Finally, there are two mapping values, each taking up an empty slot to reserve their mapping. The actual mapping values would be stored at keccak(pad(key) . uint256(5)) or keccak(pad(key) . uint256(6)) respectively.
Here’s a diagram visualizing the storage:

If the title or z variable contain data that is longer than 31 bytes, they would instead be stored at keccak(s), as shown by the arrows. The mapping values are stored following the defined rules above for hashing the key.

Finally, storage variables can also be declared as immutable or constant. These variables don’t change over the runtime of the contract, which saves on gas fees since their calculation can be optimized out. constant variables are defined at compile-time, and the Solidity compiler will replace them with their defined value during compilation. On the other hand, immutable variables can still be defined during construction of a contract. At this point the code will automatically replace all references to the value with the one that was defined.

Memory

Unlike storage, memory does not persist between transactions, and all memory values are discarded at the end of the call. Since memory reads have a fixed size of 32 bytes, it aligns every single new value to its own chunk. So while uint8[16] nums might only be one 32-byte word when stored in storage, it will take up sixteen 32-byte words in memory. The same splitting also happens to structs, regardless of how they are defined.

For data types like bytes or strings, their variables need to be differentiated between memory pointers or storage pointers, using the memory or storage keyword respectively.

Mappings and dynamic arrays do not exist in memory, since constantly resizing memory is very inefficient and expensive. Though you can allocate arrays with a fixed size using new <type>[](size), you cannot edit the sizes of these arrays like you can with storage arrays using .push and .pop .

Finally, memory optimization is very important, since the gas cost for memory scales quadratically with size as memory expands, rather than linearly.

Stack

Like memory, stack data only exists for the current execution. The stack is very simple, being just a list of 32-byte elements that are stored sequentially one after another. It is modified using POP, PUSH, DUP, and SWAP instructions, much like stacks in standard executables. Currently, the stack only stores up to 1024 values.

Most actual calculations are done on the stack. For example, arithmetic opcodes such as ADD or MUL pop two values from the stack, then push the result with the binary operation onto the stack.

Calldata

Calldata is similar to memory and stack data in that it only exists within the context of one function call. Like memory, all values must also be padded to 32 bytes. However, unlike memory, which is allocated during contract interactions, calldata stores the read-only arguments that are passed in from external sources, like an EOA or another smart contract. It is important to note that if you want to edit the values passed in from calldata, you must copy them to memory first.

Calldata is passed in with the rest of the data during the transaction, so it must be packed properly according to the specified ABI of the function that is being called.

Transient Storage

Transient Storage is a fairly new addition to the EVM, with Solidity only supporting the opcodes starting 2024, with the proper language implementation expected to arrive in the near future. It is meant to serve as an efficient key-value mapping that exists during the context of an entire transaction, and its opcodes, TSTORE and TLOAD. It always takes 100 gas each, making it much more gas efficient than regular storage.

The specialty of transient storage is that it persists through call contexts. This is perfect for scenarios like reentrancy guards that can set a flag in transient storage, then check if that flag has already been set throughout the context of an entire transaction. Then, at the end of the entire transaction, the guard will be wiped completely and can be used as normal in future transactions.

Despite its transient nature, it is important to note that this storage is still part of the Ethereum state. As such, it must adhere to similar rules and constraints of those of regular storage. For instance, in a STATICCALL context, which prohibits state modifications, transient storage cannot be altered, meaning only the TLOAD opcode is allowed and not TSTORE.

EIP-1153 impact analysis

Since transient storage is a relatively recent feature we were able to be comprehensive in inspecting all cases of how it has been used as of Ethereum block number 20129223. We found that from the ~250 deployed contracts containing or having libraries containingTSTORE or TLOAD opcodes, there were ~180 unique source files, meaning over 60 of these deployed contracts were duplicates deployed cross-chain.

Here is the recorded distribution of the usage of transient storage in these ~190 contracts:

Out of the around 190 unique contracts on chain that use this feature, we were able to differentiate them into 6 general categories:
1. First and foremost, over 50% of the usage of transient storage is on reentrancy guards. This makes sense, as reentrancy protection is the perfect use case for transient storage and is also very easy to implement, with a simple one possibly looking like:
```
modifier ReentrancyGuard {
    assembly {
		    // If the guard has been set, there is re-entrancy, so revert
        if tload(0) { revert(0, 0) } 
        // Otherwise, set the guard
        tstore(0, 1)
    }
    _;
    // Unlocks the guard, making the pattern composable.
    // After the function exits, it can be called again, even in the same transaction.
    assembly {
        tstore(0, 0)
    }
}
```
1. On the other hand, only 3.6% of the contracts used this pattern as an entrancy lock, locking the contract state between transactions to ensure that certain functions could only be called after calling other functions. Here’s a short example.
```
// keccak256("entrancy.slot")
uint256 constant ENTRANCY_SLOT = 0x53/*...*/15;

function enter() {
    uint256 entrancy = 0;
    assembly {
        entrancy := tload(ENTRANCY_SLOT)
    }
    if (entrancy != 0) {
				revert("Already entered");
    }

    entrancy = 1;
    assembly {
        tstore(ENTRANCY_SLOT, entrancy)
    }
}

function withdraw() {
    uint256 entrancy = 0;
    assembly {
        entrancy := tload(ENTRANCY_SLOT)
    }

    if (entrancy == 0) {
        revert("Not entered yet");
    }

    // ...
}
```
1. Next, around 6% of the contracts used transient storage to preserve contract context for callback functions or cross-chain transactions. This was mostly on Bridge contracts, like this one here.
2. 8.3% of the contracts used transient storage to keep a temporary copy of the contract state to verify that certain actions are authorized. For example, this contract by OpenSea temporarily stores an authorized operator, specific tokens, and amounts related to those tokens to validate that all transfers happen as they should.
3. A bit less than 9% of the contracts used transient storage for their own specialized purposes. For example, an airdropping contract utilizes tstore as a hashmap to track and manage eligible recipients within the transaction context.
4. 20% of contracts although had no transient storage opcodes in the bytecode, contained functions that utilised transient storage in the referenced libraries. Most of these libraries are openzeppelin internals such as their implementation of ERC1967 (see StorageSlot).
The introduction of transient storage marks a significant evolution in the EVM’s data management capabilities. Our analysis at Dedaub reveals that while it’s still in its early stages of adoption, transient storage is already making a notable impact, particularly in smart contract security and efficiency.

The introduction of transient storage marks a significant evolution in the EVM’s data management capabilities. Our analysis at Dedaub reveals that while it is still in its early stages of adoption, transient storage is already making a notable impact, particularly in smart contract security and efficiency.

Key takeaways from our analysis of transient storage usage include:
- Reentrancy guards dominate the current use cases, accounting for over 50% of transient storage implementations. This highlights the immediate value developers see in using transient storage for cross-function state management within a transaction.
- Beyond security, innovative developers are finding creative ways to leverage transient storage for storing contextual information and managing contexts throughout complex transactions.
- The adoption of transient storage, while still limited, shows promise for improving gas efficiency and simplifying certain smart contract patterns.
Gas efficiency improvements

In our analysis of transient storage usage, we also evaluated its gas efficiency compared to regular storage. To do this, we collected the last 100 transactions for each of the contracts analyzed. For each transaction, we obtained its execution trace and used a Python script to simulate gas costs by replacing TSTORE operations with SSTORE under the same conditions (including cold load penalties and other storage rules).

The results were impressive: across all use cases, using transient storage led to an average gas savings of 91.59% compared to regular storage operations. Below, you can find a more detailed graph that shows gas savings per category. Interesting to note that in the case of the Specialized Functionality, gas savings of around 98.7% were recorded. This is because of the airdropping contract mentioned above, and in this case memory might have been a more adequate comparison.

Conclusion

As the Ethereum ecosystem continues to evolve, we expect to see more diverse and sophisticated uses of transient storage emerge. Its unique properties – persisting across internal calls within a transaction while being more gas-efficient than regular storage – open up new possibilities for optimizing smart contract design and execution.

Below we are publishing the dataset and scripts that were used for the above post.

dump_transient_traces.zip
October 22, 2024
EIP-3074 Impact Study
Pectra’s EIP-3074, and its Impact on Deployed Smart Contracts

Introduction

Ethereum’s end-user experience (UX) is about to be significantly enhanced with the introduction of EIP-3074, which will be part of the upcoming Pectra update. This proposal intends to improve wallets’ functionality by directly enabling more complex operations similar to smart contracts within traditional wallet architectures. It improves blockchain user UX and solves problems like transaction bundling and sponsored transactions.

For a study commissioned by the Ethereum Foundation, Dedaub identified the potential impacts of EIP-3074 on all known deployed smart contracts as of the date of the study. The results of our analysis are becoming increasingly relevant as we approach the implementation EIP-3074. You can read the original study here.

According to our research, we have found that EIP-3074 can bypass many access control predicates involving comparisons between the caller (msg.sender) and transaction originator (tx.origin). These have been used in the past to protect against flashloan attacks, and in rare cases, reentrancy. Although the former comparisons were previously considered unsafe, our findings suggest that upgraded security measures against such attacks need to be accelerated. However, it is worth noting that the use of these access control predicates were found to be rare. In addition, since our study in 2021, multiple other mechanisms such as MEV bundling have further rendered comparisons to tx.origin even more unsafe, meaning that modern contracts are less susceptible to negative security impacts.

Dedaub’s EIP-3074 Impact Study: A Look Back

In May 2021, Dedaub conducted a study to assess the impact of Ethereum Improvement Proposal (EIP) 3074 and evaluate its potential effect on Ethereum’s ecosystem. Our researchers built custom static analysis pipelines, reviewed the source code and bytecode of deployed contracts, and obtained insights from developer interviews to understand the proposal’s possible outcomes.

The team was concerned that non-standard checks (e.g., reentrancy checks) used `msg.sender == tx.origin`. Although EIP-3074 could make new attacks easier to execute, it was found that it does not create significant new attack vectors to the existing code already using checks on expressions such as `msg.sender == tx.origin`. Although the issue has the potential to impact several thousand deployed contracts (around 1.85% of the entire smart contract corpus on Ethereum as of May 2021), the developer community is aware of the potential risks and most developers we spoke to are ready to adapt with sufficient warning.

Our study utilized our state-of-the-art decompilation and static program analysis frameworks, which are part of the toolchains available in the Dedaub Security Suite. These techniques were critical in identifying specific contracts and scenarios were EIP-3074 could change the attack surface.

Although it is important to acknowledge that the results are subject to interpretation, we considered the impact of EIP-3074 to be “moderate but manageable” as of May 2021, and even more manageable today.

About EIP-3074 and the Pectra Update

The upcoming Pectra upgrade of Ethereum is set to change the functionality of Ethereum wallets. The proposal will introduce two new advanced opcodes (AUTH and AUTHCALL), allowing traditional wallets such as MetaMask to operate with functionalities similar to those of smart contracts. The proposed enhancement will empower wallets to authorize transactions on behalf of users, thereby streamlining interactions and boosting security.

The upcoming Pectra upgrade, which is planned to be integrated into the Ethereum network later this year, aims to significantly streamline the user experience. EIP-3074 will play a crucial role in this upgrade. EIP-3074 introduces two new Ethereum opcodes, AUTH and AUTHCALL, to improve the control of smart contract transactions and allow more flexible delegation of transaction execution. Here’s a summary of their functionality:

AUTH:
- AUTH is an opcode that takes a single secp256k1 signature as input.
- The purpose of AUTH is to recover the address of an Ethereum account that has signed the input data. It stores this address in the EVM’s context.
- This allows smart contracts to securely authenticate a message signed by a specific account without requiring direct transaction signatures.
Effectively, AUTH authenticates that the transaction is being executed on behalf of the specified account.

AUTHCALL:
- AUTHCALL builds upon AUTH. It is an opcode that works similarly to a normal CALL but leverages the address authenticated by AUTH.
- This opcode allows executing a call from the authenticated address established by AUTH.
Essentially, it lets the contract execute transactions as if it were that authenticated account, thus enabling secure and flexible delegation of transaction execution.

These opcodes can be used to build more complex smart contract architectures where you can delegate transaction signing to another entity or allow relayers to execute transactions on behalf of others securely. This improves flexibility and can facilitate more seamless smart contract interactions, especially in systems requiring more complex transaction flows.

However, it’s crucial to handle these opcodes carefully to maintain the security of the delegation scheme and avoid unintended transaction execution.

Applications Enabled by EIP-3074

A few examples of applications that EIP-3074 will enable or significantly facilitate include:

Meta-Transactions: Users can sign a message off-chain, and relayers can use AUTH to verify the signature and AUTHCALL to execute the transaction securely.

Smart Contract Wallets: Instead of executing calls through multiple layers of contract indirection, smart contracts can act directly as users using AUTHCALL.

Delegated Governance: Users can delegate their voting power by signing off-chain messages, which governance contracts can then verify and act upon using AUTHCALL.

Subscription and Recurring Payments: Recurring transactions can be authenticated off-chain, and service providers can execute authorized payments using AUTHCALL.

Access Control and Delegation: A master account can securely delegate access to sub-accounts via signed messages, with AUTHCALL enforcing the permissions.
May 3, 2024
Ethereum improvement proposal 4788 | EIP-4877 Summary
Dedaub was commissioned by the Ethereum Foundation to perform a security audit of the bytecode of a smart contract that was introduced to the EIP-4877 in a recent change, enabling the on-chain storing and accessing of the beacon block roots of recent blocks.

In this blog post, titled “EIP-4877 summary,” we highlight key insights from the audit. You can access the complete report here.

The audited contract uses the block’s timestamp as a key for their parent beacon blocks’ roots. To bind the contract’s storage footprint while retaining accurate information, a set of two ring buffers are used (using a HISTORY_BUFFER_LENGTH with a value of 98304):
1. The first one stores the timestamp (i.e. the key) and is used to ensure that the result for the provided timestamp is the one that is currently stored on-chain. Its value will be stored at storage location timestamp % HISTORY_BUFFER_LENGTH.
2. The second one is used to store the beacon root chain value for the timestamp. Ιts value will be stored at storage location HISTORY_BUFFER_LENGTH + timestamp % HISTORY_BUFFER_LENGTH.
The audited contract implements two methods, set() and get(). As the contract does not adhere to the contract ABI specification, the method to be executed is chosen based on the contract’s caller; if called by the special address 0xfffffffffffffffffffffffffffffffffffffffe, the set() function is called, while every other address the calls the get() function. Both methods accept the first 32 bytes of the call’s calldata as their arguments.

The highest severity vulnerability found during this audit is one where calling the get() function. More specifically, if this is called with a value of zero it will not fail and return the zero value back.
```
// get()
require(msg.data.length == 32);
require(calldata0_32 == STORAGE[calldata0_32 % 0x18000]);
return STORAGE[0x18000 + calldata0_32 % 0x18000];
```
EIP-4877 Summary | Main body of the get() function

Although this does not affect the contract’s functionality for valid timestamps it can potentially lead to misuse, and funds stolen in projects that rely on this root to exist and valid. Therefore we suggested adding a special case for the zero value, in the get() function or invalidating it by storing a value in the 0th storage slot during the contract’s construction.

You can read the full audit report here.
October 7, 2023
Ethereum Study – Rlp to Ssz Mpt Commitment Migration

The Ethereum Foundation commissioned our team to examine the potential impact of Ethereum Improvement Proposals (EIPs) 6404 and 6466. These EIPs propose the modification of Merkle-Patricia Trie (MPT) commitments for transactions and receipts, respectively. Importantly, this entails a change in the serialization algorithm, from Recursive Length Prefix (RLP) format to the Simple Serialize (SSZ) format for the Receipts and Transactions containers. In turn, this changes the Receipts Root and Transactions Root fields in the execution layer headers.

A primary concern is that this transition could disrupt contracts that rely on RLP for proofs on data committed to the Ethereum mainnet. These contracts may include critical parts of decentralized bridges, which generate proofs about some log that was emitted in historical transactions.

EIPs 6404 and 6466 | This research seeks to quantify and qualify the extent of potential disruption caused by these changes. Identifying the specific on-chain patterns that verify commitments in this manner represents a significant challenge, necessitating a semi-automated examination of all smart contracts deployed on the Ethereum network, together with their recent behavior. The study also attempts to identify which projects these contracts are part of, and whether actions can be taken, on-chain (such as upgrading) or off-chain (such as modifying their respective oracles) to limit the impact of these changes.

For the proposed EIPs, we were able to measure the extent of the impact of these changes. The effects are observed on a handful of known projects, all of which are cross-chain bridges.

Notably, many other protocols that do employ RLP functionality are not affected. For instance the Optimism and Polygon bridges use RLP operations for inclusion proofs when bridging from L2 networks back to Ethereum, and, thus, are not affected by the Ethereum encoding of transactions.

Project Name Website Estimated Impact
zkBridge https://zkbridge.com Moderate
LayerZero https://layerzero.network/ Moderate
Telepathy https://docs.telepathy.xyz/ Moderate

Finally, an interesting result of our study is that out of the two proposed EIPs, only EIP-6466 (Receipts Root EIP) was observed to have an impact on the inspected protocols. This makes sense as log-inclusion proofs are probably the most common way to conduct cross-chain message passing.

Read the rest of the study here.

July 6, 2023
EIP-4758 and EIP-6780 | Removal of Selfdestruct

Dedaub was commissioned by the Ethereum Foundation to perform an impact study of Ethereum Improvement Proposals (EIPs) 4758 and 6780 on existing contracts. EIP-4758 proposes to deactivate SELFDESTRUCT by changing it to SENDALL, which recovers all funds (in ETH) to the beneficiary without deleting any code or storage. On the other hand, EIP-6780 modifies SELFDESTRUCT to work only in the same transaction in which the contract was created, while in all other cases it recovers all funds but does not delete any other account data.

The aim of this study is (i) to help the Ethereum community decide whether to implement, based on the impact of these changes to the ecosystem, EIP-4758 or EIP-6780. In either case we also aimed to (ii) find out which projects are affected and by how much. To evaluate the impact of these proposed changes, we performed comprehensive queries over past on-chain behaviors of smart contacts and queries on code and bytecode of deployed contracts; inspected code manually; checked balances, approvals, and contract proxying state; and informally interviewed developers.

The study found that a small number of known projects and many smart contracts, mainly involved in Miner Extractable Value (MEV) bot networks, would be affected. Quantitatively, over 98% of SELFDESTRUCT-CREATE2 pairs in known contracts would remain unaffected if EIP-6780 is implemented, while the impact of EIP-4758 is less certain. Metamorphic contracts used for upgrades were found to be rare. If implemented today, EIP-4758 could affect some functionalities of certain projects, including AxelarNetwork, Pine Finance, Revest, and JPEGd (with high impact), and Sorbet Finance, Celer, Gelato, Ricmoo’s Wisps, Chainhop Protocol (with low impact), and Thousand Ether Homepage (with moderate impact). However, most of these projects could be upgraded in time for a deployment of the proposed EIPs.

Based on this, we judged the impact of EIP-4758 and EIP-6780 to be manageable and could be a net positive due to the simplification of Ethereum Clients’ implementations, especially if EIP-6780 is selected.

The study used dynamic analysis and static program analysis, along with smart contract inspection and transaction debugging.

Read more

May 30, 2023
Elipmoc: Advanced Decompilation of Ethereum SmartContracts

NEVILLE GRECH, University of Malta, Malta and Dedaub Ltd
SIFIS LAGOUVARDOS, University of Athens, Greece and Dedaub Ltd
ILIAS TSATIRIS, University of Athens, Greece and Dedaub Ltd
YANNIS SMARAGDAKIS, University of Athens, Greece and Dedaub Ltd

Smart contracts on the Ethereum blockchain greatly benefit from cutting-edge analysis techniques and pose significant challenges. A primary challenge is the extremely low-level representation of deployed contracts. We present Elipmoc, a decompiler for the next generation of smart contract analyses. Elipmoc is an evolution of Gigahorse, the top research decompiler, dramatically improving over it and over other state-of-the-art tools, by employing several high-precision techniques and making them scalable. Among these techniques are a new kind of context sensitivity (termed “transactional sensitivity”) that provides a more effective static abstraction of distinct dynamic executions; a path-sensitive (yet scalable, through path merging) algorithm for inference of function arguments and returns; and a fully context sensitive private function reconstruction process. As a result, smart contract security analyses and reverse-engineering tools built on top of Elipmoc achieve high scalability, precision and completeness. Elipmoc improves over all notable past decompilers, including its predecessor, Gigahorse, and the state-of-the-art industrial tool, Panoramix, integrated into the primary Ethereum blockchain explorer, Etherscan. Elipmoc produces decompiled contracts with fully resolved operands at a rate of 99.5% (compared to 62.8% for Gigahorse), and achieves much higher completeness in code decompilation than Panoramix—e.g., up to 67% more coverage of external call statements—while being over 5x faster. Elipmoc has been the enabler for recent (independent) discoveries of several exploitable vulnerabilities on popular protocols, over funds in the many millions of dollars.

Additional Key Words and Phrases: Program Analysis, Smart Contracts, Decompilation, Datalog, Security, Ethereum, Blockchain

1 INTRODUCTION
Decentralized financial applications, built using smart contracts running on programmable blockchains (most notably Ethereum), are starting to rival traditional financial systems. Therefore coding or logical errors in smart contracts can have large financial implications. This makes smart contracts primary targets for automated analysis and verification tasks.
More specifically, the analysis of smart contracts as-deployed, i.e., by taking their binary form as input, is attractive for several reasons. First, it offers significant generality: many deployed smart contracts have no publicly released source code. Security analysts routinely need to inspect such contracts (e.g., bots) to determine their functionality. Second, analyzing contracts at the bytecode level offers a uniform platform for analysis, not distinguishing between source languages and source language versions. This is a major factor, given that (the dominant Ethereum language)

Authors’ addresses: Neville Grech, University of Malta, Malta and Dedaub Ltd, me@nevillegrech.com; Sifis Lagouvardos, University of Athens, Greece and Dedaub Ltd, sifis.lag@di.uoa.gr; Ilias Tsatiris, University of Athens, Greece and Dedaub Ltd, i.tsatiris@di.uoa.gr; Yannis Smaragdakis, University of Athens, Greece and Dedaub Ltd, smaragd@di.uoa.gr. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2022 Copyright held by the owner/author(s). 2475-1421/2022/4-ART77 https://doi.org/10.1145/3527321 Proc. ACM Program. Lang., Vol. 6, No. OOPSLA1, Article 77. Publication date: April 2022. 77:2 Neville Grech, Sifis Lagouvardos, Ilias Tsatiris, and Yannis Smaragdakis Solidity does not maintain source compatibility across versions. Source-level tools very quickly become obsolete.

Bytecode decompilation offers by far the most general substrate for automated smart contract analyses. Decompilation of the Ethereum Virtual Machine (EVM) bytecode is an open problem for the security community. Existing approaches lack in completeness, precision, or scalability. For in-stance, the most-used decompiler in practice, etherscan.io’s Panoramix [Kolinko and Palkeo 2020], decompiles complex contracts only partially, with much of the low-level code often not reflected in the decompiled version. Past academic research tools, such as the Gigahorse decompiler [Grech et al. 2019a], sacrifice precision and occasionally even scalabily. Lower precision results in low-quality decompilation, for instance, operations with unknown operands, or jumps with many infeasible targets.

To illustrate such shortcomings, our experiments show that these leading past decompilers manage to decompile under 20% (Panoramix: 18.2%, Gigahorse: 19.8%) of Ethereum contracts of large size (over 20KB). Technically, the low-level nature of the EVM poses a challenge for any decompilation effort. The EVM uses a single stack for calls and local operations, without any guarantees about its well-formedness. The only control flow operators are unstructured jumps. All calls, returns, loops, and conditionals are implemented as jumps to whichever address is currently at the top of the stack. Compilers producing EVM bytecode often perform aggressive optimization, since instructions have a monetary cost to execute. A major obfuscating factor, for instance, is the aggressive merging of identical instruction sequences belonging to different internal, a.k.a. private, functions.

Control and data flow are, as a result, very hard to discern without sophisticated modeling techniques. In this paper, we introduce Elipmoc (“compile”, backwards), a decompiler for Ethereum smart con-tracts. Elipmoc advances the state of the art in EVM bytecode decompilation, offering significantly increased precision and completeness. Elipmoc is the substrate of a successful analysis framework that has flagged numerous exploitable vulnerabilities on contracts with or without source (but with millions of dollars in locked value). All analyses operate over the three-address code exported by Elipmoc. We have made several vulnerability disclosures, some of which resulted in major rescue efforts [Dedaub 2021b,c,d,f; Michales, Jonah 2021; Primitive Finance 2021].

In addition, the Elipmoc team has been commissioned to perform three separate studies for the Ethereum Foundation, for the assessment of the impact of Ethereum Improvement Proposals EIP-1884, EIP-3074, and a future EIP that proposes a rearchitecting of storage gas cost metering. The Elipmoc decompiler allows answering questions such as “what will be the impact of change-X on the entire set of currently deployed contracts?” and is becoming a crucial tool for the evolution of the Ethereum blockchain and many of its applications.
In technical terms,

Elipmoc makes the following contributions:

● Transactional Context Sensitivity. Elipmoc proposes an effective form of context sensitivity for smart contract execution. Context sensitivity offers a way to condense actual executions into short static signatures. Devising good signatures is crucial for both precision and scalability throughout the decompiler. Elipmoc’s transactional context sensitivity is based on insights about EVM low-level control flow (esp. continuation-based optimizations) and overall execution.

● Scalable, context & path-sensitive function reconstruction. Since compilers tend to reuse EVM bytecode sequences for the benefit of multiple functions, a direct inference of private functions is not straightforward. Additionally, high-level control flow is compiled away to the point of near-irreversibility, using continuation-passing-style patterns. We address such issues with a static analysis maintaining the current contents of a continuation stack, and a path-sensitive analysis to determine with high precision the maximum number of stack elements consumed, i.e.,

Access the full document

April 1, 2022
Symbolic Value-Flow Static Analysis: Deep, Precise, Complete Modeling of Ethereum Smart Contracts

YANNIS SMARAGDAKIS, University of Athens, Greece
NEVILLE GRECH, University of Malta, Malta
SIFIS LAGOUVARDOS, University of Athens, Greece
KONSTANTINOS TRIANTAFYLLOU, University of Athens, Greece
ILIAS TSATIRIS, University of Athens, Greece

We present a static analysis approach that combines concrete values and symbolic expressions. This symbolic value-flow (“symvalic”) analysis models program behavior with high precision, e.g., full path sensitivity. To achieve deep modeling of program semantics, the analysis relies on a symbiotic relationship between a traditional static analysis fixpoint computation and a symbolic solver: the solver does not merely receive a complex “path condition” to solve, but is instead invoked repeatedly (often tens or hundreds of thousands of times), in close cooperation with the flow computation of the analysis.

The result of the symvalic analysis architecture is a static modeling of program behavior that is much more complete than symbolic execution, much more precise than conventional static analysis, and domain-agnostic: no special-purpose definition of anti-patterns is necessary in order to compute violations of safety conditions with high precision. We apply the analysis to the domain of Ethereum smart contracts. This domain represents a fundamental challenge for program analysis approaches: despite numerous publications, research work has not been effective at uncovering vulnerabilities of high real-world value. In systematic comparison of symvalic analysis with past tools, we find significantly increased completeness (shown as 83-96% statement coverage and more true error reports) combined with much higher precision, as measured by rate of true positive reports. In terms of real-world impact, since the beginning of 2021, the analysis has resulted in the discovery and disclosure of several critical vulnerabilities, over funds in the many millions of dollars. Six separate bug bounties totaling over $350K have been awarded for these disclosures. CCS Concepts:

• Theory of computation → Program analysis;
• Software and its engineering → General programming languages;
• Security and privacy → Software and application security.

Additional Key Words and Phrases: Program Analysis, Smart Contracts, Security, Ethereum, Blockchain ACM Reference Format: Yannis Smaragdakis, Neville Grech, Sifis Lagouvardos, Konstantinos Triantafyllou, and Ilias Tsatiris. 2021.

Symbolic Value-Flow Static Analysis: Deep, Precise, Complete Modeling of Ethereum Smart Contracts. Proc. ACM Program. Lang. 5, OOPSLA, Article 163 (October 2021), 30 pages. https://doi.org/10.1145/3485540 Authors’ addresses: Yannis Smaragdakis, University of Athens, Greece, smaragd@di.uoa.gr; Neville Grech, University of Malta, Malta, me@nevillegrech.com; Sifis Lagouvardos, University of Athens, Greece, sifis.lag@di.uoa.gr; Konstantinos Triantafyllou, University of Athens, Greece, kotriant@di.uoa.gr; Ilias Tsatiris, University of Athens, Greece, i.tsatiris@di. uoa.gr. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

1 INTRODUCTION
Static analysis is distinguished among program analysis techniques (e.g., testing [Czech et al. 2016; Meyer 2008], model checking [Clarke et al. 1986; Jhala and Majumdar 2009], or symbolic execution [Baldoni et al. 2018; King 1976]) by its emphasis on completeness, i.e., its attempt to model all (or as many as possible) program behaviors. To achieve this goal under a realistic time budget, static analysis often has to sacrifice some precision: the analysis may consider combinations of values that may never appear in a real execution. Maintaining high precision, while achieving completeness and scalability, remains a formidable challenge. In this work, we present an analysis that attempts to meet this challenge, under realistic domain assumptions. The analysis maintains high precision (closely analogous to that of model checking or full program execution) while achieving very high completeness, as measured in terms of coverage of program behaviors.

For conciseness purposes, we give to the analysis architecture the name symvalic analysis, for “symbolic+value-flow static analysis”. A symvalic analysis computes for each program variable a set of possible concrete values as well as symbolic expressions. The analysis is path-sensitive: values propagate to a variable at a program point only if they satisfy (modulo natural analysis over-approximation) the conditions under which the program point would be reachable. In order to satisfy conditions, the analysis needs to solve equalities and inequalities over both concrete and symbolic values. To do this, the analysis appeals to a symbolic solver and simplifier tightly integrated with the analysis core. At the same time, most of the analysis coverage of the program is done inexpensively, based on concrete values—e.g., the analysis tries selected small integers, large integers, and constants from the program text as values of variables at every external entry point.

The above description is perhaps reminiscent of dynamic-symbolic (a.k.a. concolic) execution [Cadar et al. 2008; Godefroid 2007; Godefroid et al. 2005; Sen and Agha 2006; Sen et al. 2005; Tillmann and de Halleux 2008; Tillmann and Schulte 2006]. Some of the insights are indeed similar. The use of concrete values whenever possible (and, conversely, the appeal to symbolic solving selectively) is responsible for the symvalic analysis scalability, much as it has been for dynamic-symbolic execution. However, symvalic analysis also has significant differences from dynamic-symbolic execution. Most notably, it is a static analysis and not full program execution. To model a program statement, symvalic analysis does not need to create a full context of values for every other program variable and (heap) storage location. Indeed, symvalic analysis treats concrete values highly efficiently, manipulating them using set-at-a- ime reasoning, instead of individually. This is the value-flow aspect of symvalic analysis: sets of values are propagated using mass operations (table joins) using a standard fixpoint (Datalog) analysis engine. We apply the idea of symvalic analysis to Ethereum smart contracts [Wood 2014]: a domain where high-precision, high-completeness analyses are in demand. Smart contracts are programs that handle monetary assets whose value occasionally rises to the many millions of dollars.

Tens of security researchers pore over the code of these contracts daily. Discovering an important bug is a source of both fame and reward. Conventional analysis techniques are rarely effective for high-value contracts: Perez and Livshits [2021] recently find that only 0.27% of the funds reported vulnerable by some of the most prominent research tools have truly been subsequently exploited. Simply put, program analysis can help with vulnerabilities in contracts that have not received much scrutiny, but rarely helps with high-value vulnerabilities that have not been found through other means. Partly addressing this need, symvalic analysis has already enjoyed significant success in the analysis of smart contracts, with several high-value vulnerabilities discovered. In brief, our work claims the following contributions:

Access the full document

October 1, 2021
Precise Static Modeling of Ethereum “Memory”

SIFIS LAGOUVARDOS, University of Athens, Greece
NEVILLE GRECH, University of Athens, Greece
ILIAS TSATIRIS, University of Athens, Greece
YANNIS SMARAGDAKIS, University of Athens, Greece

Static analysis of smart contracts as-deployed on the Ethereum blockchain has received much recent attention. However, high-precision analyses currently face significant challenges when dealing with the Ethereum VM (EVM) execution model. A major such challenge is the modeling of low-level, transient “memory” (as opposed to persistent, on-blockchain “storage”) that smart contracts employ. Statically understanding the usage patterns of memory is non-trivial, due to the dynamic allocation nature of in-memory buffers. We offer an analysis that models EVM memory, recovering high-level concepts (e.g., arrays, buffers, call arguments) via deep modeling of the flow of values. Our analysis opens the door to Ethereum static analyses with drastically increased precision. One such analysis detects the extraction of ERC20 tokens by unauthorized users. For another practical vulnerability (redundant calls, possibly used as an attack vector), our memory modeling yields analysis precision of 89%, compared to 16% for a state-of-the-art tool without precise memory modeling. Additionally, precise memory modeling enables the static computation of a contract’s gas cost.

This gas-cost analysis has recently been instrumental in the evaluation of the impact of the EIP-1884 repricing (in terms of gas costs) of EVM operations, leading to a reward and significant publicity from the Ethereum Foundation. CCS Concepts: • Software and its engineering → Formal software verification; • Theory of computation → Program analysis. Additional Key Words and Phrases: ethereum, EVM, static analysis ACM Reference Format:
Sifis Lagouvardos, Neville Grech, Ilias Tsatiris, and Yannis Smaragdakis. 2020. Precise Static Modeling of Ethereum “Memory”. Proc. ACM Program. Lang. 4, OOPSLA, Article 190 (November 2020), 27 pages. https: //doi.org/10.1145/3428258

1 INTRODUCTION
The Ethereum blockchain has enabled the management of digital assets via unsupervised autonomous agents called smart contracts. Smart contracts are among the computer programs with the most dire needs of high correctness assurance, due to their managing of high-value assets, as well as their public and immutable nature. Therefore, static analysis for Ethereum smart contracts has captured the attention of the research community in recent years [Albert et al. 2018; Feist et al. 2019; Grech et al. 2018; Mueller 2018; Tsankov et al. 2018].

The first generation of Ethereum Authors’ addresses: Sifis Lagouvardos, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, sifis.lag@di.uoa.gr; Neville Grech, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, me@nevillegrech.com; Ilias Tsatiris, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, i.tsatiris@di.uoa.gr; Yannis Smaragdakis, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, yannis@smaragd.org. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2020 Copyright held by the owner/author(s). 2475- 421/2020/11-ART190 https://doi.org/10.1145/3428258

Static analyses have already exhibited significant successes.1 For instance, Securify [Tsankov et al. 2018] has been the basis for a leading company in the Ethereum security space, with many tens of commercial audits performed. Most static analysis tools for Ethereum operate at the binary level of contracts, as-deployed on the blockchain. This ensures that the analysis operates on all contracts in existence, regardless of whether source code is available. (Source code is not always present but not negligible either: it is available for under 25% of deployed contracts, yet for more than half of the high-value contracts.) Furthermore, operating on low-level binaries offers source-language and languageversion independence, completeness in the presence of inline assembly (which is common), and, perhaps most importantly, uniform treatment of complex language features: as analysis authors often argue, innocuous-looking source-level expressions can incur looping behavior or implicit overflow [Grech et al. 2018].

Operating at the binary level necessitates contract decompilation [eth [n. d.]; Brent et al. 2018; Grech et al. 2019; Kolinko 2018] in order to recover high-level information from the very-low-level Ethereum VM (EVM) bytecode format. This decompilation effort is non-trivial: the EVM is a stack machine with no structured information (no types or functions). Control- low (i.e., calls, returns, conditionals) is implemented as forward jumps over run-time values obtained from the stack, hence even producing a control-flow graph requires a deep static analysis [Grech et al. 2018]. Despite these challenges, sophisticated decompilers mostly succeed in recovering high-level control flow and have been the basis of the most successful static analyses. Despite the relative success of Ethereum decompilers, some low-level aspects of the deployed contract remain unaddressed. The most major such aspect is the precise modeling of transient EVM memory. “Memory” in the context of the EVM (and of this paper) refers to a data store that is transient and transaction-private, used by the runtime primarily to store values whose size is statically unknown—e.g., arrays, strings, or encoded buffers. (Memory is to be contrasted with storage: the on-blockchain persistent value store of an Ethereum smart contract.)

Memory is crucial in the execution of smart contracts. It is used for many cryptographic hash operations (SHA3 is a native instruction, operating over arbitrary-length buffers), for the arguments of external calls (i.e., calls to other contracts), for logging operations, and for returns from a public function. Crucially, any complex mapping data structure (i.e., the most common Ethereum data structures) stores and retrieves information (from storage) after hashing keys in memory buffers. Modeling EVM memory is unlike other memory management settings. There is no speed premium for locality, yet the compiler attempts to keep memory tightly packed because the contract’s execution cost (in terms of Ethereum gas consumed for memory-related operations) depends on the highest initialized memory address. For the vast majority of smart contracts, written in the Solidity language, the compiler uses a simple strategy: every allocated buffer is written after the end of the last one, updating a “free-memory” pointer. This memory management policy is a low-level aspect, introduced entirely during the compilation process. At the source-code level, memory is implicit: the values that end up stored in memory merely have dynamic-length array or string types. The goal of a precise modeling of EVM memory is to recover such source-level information (expressed in terms of arrays or strings) from the low-level address manipulation code that the compiler produces. Generally, our work makes the following contributions:

Access the full document

November 4, 2020
MadMax: Analyzing the Out-of-Gas World of Smart Contracts
Abstract
Ethereum is a distributed blockchain platform, serving as an ecosystem for smart contracts: full-fledged intercommunicating programs that capture the transaction logic of an account. A gas limit caps the execution of an Ethereum smart contract: instructions, when executed, consume gas, and the execution proceeds as long as gas is available. Gas-focused vulnerabilities permit an attacker to force key contract functionality to run out of gas—effectively performing a permanent denial-of-service attack on the contract. Such vulnerabilities are among the hardest for programmers to protect against, as out-of-gas behavior may be uncommon in nonattack scenarios and reasoning about these vulnerabilities is nontrivial.

In this paper, we identify gas-focused vulnerabilities and present MadMax: a static program analysis technique that automatically detects gas-focused vulnerabilities with very high confidence. MadMax combines a smart contract decompiler and semantic queries in Datalog. Our approach captures high-level program modeling concepts (such as “dynamic data structure storage” and “safely resumable loops”) and delivers high precision and scalability. MadMax analyzes the entirety of smart contracts in the Ethereum blockchain in just 10 hours and flags vulnerabilities in contracts with a monetary value in billions of dollars. Manual inspection of a sample of flagged contracts shows that 81% of the sampled warnings do indeed lead to vulnerabilities.

1. INTRODUCTION
Ethereum is a decentralized blockchain platform that can execute arbitrarily-expressive computational smart contracts. A smart contract can capture virtually any complex interaction, such as responding to communication from other accounts and dispensing or accepting funds. The possibilities for such programmable logic are endless. It may encode a payoff schedule, investment assumptions, interest policy, conditional trading directives, trade or payment agreements, and complex pricing.

Virtually any transactional multiparty interaction is expressible without a need for intermediaries or third-party trust. Smart contracts typically handle transactions in Ether, which is the native cryptocurrency of the Ethereum blockchain with a current market capitalization in tens of billions of dollars. Smart contracts (as opposed to noncomputational “wallets”) hold a considerable portion of the total Ether available in circulation, which makes them ripe targets for attackers. Hence, developers and auditors have a strong incentive to make extensive use of various tools and programming techniques that minimize the risk of their contract being attacked.

Analysis and verification of smart contracts are, therefore, high-value tasks, possibly more so than in any other application domain. The combination of monetary value and public availability makes the early detection of vulnerabilities a task of paramount importance. A broad family of contract vulnerabilities concerns out-ofgas behavior. Gas is the fuel of computation in Ethereum. Due to the massively replicated execution platform, wasting the resources of others is prevented by charging users for running a contract. Each executed instruction costs gas, which is traded with the Ether cryptocurrency.

As a user pays gas upfront, a transaction’s computation may exceed its allotted amount of gas. In that case, the Ethereum Virtual Machine (EVM), which is the runtime environment for compiled smart contracts, raises an out-of-gas exception and aborts the transaction. A contract is at risk for a gas-focused vulnerability if it has not anticipated (or otherwise does not correctly handle) the possible abortion of a transaction due to out-of-gas conditions.

A vulnerable smart contract may be blocked forever due to the incorrect handling of out-of-gas conditions: re-executing the contract’s function will fail to make progress, re-yielding out-of-gas exceptions, indefinitely. Thus, although an attacker cannot directly appropriate funds, they can cause damage to the contract, locking its balance away in what is, effectively, a denial-of-service attack.

Such attacks may benefit an attacker in indirect ways—for example, harming competitors or the ecosystem, amassing fame in a black-hat community, or blackmailing. In this work, we present MadMax:1 a static program analysis framework for detecting gas- ocused vulnerabilities in smart contracts. MadMax is a static analysis pipeline consisting of a decompiler (from low-level EVM bytecode to a structured intermediate language) and a logic-based analysis specification. MadMax is highly efficient and effective: it analyzes the whole Ethereum blockchain in just 10 hours and reports numerous vulnerable contracts holding a total value exceeding $2.8B, with high precision, as determined from a random sample.
MadMax is unique in the landscape of smart contract analyzers and verifiers. It is an approach employing cutting-edge declarative static analysis techniques (e.g., context- ensitive flow analysis and memory layout modeling for data structures), whereas past analyzers have primarily focused on lightweight static analysis, on symbolic execution, or on fullfledged verification for functional correctness.

As MadMax demonstrates, static program analysis offers a unique combination of advantages: very high scalability (applying to the entire blockchain) and high coverage of potential vulnerabilities. Additionally, MadMax is raising the level of abstraction of automated security analysis, by encoding complex properties (such as “safely resumable loop” or “storage whose increase is caused by public calls”), which, in turn, allow detecting vulnerabilities that span multiple transactions.
2. BACKGROUND
A blockchain is a shared, transparent distributed ledger of transactions that is secured using cryptography. One can think of a blockchain as a long and ever-growing list of blocks, each encoding a sequence of individual transactions, always available for inspection and safe from tampering. Each block contains a cryptographic signature of its previous block. Thus, no previous block can be changed or rejected without also rejecting all its successors. Peers/ miners run a mining client for separately maintaining the current version of the blockchain. Each of the peers considers the longest valid chain starting from a genesis block to be the accepted version of the blockchain.

To encourage transaction validation by all peers and discourage wasted or misleading work, a blockchain protocol typically combines two factors: an incentive that is given as a reward to peers successfully performing validation, and a proof-of-work, requiring costly computation to produce a block. To see how distributed consensus and permanent record-keeping arise, consider a malicious client who tries to double-spend a certain amount. The client may propagate conflicting transactions (e.g., paying sellers A and B) to different parts of the network. As different peers become aware of the two versions of the truth, a majority will arise, because the peers will build further blocks over the version they perceived as current. Thus, a majority will soon accept one of the two spending transactions as authoritative and will reject the other.
Access the full document
October 4, 2020
Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities

Lexi Brent∗
Int’l Computer Science Institute Berkeley, CA, USA
lexi@icsi.berkeley.edu

Neville Grech
University of Athens Athens, Greece
me@nevillegrech.com

Sifis Lagouvardos
University of Athens Athens, Greece
sifis.lag@di.uoa.gr

Bernhard Scholz
University of Sydney Sydney, NSW, Australia
bernhard.scholz@sydney.edu.au

Yannis Smaragdakis
University of Athens Athens, Greece
yannis@smaragd.org

Abstract
Smart contracts on permission less blockchains are exposed to inherent security risks due to interactions with untrusted entities. Static analyzers are essential for identifying security risks and avoiding millions of dollars worth of damage. We introduce Ethainter, a security analyzer checking information flow with data sanitization in smart contracts. Ethainter identifies composite attacks that involve an escalation of tainted information, through multiple transactions, leading to severe violations. The analysis scales to the entire blockchain, consisting of hundreds of thousands of unique smart contracts, deployed over millions of accounts. Ethainter is more precise than previous approaches, as we confirm by automatic exploit generation (e.g., destroying over 800 contracts on the Ropsten network) and by manual inspection, showing a very high precision of 82.5% valid warnings for end-to- nd vulnerabilities. Ethainter’s balance of precision and completeness offers significant advantages over other tools such as Securify, Securify2, and teEther.

CCS Concepts: · Software and its engineering → Formal software verification; · Theory of computation →Program analysis.
Keywords: static analysis, information flow, smart contracts

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. PLDI ’20, June 15ś20, 2020, London, UK © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-7613-6/20/06. . . $15.

ACM Reference Format:
Lexi Brent, Neville Grech, Sifis Lagouvardos, Bernhard Scholz, and Yannis Smaragdakis. 2020. Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI ’20), June 15ś20, 2020, London, UK. ACM, New York

Introduction
Permissionless blockchain platforms such as Bitcoin [30] and Ethereum [6, 42] promise to revolutionize all sectors of multiparty interaction, by enabling decentralized, resilient consensus. The Ethereum platform, in particular, allows the execution of arbitrary programs, which are called smart contracts. Smart contracts are registered immutably on the blockchain and operate autonomously. Software correctness in smart contracts is critical, as (1) the contracts are high-value targets (since they manage monetary assets), (2) are immutable and cannot be patched, and (3) are fully available for inspection (and invocation) by potential attackers. The need for contract correctness has gained prominence via several recent high profile incidents resulting in losses of cryptocurrency amounts valued in the hundreds of millions [9, 41]. Smart contracts are Turing-complete programs, typically expressed in a domain-specific language called Solidity [40]. Solidity contracts are compiled to low-level bytecode and executed by the Ethereum Virtual Machine (EVM) on the blockchain. Smart contracts store data either on the blockchain or in execution-ephemeral memory. Solidity programs implement unconventional data structures using cryptographic hashing, posing a challenge for static bytecode analyzers. Furthermore, EVM bytecode does not explicitly expose control flow, necessitating sophisticated decomplication methodologies [5, 13, 24, 44]. Hence, checking for even basic security vulnerabilities is non-trivial, as security analyzers for smart contracts require complex techniques to overcome these challenges [14, 37].

With the rising awareness of Ethereum security risks and the development of recommended practices, realistic attacks have become increasingly complex. They often chain successive exploits that each expose more vulnerabilities. A new class of security analyzers is required to identify such composite attacks. The most notorious (albeit relatively simple) example of a composite attack is the Parity wallet hack, where the equivalent of $280M [9] was stolen or frozen via two separate vulnerabilities. The attack involved reinitializing part of the contract that sets owner variables, via a vulnerable library function, prior to attacking the contract.

To safeguard against sophisticated attacks, it is essential to model the information flow [34] inside smart contracts accurately. Information flow classifies information into trusted and untrusted and considers how it propagates in the code. For example, if untrusted (łtaintedž) information from external sources is allowed to spread to critical program points, it can alter key aspects of program behavior.

Smart contract programmers employ guarding patterns to prevent an untrusted entity from executing sensitive operations, such as destroying the contract. For example, the contract code could check whether the user of the contract (a.k.a., its caller or sender) has specific privileges. The simplest check is to permit only a particular contract owner account to perform critical actions. For security analyzers, it is essential to model this guarding mechanism, for any highfidelity information flow analysis: if guards are not taken into consideration, many contracts will be incorrectly inferred to be vulnerable, when the only user that can łexploitž these vulnerabilities would be the owner of the contract themselves. Conversely, the guarding code can itself be attacked, by invoking code that manipulates the state of the contract in unexpected ways.

In this work, we present Ethainter: the first security analyzer for detecting composite information flow violations in Ethereum smart contracts. Ethainter enhances the understanding of tainted information flow with the tainting of guard conditions, other potential ineffectiveness of guards, as well as different kinds of taint that are not prevented by guarding. Ethainter models not just the existence of guards but also whether they are effective in sanitizing information. The specification of the analysis is in a formalism of independent value. It ignores orthogonal, well-understood technical complexities, capturing, instead, the essence of new information flow concepts (e.g., tainted guards, taint through storage) in minimal form. For example, we consider a tainted memory store to induce further information flow more eagerly than an untainted (yet still statically undetermined) memory store. The precision, recall, and scalability of Ethainter are highly fine-tuned using mutually-recursive Datalog rules for tainting and overall information flow. The contribution of our work is as follows:

Access the full document

June 15, 2020

Project Name	Website	Estimated Impact
zkBridge	https://zkbridge.com	Moderate
LayerZero	https://layerzero.network/	Moderate
Telepathy	https://docs.telepathy.xyz/	Moderate

Category: Research

EIP-1153 | Quick refresher of data storage and access

Storage

Memory

Stack

Calldata

Transient Storage

EIP-1153 impact analysis

Gas efficiency improvements

Conclusion

Introduction

Dedaub’s EIP-3074 Impact Study: A Look Back

About EIP-3074 and the Pectra Update

Applications Enabled by EIP-3074

EIP-4877 Summary | Main body of the get() function