Author: NEVILLE GRECH

  • Rari Capital Vulnerability

    Security researchers actively participating in Tribe DAO’s Discord security channel, raised concerns about a security issue relating to Fuse pools. The Rari Capital team executed our pre-established emergency response plan and immediately fixed the vulnerability. Because of the identification of the vulnerability, and the quick actions in response, no funds were lost. This article will address the nature and identification of the vulnerability as well as the remediation steps executed by the Rari Capital team.

    While security has always been a priority in Rari Capital’s projects, Rari Capital’s capabilities have come a long way since its inception. Rari started as a fair launch project with next to no resources and a team of scrappy yet talented engineers. Today, as a highly experienced group of contributors and a part of the Tribe DAO, the Rari Capital team has exceptional resources enabling it to implement the most extensive security measures. Rari now has contributors who are some of DeFi’s top smart contract engineers, a robust network of security auditing professionals and organizations, and a thriving relationship with the white hat community. However, some of the initial Fuse contracts were written prior to having access to these resources and were not scrutinized like contracts are today.

    The Vulnerability

    At around 4:30 PM PT on March 3rd security researchers including @samczsun, @hritzdorf, and @YSmaragdakis (of @Dedaub) identified a vulnerability across multiple Fuse pools. Pools 0 to 32 with the exception of Pool 6 were at risk. The Rari Capital team was informed, and the admin multisig immediately paused borrowing across all Fuse pools.

    The vulnerability was on an old version of the cToken and comptroller implementations. Specifically, the vulnerability was on the cEther contract which used .call.value to transfer ETH instead of .transfer like Compound. This is one of a handful of audited changes from the Compound codebase. Unfortunately audits aren’t a silver bullet and this vulnerability slipped through the cracks. It allowed for a cross-asset reentrancy upon cEther redemption where all assets in vulnerable pools could be borrowed for free. This was because the cEther state hadn’t fully been updated with the effects of the redemption before the ETH transfer. As a result all borrowable assets could have been stolen from those pools.

    In late 2021, the Fuse contracts were upgraded to a version which contained a pool-wide check for reentrancy. All pools deployed after pool 32 used the upgraded contracts. Pool admins who deployed prior to the update had an option to upgrade to the latest Fuse contracts from the UI. Of the 32 pools that were deployed with this version of Fuse, only pool 6 was upgraded by its admin.

    The Rari Capital Response

    Upon being contacted by the researchers about the possible vulnerability in the platform, the Rari Capital team initiated processes to identify, confirm, and validate the issue. Per the incident response playbook, it was determined from a set of available actions to immediately pause borrowing globally. An extensive review of the vulnerability took place, which included a PoC, and review of available remediations options. Once the solution was developed, validated, and tested, the team acted fast to expedite the implementation of the fix.

    Each Fuse at risk pool was upgraded to the latest cToken and Comptroller implementations, which prevents this or similar reentrancy vulnerabilities from being exploited. All pools were re-tested to confirm the vulnerability was remediated and borrowing was re-enabled the next morning. All of this occurred within 16 hours of identifying the vulnerability.

    Future Security Measures

    In the past the Rari Team has ensured that all code in production is scrutinized and goes through an extensive auditing process. In response to the identified vulnerability Rari will be taking a series of enhanced security measures. First, Rari Capital engineers are currently conducting extensive internal reviews of the Fuse codebase. Rari Capital and Fei Protocol have also merged their respective Immunefi bug bounties into one joint Tribe DAO bug bounty.

    Rari greatly appreciates the relationship and collaboration with the white hat community and with many of DeFi’s top security engineers. Thank you to all who assisted in identifying and nullifying this vulnerability and to the community who continues to contribute and support as we move forward stronger than ever.

  • Elipmoc: Advanced Decompilation of Ethereum SmartContracts

    Elipmoc: Advanced Decompilation of Ethereum SmartContracts


    NEVILLE GRECH, University of Malta, Malta and Dedaub Ltd
    SIFIS LAGOUVARDOS, University of Athens, Greece and Dedaub Ltd
    ILIAS TSATIRIS, University of Athens, Greece and Dedaub Ltd
    YANNIS SMARAGDAKIS, University of Athens, Greece and Dedaub Ltd

    Smart contracts on the Ethereum blockchain greatly benefit from cutting-edge analysis techniques and pose significant challenges. A primary challenge is the extremely low-level representation of deployed contracts. We present Elipmoc, a decompiler for the next generation of smart contract analyses. Elipmoc is an evolution of Gigahorse, the top research decompiler, dramatically improving over it and over other state-of-the-art tools, by employing several high-precision techniques and making them scalable. Among these techniques are a new kind of context sensitivity (termed “transactional sensitivity”) that provides a more effective static abstraction of distinct dynamic executions; a path-sensitive (yet scalable, through path merging) algorithm for inference of function arguments and returns; and a fully context sensitive private function reconstruction process. As a result, smart contract security analyses and reverse-engineering tools built on top of Elipmoc achieve high scalability, precision and completeness. Elipmoc improves over all notable past decompilers, including its predecessor, Gigahorse, and the state-of-the-art industrial tool, Panoramix, integrated into the primary Ethereum blockchain explorer, Etherscan. Elipmoc produces decompiled contracts with fully resolved operands at a rate of 99.5% (compared to 62.8% for Gigahorse), and achieves much higher completeness in code decompilation than Panoramix—e.g., up to 67% more coverage of external call statements—while being over 5x faster. Elipmoc has been the enabler for recent (independent) discoveries of several exploitable vulnerabilities on popular protocols, over funds in the many millions of dollars.

    Additional Key Words and Phrases: Program Analysis, Smart Contracts, Decompilation, Datalog, Security, Ethereum, Blockchain

    1 INTRODUCTION
    Decentralized financial applications, built using smart contracts running on programmable blockchains (most notably Ethereum), are starting to rival traditional financial systems. Therefore coding or logical errors in smart contracts can have large financial implications. This makes smart contracts primary targets for automated analysis and verification tasks.
    More specifically, the analysis of smart contracts as-deployed, i.e., by taking their binary form as input, is attractive for several reasons. First, it offers significant generality: many deployed smart contracts have no publicly released source code. Security analysts routinely need to inspect such contracts (e.g., bots) to determine their functionality. Second, analyzing contracts at the bytecode level offers a uniform platform for analysis, not distinguishing between source languages and source language versions. This is a major factor, given that (the dominant Ethereum language)

    Authors’ addresses: Neville Grech, University of Malta, Malta and Dedaub Ltd, me@nevillegrech.com; Sifis Lagouvardos, University of Athens, Greece and Dedaub Ltd, sifis.lag@di.uoa.gr; Ilias Tsatiris, University of Athens, Greece and Dedaub Ltd, i.tsatiris@di.uoa.gr; Yannis Smaragdakis, University of Athens, Greece and Dedaub Ltd, smaragd@di.uoa.gr. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
    the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2022 Copyright held by the owner/author(s). 2475-1421/2022/4-ART77 https://doi.org/10.1145/3527321 Proc. ACM Program. Lang., Vol. 6, No. OOPSLA1, Article 77. Publication date: April 2022. 77:2 Neville Grech, Sifis Lagouvardos, Ilias Tsatiris, and Yannis Smaragdakis Solidity does not maintain source compatibility across versions. Source-level tools very quickly become obsolete.

    Bytecode decompilation offers by far the most general substrate for automated smart contract analyses. Decompilation of the Ethereum Virtual Machine (EVM) bytecode is an open problem for the security community. Existing approaches lack in completeness, precision, or scalability. For in-stance, the most-used decompiler in practice, etherscan.io’s Panoramix [Kolinko and Palkeo 2020], decompiles complex contracts only partially, with much of the low-level code often not reflected in the decompiled version. Past academic research tools, such as the Gigahorse decompiler [Grech et al. 2019a], sacrifice precision and occasionally even scalabily. Lower precision results in low-quality decompilation, for instance, operations with unknown operands, or jumps with many infeasible targets.

    To illustrate such shortcomings, our experiments show that these leading past decompilers manage to decompile under 20% (Panoramix: 18.2%, Gigahorse: 19.8%) of Ethereum contracts of large size (over 20KB). Technically, the low-level nature of the EVM poses a challenge for any decompilation effort. The EVM uses a single stack for calls and local operations, without any guarantees about its well-formedness. The only control flow operators are unstructured jumps. All calls, returns, loops, and conditionals are implemented as jumps to whichever address is currently at the top of the stack. Compilers producing EVM bytecode often perform aggressive optimization, since instructions have a monetary cost to execute. A major obfuscating factor, for instance, is the aggressive merging of identical instruction sequences belonging to different internal, a.k.a. private, functions.

    Control and data flow are, as a result, very hard to discern without sophisticated modeling techniques. In this paper, we introduce Elipmoc (“compile”, backwards), a decompiler for Ethereum smart con-tracts. Elipmoc advances the state of the art in EVM bytecode decompilation, offering significantly increased precision and completeness. Elipmoc is the substrate of a successful analysis framework that has flagged numerous exploitable vulnerabilities on contracts with or without source (but with millions of dollars in locked value). All analyses operate over the three-address code exported by Elipmoc. We have made several vulnerability disclosures, some of which resulted in major rescue efforts [Dedaub 2021b,c,d,f; Michales, Jonah 2021; Primitive Finance 2021].

    In addition, the Elipmoc team has been commissioned to perform three separate studies for the Ethereum Foundation, for the assessment of the impact of Ethereum Improvement Proposals EIP-1884, EIP-3074, and a future EIP that proposes a rearchitecting of storage gas cost metering. The Elipmoc decompiler allows answering questions such as “what will be the impact of change-X on the entire set of currently deployed contracts?” and is becoming a crucial tool for the evolution of the Ethereum blockchain and many of its applications.
    In technical terms,

    Elipmoc makes the following contributions:

    ● Transactional Context Sensitivity. Elipmoc proposes an effective form of context sensitivity for smart contract execution. Context sensitivity offers a way to condense actual executions into short static signatures. Devising good signatures is crucial for both precision and scalability throughout the decompiler. Elipmoc’s transactional context sensitivity is based on insights about EVM low-level control flow (esp. continuation-based optimizations) and overall execution.

    ● Scalable, context & path-sensitive function reconstruction. Since compilers tend to reuse EVM bytecode sequences for the benefit of multiple functions, a direct inference of private functions is not straightforward. Additionally, high-level control flow is compiled away to the point of near-irreversibility, using continuation-passing-style patterns. We address such issues with a static analysis maintaining the current contents of a continuation stack, and a path-sensitive analysis to determine with high precision the maximum number of stack elements consumed, i.e.,

    Access the full document

  • Symbolic Value-flow Static Analysis of Ethereum Smart Contracts

    Symbolic Value-flow Static Analysis of Ethereum Smart Contracts

    We present a static analysis approach that combines concrete values and symbolic expressions. This symbolic value-flow (“symvalic”) analysis models program behavior with high precision, e.g., full path sensitivity. To achieve deep modeling of program semantics, the analysis relies on a symbiotic relationship between a traditional static analysis fixpoint computation and a symbolic solver: the solver does not merely receive a complex “path condition” to solve, but is instead invoked repeatedly (often tens or hundreds of thousands of times), in close cooperation with the flow computation of the analysis.

    The result of the symvalic analysis architecture is a static modeling of program behavior that is much more complete than symbolic execution, much more precise than conventional static analysis, and domain-agnostic: no special-purpose definition of anti-patterns is necessary in order to compute violations of safety conditions with high precision.

    We apply the analysis to the domain of Ethereum smart contracts. This domain represents a fundamental challenge for program analysis approaches: despite numerous publications, research work has not been effective at uncovering vulnerabilities of high real-world value.

    In systematic comparison of symvalic analysis with past tools, we find significantly increased completeness (shown as 83-96% statement coverage and more true error reports) combined with much higher precision, as measured by rate of true positive reports. In terms of real-world impact, since the beginning of 2021, the analysis has resulted in the discovery and disclosure of several critical vulnerabilities, over funds in the many millions of dollars. Six separate bug bounties totaling over $350K have been awarded for these disclosures.

    Read more

  • Symbolic Value-Flow Static Analysis: Deep, Precise, Complete Modeling of Ethereum Smart Contracts

    Symbolic Value-Flow Static Analysis: Deep, Precise, Complete Modeling of Ethereum Smart Contracts


    YANNIS SMARAGDAKIS, University of Athens, Greece
    NEVILLE GRECH, University of Malta, Malta
    SIFIS LAGOUVARDOS, University of Athens, Greece
    KONSTANTINOS TRIANTAFYLLOU, University of Athens, Greece
    ILIAS TSATIRIS, University of Athens, Greece

    We present a static analysis approach that combines concrete values and symbolic expressions. This symbolic value-flow (“symvalic”) analysis models program behavior with high precision, e.g., full path sensitivity. To achieve deep modeling of program semantics, the analysis relies on a symbiotic relationship between a traditional static analysis fixpoint computation and a symbolic solver: the solver does not merely receive a complex “path condition” to solve, but is instead invoked repeatedly (often tens or hundreds of thousands of times), in close cooperation with the flow computation of the analysis.

    The result of the symvalic analysis architecture is a static modeling of program behavior that is much more complete than symbolic execution, much more precise than conventional static analysis, and domain-agnostic: no special-purpose definition of anti-patterns is necessary in order to compute violations of safety conditions with high precision. We apply the analysis to the domain of Ethereum smart contracts. This domain represents a fundamental challenge for program analysis approaches: despite numerous publications, research work has not been effective at uncovering vulnerabilities of high real-world value. In systematic comparison of symvalic analysis with past tools, we find significantly increased completeness (shown as 83-96% statement coverage and more true error reports) combined with much higher precision, as measured by rate of true positive reports. In terms of real-world impact, since the beginning of 2021, the analysis has resulted in the discovery and disclosure of several critical vulnerabilities, over funds in the many millions of dollars. Six separate bug bounties totaling over $350K have been awarded for these disclosures. CCS Concepts:

    • Theory of computation → Program analysis;
    • Software and its engineering → General programming languages;
    • Security and privacy → Software and application security.

    Additional Key Words and Phrases: Program Analysis, Smart Contracts, Security, Ethereum, Blockchain ACM Reference Format: Yannis Smaragdakis, Neville Grech, Sifis Lagouvardos, Konstantinos Triantafyllou, and Ilias Tsatiris. 2021.


    Symbolic Value-Flow Static Analysis: Deep, Precise, Complete Modeling of Ethereum Smart Contracts. Proc. ACM Program. Lang. 5, OOPSLA, Article 163 (October 2021), 30 pages. https://doi.org/10.1145/3485540 Authors’ addresses: Yannis Smaragdakis, University of Athens, Greece, smaragd@di.uoa.gr; Neville Grech, University of Malta, Malta, me@nevillegrech.com; Sifis Lagouvardos, University of Athens, Greece, sifis.lag@di.uoa.gr; Konstantinos Triantafyllou, University of Athens, Greece, kotriant@di.uoa.gr; Ilias Tsatiris, University of Athens, Greece, i.tsatiris@di. uoa.gr. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

    1 INTRODUCTION
    Static analysis is distinguished among program analysis techniques (e.g., testing [Czech et al. 2016; Meyer 2008], model checking [Clarke et al. 1986; Jhala and Majumdar 2009], or symbolic execution [Baldoni et al. 2018; King 1976]) by its emphasis on completeness, i.e., its attempt to model all (or as many as possible) program behaviors. To achieve this goal under a realistic time budget, static analysis often has to sacrifice some precision: the analysis may consider combinations of values that may never appear in a real execution. Maintaining high precision, while achieving completeness and scalability, remains a formidable challenge. In this work, we present an analysis that attempts to meet this challenge, under realistic domain assumptions. The analysis maintains high precision (closely analogous to that of model checking or full program execution) while achieving very high completeness, as measured in terms of coverage of program behaviors.

    For conciseness purposes, we give to the analysis architecture the name symvalic analysis, for “symbolic+value-flow static analysis”. A symvalic analysis computes for each program variable a set of possible concrete values as well as symbolic expressions. The analysis is path-sensitive: values propagate to a variable at a program point only if they satisfy (modulo natural analysis over-approximation) the conditions under which the program point would be reachable. In order to satisfy conditions, the analysis needs to solve equalities and inequalities over both concrete and symbolic values. To do this, the analysis appeals to a symbolic solver and simplifier tightly integrated with the analysis core. At the same time, most of the analysis coverage of the program is done inexpensively, based on concrete values—e.g., the analysis tries selected small integers, large integers, and constants from the program text as values of variables at every external entry point.

    The above description is perhaps reminiscent of dynamic-symbolic (a.k.a. concolic) execution [Cadar et al. 2008; Godefroid 2007; Godefroid et al. 2005; Sen and Agha 2006; Sen et al. 2005; Tillmann and de Halleux 2008; Tillmann and Schulte 2006]. Some of the insights are indeed similar. The use of concrete values whenever possible (and, conversely, the appeal to symbolic solving selectively) is responsible for the symvalic analysis scalability, much as it has been for dynamic-symbolic execution. However, symvalic analysis also has significant differences from dynamic-symbolic execution. Most notably, it is a static analysis and not full program execution. To model a program statement, symvalic analysis does not need to create a full context of values for every other program variable and (heap) storage location. Indeed, symvalic analysis treats concrete values highly efficiently, manipulating them using set-at-a- ime reasoning, instead of individually. This is the value-flow aspect of symvalic analysis: sets of values are propagated using mass operations (table joins) using a standard fixpoint (Datalog) analysis engine. We apply the idea of symvalic analysis to Ethereum smart contracts [Wood 2014]: a domain where high-precision, high-completeness analyses are in demand. Smart contracts are programs that handle monetary assets whose value occasionally rises to the many millions of dollars.

    Tens of security researchers pore over the code of these contracts daily. Discovering an important bug is a source of both fame and reward. Conventional analysis techniques are rarely effective for high-value contracts: Perez and Livshits [2021] recently find that only 0.27% of the funds reported vulnerable by some of the most prominent research tools have truly been subsequently exploited. Simply put, program analysis can help with vulnerabilities in contracts that have not received much scrutiny, but rarely helps with high-value vulnerabilities that have not been found through other means. Partly addressing this need, symvalic analysis has already enjoyed significant success in the analysis of smart contracts, with several high-value vulnerabilities discovered. In brief, our work claims the following contributions:

    Access the full document

  • Verkle Tree Gas Metering Impact

    Verkle Tree Gas Metering Impact

    Dedaub was commissioned by the Ethereum Foundation to investigate the impact of Vitalik Buterin’s Verkle tree gas metering proposal on existing smart contracts.

    The impact is not negligible, with a 26% gas cost increase on average. However, this will be gauged against the impact to the security and scalability of the consensus of the network, and Verkle trees could be a game-changer in this regard.

    INTRODUCTION

    Dedaub was commissioned by the Ethereum Foundation to investigate the impact of Vitalik Buterin’s Verkle tree gas metering proposal on existing smart contracts. In order to appraise the impact of the proposed change, we performed extensive simulations of the proposed changes over past transactions; wrote a static analysis and applied it over most contracts deployed to the mainnet; examined bytecode, source, decompiled code, and low level traces of past transactions.

    Vitalik Buterin’s proposal introduces new gas changes to state access operations that closely reflect the worst case witness size that needs to be downloaded by a stateless client that needs to validate a block. As described in the original proposal, a Verkle tree is a cryptographically secure data structure with a high branching factor. Unlike Merkle trees, trace witnesses in Verkle trees do not need to include the siblings of each node in a path to the root node, which means that trees with a high branching factor are more efficient. Verkle trees can still retain the security properties by making use of a polynomial commitment scheme. By leveraging these properties, gas costs can therefore more easily reflect the upper bound of the cost of transmitting these witnesses across stateless clients, assuming these are appropriately designed.

    In addition to gas changes that the proposal introduces, the Verkle tree proposal may break protocols that are too tied to the existing implementation, such as protocols that use Keydonix Oracles like Rune, but this consideration is outside the scope of this report.

    In the report we briefly summarize the proposal with respect to gas costs. We then delve into the impact to existing contracts using two techniques: path-sensitive static program analysis, and dynamic analysis by modifying an Erigon node with the new gas semantics and analyzing the data per internal transaction. Finally we state our recommendations for lessening the impact of this proposal.

    BACKGROUND

    The goal of the proposal [1] aims to make Ethereum _stateless clients _sustainable in terms of data transmission requirements.

    Stateless clients should be able to verify the correctness of any individual block without any extra information except for the block’s header and a small file, called witness, that contains the portion of the state accessed by the block along with proofs of correctness. Witnesses can be produced by any state-holding node in the network. The benefits of stateless verification include allowing clients to run in low-disk-space environments, enabling semi-light-client setups where clients trust blocks by default but stand ready to verify any specific block in the case of an alarm, and secure sharding setups where clients jump between shards frequently without the need to keep up with the state of all shards.

    Large witness sizes are the key problem for enabling stateless clients for Ethereum and the adoption of Verkle trees can reduce the witness sizes needed. More specifically, a witness accessing an account in the hexary Patricia tree is, in the average case, close to 3 kB, and in the worst case it may be three times larger. Assuming a worst case of 6000 accesses per block (15m gas / 2500 gas per access), this corresponds to a witness size of ~18 MB, which is too large to safely broadcast through a p2p network within a 12-second slot. Verkle trees reduce witness sizes to ~200 bytes per account in the average case, allowing stateless client witnesses to be acceptably small.

    Finally, contract code needs to be included in a witness. A contract’s code can contain up to 24000 bytes, and so a 2600 gas CALL can add ~24200 bytes to the witness size. This implies a worst-case witness size of over 100 MB. The proposal suggests breaking up contract code into chunks that can be proven separately; this can be done simultaneously with a move to a Verkle tree. Since a contract’s code can contain up to 24000 bytes, and so a 2600 gas CALL can add ~24200 bytes to the witness size. This implies a worst-case witness size of over 100 MB. The solution is to move away from storing code as a single monolithic hash, and instead break it up into chunks that can be proven separately and adding gas rules1 that accounts for these costs2.

    The small witness sizes described above are possible because Verkle trees rely on an efficient cryptographic commitment scheme, called Polynomial Commitments3 4. Polynomial Commitments allow for logarithmic-sized (in respect to the tree’s height) proof-of-inclusion of any number of leaves in a subtree – and this is independent to the branching factor of the tree. In comparison, traditional Merkle tree’s proofs depend on the branching factor of the tree in a linear fashion, since all siblings of a node-to-be-proved must be included in the proof.

    SUMMARY OF GAS CHANGES

    The current proposal suggests new gas costs which have no counterpart in the previous versions of Ethereum. These is a gas cost for every chunk of 31 bytes of bytecode which are accessed and some additional access events that apply to the entire transaction. An improvement in gas cost can however be also achieved by lowering the cost for SLOAD/SSTORE for when fewer subtrees of the tree structure underpinning the state are accessed. The following is a summary of the gas cost changes between EIP-2929, which introduces the notion of “access lists” and the current proposal.

    SLOAD, but location previously accessed within the same tx using SLOAD or SSTORE

    VerkleWARM_STORAGE_READ_COST (100)
    EIP-2929WARM_STORAGE_READ_COST (100)

    SLOAD, but location previously unread within the tx using SLOAD or SSTORE

    Verkle200 if previously visited subtree 2100 if subtree has not been visited (storage location is up to 64 / 256 elements away from the closest visited)
    EIP-2929COLD_SLOAD_COST = 2100

    CALL an address for an account that has not been previously accessed within the tx

    VerkleAccess list costs (typically 1900 + 200 + 200 + more if value bearing)
    EIP-2929COLD_ACCOUNT_ACCESS_COST = 2600

    EIP-2929 mentions that precompiles are not charged COLD_ACCOUNT_ACCESS_COST but the Verkle specification omits this case at the time of writing.


    Read more

  • Precise Static Modeling of Ethereum “Memory”


    SIFIS LAGOUVARDOS, University of Athens, Greece
    NEVILLE GRECH, University of Athens, Greece
    ILIAS TSATIRIS, University of Athens, Greece
    YANNIS SMARAGDAKIS, University of Athens, Greece

    Static analysis of smart contracts as-deployed on the Ethereum blockchain has received much recent attention. However, high-precision analyses currently face significant challenges when dealing with the Ethereum VM (EVM) execution model. A major such challenge is the modeling of low-level, transient “memory” (as opposed to persistent, on-blockchain “storage”) that smart contracts employ. Statically understanding the usage patterns of memory is non-trivial, due to the dynamic allocation nature of in-memory buffers. We offer an analysis that models EVM memory, recovering high-level concepts (e.g., arrays, buffers, call arguments) via deep modeling of the flow of values. Our analysis opens the door to Ethereum static analyses with drastically increased precision. One such analysis detects the extraction of ERC20 tokens by unauthorized users. For another practical vulnerability (redundant calls, possibly used as an attack vector), our memory modeling yields analysis precision of 89%, compared to 16% for a state-of-the-art tool without precise memory modeling. Additionally, precise memory modeling enables the static computation of a contract’s gas cost.

    This gas-cost analysis has recently been instrumental in the evaluation of the impact of the EIP-1884 repricing (in terms of gas costs) of EVM operations, leading to a reward and significant publicity from the Ethereum Foundation. CCS Concepts: • Software and its engineering → Formal software verification; • Theory of computation → Program analysis. Additional Key Words and Phrases: ethereum, EVM, static analysis ACM Reference Format:
    Sifis Lagouvardos, Neville Grech, Ilias Tsatiris, and Yannis Smaragdakis. 2020. Precise Static Modeling of Ethereum “Memory”. Proc. ACM Program. Lang. 4, OOPSLA, Article 190 (November 2020), 27 pages. https: //doi.org/10.1145/3428258

    1 INTRODUCTION
    The Ethereum blockchain has enabled the management of digital assets via unsupervised autonomous agents called smart contracts. Smart contracts are among the computer programs with the most dire needs of high correctness assurance, due to their managing of high-value assets, as well as their public and immutable nature. Therefore, static analysis for Ethereum smart contracts has captured the attention of the research community in recent years [Albert et al. 2018; Feist et al. 2019; Grech et al. 2018; Mueller 2018; Tsankov et al. 2018].

    The first generation of Ethereum Authors’ addresses: Sifis Lagouvardos, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, sifis.lag@di.uoa.gr; Neville Grech, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, me@nevillegrech.com; Ilias Tsatiris, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, i.tsatiris@di.uoa.gr; Yannis Smaragdakis, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, yannis@smaragd.org. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2020 Copyright held by the owner/author(s). 2475- 421/2020/11-ART190 https://doi.org/10.1145/3428258

    Static analyses have already exhibited significant successes.1 For instance, Securify [Tsankov et al. 2018] has been the basis for a leading company in the Ethereum security space, with many tens of commercial audits performed. Most static analysis tools for Ethereum operate at the binary level of contracts, as-deployed on the blockchain. This ensures that the analysis operates on all contracts in existence, regardless of whether source code is available. (Source code is not always present but not negligible either: it is available for under 25% of deployed contracts, yet for more than half of the high-value contracts.) Furthermore, operating on low-level binaries offers source-language and languageversion independence, completeness in the presence of inline assembly (which is common), and, perhaps most importantly, uniform treatment of complex language features: as analysis authors often argue, innocuous-looking source-level expressions can incur looping behavior or implicit overflow [Grech et al. 2018].


    Operating at the binary level necessitates contract decompilation [eth [n. d.]; Brent et al. 2018; Grech et al. 2019; Kolinko 2018] in order to recover high-level information from the very-low-level Ethereum VM (EVM) bytecode format. This decompilation effort is non-trivial: the EVM is a stack machine with no structured information (no types or functions). Control- low (i.e., calls, returns, conditionals) is implemented as forward jumps over run-time values obtained from the stack, hence even producing a control-flow graph requires a deep static analysis [Grech et al. 2018]. Despite these challenges, sophisticated decompilers mostly succeed in recovering high-level control flow and have been the basis of the most successful static analyses. Despite the relative success of Ethereum decompilers, some low-level aspects of the deployed contract remain unaddressed. The most major such aspect is the precise modeling of transient EVM memory. “Memory” in the context of the EVM (and of this paper) refers to a data store that is transient and transaction-private, used by the runtime primarily to store values whose size is statically unknown—e.g., arrays, strings, or encoded buffers. (Memory is to be contrasted with storage: the on-blockchain persistent value store of an Ethereum smart contract.)


    Memory is crucial in the execution of smart contracts. It is used for many cryptographic hash operations (SHA3 is a native instruction, operating over arbitrary-length buffers), for the arguments of external calls (i.e., calls to other contracts), for logging operations, and for returns from a public function. Crucially, any complex mapping data structure (i.e., the most common Ethereum data structures) stores and retrieves information (from storage) after hashing keys in memory buffers. Modeling EVM memory is unlike other memory management settings. There is no speed premium for locality, yet the compiler attempts to keep memory tightly packed because the contract’s execution cost (in terms of Ethereum gas consumed for memory-related operations) depends on the highest initialized memory address. For the vast majority of smart contracts, written in the Solidity language, the compiler uses a simple strategy: every allocated buffer is written after the end of the last one, updating a “free-memory” pointer. This memory management policy is a low-level aspect, introduced entirely during the compilation process. At the source-code level, memory is implicit: the values that end up stored in memory merely have dynamic-length array or string types. The goal of a precise modeling of EVM memory is to recover such source-level information (expressed in terms of arrays or strings) from the low-level address manipulation code that the compiler produces. Generally, our work makes the following contributions:

    Access the full document

  • MadMax: Analyzing the Out-of-Gas World of Smart Contracts

    MadMax: Analyzing the Out-of-Gas World of Smart Contracts


    Abstract
    Ethereum is a distributed blockchain platform, serving as an ecosystem for smart contracts: full-fledged intercommunicating programs that capture the transaction logic of an account. A gas limit caps the execution of an Ethereum smart contract: instructions, when executed, consume gas, and the execution proceeds as long as gas is available. Gas-focused vulnerabilities permit an attacker to force key contract functionality to run out of gas—effectively performing a permanent denial-of-service attack on the contract. Such vulnerabilities are among the hardest for programmers to protect against, as out-of-gas behavior may be uncommon in nonattack scenarios and reasoning about these vulnerabilities is nontrivial.

    In this paper, we identify gas-focused vulnerabilities and present MadMax: a static program analysis technique that automatically detects gas-focused vulnerabilities with very high confidence. MadMax combines a smart contract decompiler and semantic queries in Datalog. Our approach captures high-level program modeling concepts (such as “dynamic data structure storage” and “safely resumable loops”) and delivers high precision and scalability. MadMax analyzes the entirety of smart contracts in the Ethereum blockchain in just 10 hours and flags vulnerabilities in contracts with a monetary value in billions of dollars. Manual inspection of a sample of flagged contracts shows that 81% of the sampled warnings do indeed lead to vulnerabilities.

    1. INTRODUCTION
    Ethereum is a decentralized blockchain platform that can execute arbitrarily-expressive computational smart contracts. A smart contract can capture virtually any complex interaction, such as responding to communication from other accounts and dispensing or accepting funds. The possibilities for such programmable logic are endless. It may encode a payoff schedule, investment assumptions, interest policy, conditional trading directives, trade or payment agreements, and complex pricing.

    Virtually any transactional multiparty interaction is expressible without a need for intermediaries or third-party trust. Smart contracts typically handle transactions in Ether, which is the native cryptocurrency of the Ethereum blockchain with a current market capitalization in tens of billions of dollars. Smart contracts (as opposed to noncomputational “wallets”) hold a considerable portion of the total Ether available in circulation, which makes them ripe targets for attackers. Hence, developers and auditors have a strong incentive to make extensive use of various tools and programming techniques that minimize the risk of their contract being attacked.

    Analysis and verification of smart contracts are, therefore, high-value tasks, possibly more so than in any other application domain. The combination of monetary value and public availability makes the early detection of vulnerabilities a task of paramount importance. A broad family of contract vulnerabilities concerns out-ofgas behavior. Gas is the fuel of computation in Ethereum. Due to the massively replicated execution platform, wasting the resources of others is prevented by charging users for running a contract. Each executed instruction costs gas, which is traded with the Ether cryptocurrency.

    As a user pays gas upfront, a transaction’s computation may exceed its allotted amount of gas. In that case, the Ethereum Virtual Machine (EVM), which is the runtime environment for compiled smart contracts, raises an out-of-gas exception and aborts the transaction. A contract is at risk for a gas-focused vulnerability if it has not anticipated (or otherwise does not correctly handle) the possible abortion of a transaction due to out-of-gas conditions.

    A vulnerable smart contract may be blocked forever due to the incorrect handling of out-of-gas conditions: re-executing the contract’s function will fail to make progress, re-yielding out-of-gas exceptions, indefinitely. Thus, although an attacker cannot directly appropriate funds, they can cause damage to the contract, locking its balance away in what is, effectively, a denial-of-service attack.

    Such attacks may benefit an attacker in indirect ways—for example, harming competitors or the ecosystem, amassing fame in a black-hat community, or blackmailing. In this work, we present MadMax:1 a static program analysis framework for detecting gas- ocused vulnerabilities in smart contracts. MadMax is a static analysis pipeline consisting of a decompiler (from low-level EVM bytecode to a structured intermediate language) and a logic-based analysis specification. MadMax is highly efficient and effective: it analyzes the whole Ethereum blockchain in just 10 hours and reports numerous vulnerable contracts holding a total value exceeding $2.8B, with high precision, as determined from a random sample.

    MadMax is unique in the landscape of smart contract analyzers and verifiers. It is an approach employing cutting-edge declarative static analysis techniques (e.g., context- ensitive flow analysis and memory layout modeling for data structures), whereas past analyzers have primarily focused on lightweight static analysis, on symbolic execution, or on fullfledged verification for functional correctness.

    As MadMax demonstrates, static program analysis offers a unique combination of advantages: very high scalability (applying to the entire blockchain) and high coverage of potential vulnerabilities. Additionally, MadMax is raising the level of abstraction of automated security analysis, by encoding complex properties (such as “safely resumable loop” or “storage whose increase is caused by public calls”), which, in turn, allow detecting vulnerabilities that span multiple transactions.

    2. BACKGROUND
    A blockchain is a shared, transparent distributed ledger of transactions that is secured using cryptography. One can think of a blockchain as a long and ever-growing list of blocks, each encoding a sequence of individual transactions, always available for inspection and safe from tampering. Each block contains a cryptographic signature of its previous block. Thus, no previous block can be changed or rejected without also rejecting all its successors. Peers/ miners run a mining client for separately maintaining the current version of the blockchain. Each of the peers considers the longest valid chain starting from a genesis block to be the accepted version of the blockchain.

    To encourage transaction validation by all peers and discourage wasted or misleading work, a blockchain protocol typically combines two factors: an incentive that is given as a reward to peers successfully performing validation, and a proof-of-work, requiring costly computation to produce a block. To see how distributed consensus and permanent record-keeping arise, consider a malicious client who tries to double-spend a certain amount. The client may propagate conflicting transactions (e.g., paying sellers A and B) to different parts of the network. As different peers become aware of the two versions of the truth, a majority will arise, because the peers will build further blocks over the version they perceived as current. Thus, a majority will soon accept one of the two spending transactions as authoritative and will reject the other.



    Access the full document

  • Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities

    Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities

    Lexi Brent∗
    Int’l Computer Science Institute Berkeley, CA, USA
    lexi@icsi.berkeley.edu

    Neville Grech
    University of Athens Athens, Greece
    me@nevillegrech.com

    Sifis Lagouvardos
    University of Athens Athens, Greece
    sifis.lag@di.uoa.gr

    Bernhard Scholz
    University of Sydney Sydney, NSW, Australia
    bernhard.scholz@sydney.edu.au

    Yannis Smaragdakis
    University of Athens Athens, Greece
    yannis@smaragd.org

    Abstract
    Smart contracts on permission less blockchains are exposed to inherent security risks due to interactions with untrusted entities. Static analyzers are essential for identifying security risks and avoiding millions of dollars worth of damage. We introduce Ethainter, a security analyzer checking information flow with data sanitization in smart contracts. Ethainter identifies composite attacks that involve an escalation of tainted information, through multiple transactions, leading to severe violations. The analysis scales to the entire blockchain, consisting of hundreds of thousands of unique smart contracts, deployed over millions of accounts. Ethainter is more precise than previous approaches, as we confirm by automatic exploit generation (e.g., destroying over 800 contracts on the Ropsten network) and by manual inspection, showing a very high precision of 82.5% valid warnings for end-to- nd vulnerabilities. Ethainter’s balance of precision and completeness offers significant advantages over other tools such as Securify, Securify2, and teEther.

    CCS Concepts: · Software and its engineering → Formal software verification; · Theory of computation →Program analysis.
    Keywords: static analysis, information flow, smart contracts

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. PLDI ’20, June 15ś20, 2020, London, UK © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-7613-6/20/06. . . $15.

    ACM Reference Format:
    Lexi Brent, Neville Grech, Sifis Lagouvardos, Bernhard Scholz, and Yannis Smaragdakis. 2020. Ethainter: A Smart Contract Security Analyzer for Composite Vulnerabilities. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI ’20), June 15ś20, 2020, London, UK. ACM, New York

    Introduction
    Permissionless blockchain platforms such as Bitcoin [30] and Ethereum [6, 42] promise to revolutionize all sectors of multiparty interaction, by enabling decentralized, resilient consensus. The Ethereum platform, in particular, allows the execution of arbitrary programs, which are called smart contracts. Smart contracts are registered immutably on the blockchain and operate autonomously. Software correctness in smart contracts is critical, as (1) the contracts are high-value targets (since they manage monetary assets), (2) are immutable and cannot be patched, and (3) are fully available for inspection (and invocation) by potential attackers. The need for contract correctness has gained prominence via several recent high profile incidents resulting in losses of cryptocurrency amounts valued in the hundreds of millions [9, 41]. Smart contracts are Turing-complete programs, typically expressed in a domain-specific language called Solidity [40]. Solidity contracts are compiled to low-level bytecode and executed by the Ethereum Virtual Machine (EVM) on the blockchain. Smart contracts store data either on the blockchain or in execution-ephemeral memory. Solidity programs implement unconventional data structures using cryptographic hashing, posing a challenge for static bytecode analyzers. Furthermore, EVM bytecode does not explicitly expose control flow, necessitating sophisticated decomplication methodologies [5, 13, 24, 44]. Hence, checking for even basic security vulnerabilities is non-trivial, as security analyzers for smart contracts require complex techniques to overcome these challenges [14, 37].

    With the rising awareness of Ethereum security risks and the development of recommended practices, realistic attacks have become increasingly complex. They often chain successive exploits that each expose more vulnerabilities. A new class of security analyzers is required to identify such composite attacks. The most notorious (albeit relatively simple) example of a composite attack is the Parity wallet hack, where the equivalent of $280M [9] was stolen or frozen via two separate vulnerabilities. The attack involved reinitializing part of the contract that sets owner variables, via a vulnerable library function, prior to attacking the contract.

    To safeguard against sophisticated attacks, it is essential to model the information flow [34] inside smart contracts accurately. Information flow classifies information into trusted and untrusted and considers how it propagates in the code. For example, if untrusted (łtaintedž) information from external sources is allowed to spread to critical program points, it can alter key aspects of program behavior.

    Smart contract programmers employ guarding patterns to prevent an untrusted entity from executing sensitive operations, such as destroying the contract. For example, the contract code could check whether the user of the contract (a.k.a., its caller or sender) has specific privileges. The simplest check is to permit only a particular contract owner account to perform critical actions. For security analyzers, it is essential to model this guarding mechanism, for any highfidelity information flow analysis: if guards are not taken into consideration, many contracts will be incorrectly inferred to be vulnerable, when the only user that can łexploitž these vulnerabilities would be the owner of the contract themselves. Conversely, the guarding code can itself be attacked, by invoking code that manipulates the state of the contract in unexpected ways.

    In this work, we present Ethainter: the first security analyzer for detecting composite information flow violations in Ethereum smart contracts. Ethainter enhances the understanding of tainted information flow with the tainting of guard conditions, other potential ineffectiveness of guards, as well as different kinds of taint that are not prevented by guarding. Ethainter models not just the existence of guards but also whether they are effective in sanitizing information. The specification of the analysis is in a formalism of independent value. It ignores orthogonal, well-understood technical complexities, capturing, instead, the essence of new information flow concepts (e.g., tainted guards, taint through storage) in minimal form. For example, we consider a tainted memory store to induce further information flow more eagerly than an untainted (yet still statically undetermined) memory store. The precision, recall, and scalability of Ethainter are highly fine-tuned using mutually-recursive Datalog rules for tainting and overall information flow. The contribution of our work is as follows:



    Access the full document

  • Gigahorse: Thorough, Declarative Decompilation of Smart Contracts

    Gigahorse: Thorough, Declarative Decompilation of Smart Contracts

    Neville Grech
    University of Athens and University of Malta Greece and Malta
    me@nevillegrech.com

    Lexi Brent
    The University of Sydney Australia
    lexi.brent@sydney.edu.au

    Bernhard Scholz
    The University of Sydney Australia
    bernhard.scholz@sydney.edu.au

    Yannis Smaragdakis
    University of Athens Greece
    yannis@smaragd.org

    Abstract—The rise of smart contracts—autonomous applications running on blockchains— as led to a growing number of threats, necessitating sophisticated program analysis. However, smart contracts, which transact valuable tokens and cryptocurrencies, are compiled to very low-level bytecode. This bytecode is the ultimate semantics and means of enforcement of the contract. We present the Gigahorse toolchain. At its core is a reverse compiler (i.e., a decompiler) that decompiles smart contracts from Ethereum Virtual Machine (EVM) bytecode into a highlevel 3-address code representation. The new intermediate representation of smart contracts makes implicit data- and control flow dependencies of the EVM bytecode explicit. Decompilation obviates the need for a contract’s source and allows the analysis of both new and deployed contracts.


    Gigahorse advances the state of the art on several fronts. It gives the highest analysis precision and completeness among decompilers for Ethereum smart contracts—e.g., Gigahorse can decompile over 99.98% of deployed contracts, compared to 88% for the recently-published Vandal decompiler and under 50% for the state-of-the-practice Porosity decompiler. Importantly, Gigahorse offers a full-featured toolchain for further analyses (and a “batteries included” approach, with multiple clients already implemented), together with the highest performance and scalability. Key to these improvements is Gigahorse’s use of a declarative, logic-based specification, which allows high-level insights to inform low-level decompilation. Index Terms—Ethereum, Blockchain, Decompilation, Program
    Analysis, Security


    INTRODUCTION
    Distributed blockchain platforms have captured the imagination of scientists and the public alike. Blockchain technology offers decentralized consensus mechanisms for any transactions that, in the past, would have required a trusted centralized authority. One of the most evident embodiments of this vision is the development of smart contracts: Turing-complete autonomous agents that run on distributed blockchains, such as Ethereum or Cardano. A smart contract may, for instance, implement a lending policy, a charging scheme for digital goods, an auction, the full set of operations of a bank, and virtually any other logic governing multi-party transactions. Ethereum is the best-known, most popular blockchain platform that supports full-featured smart contracts. (As of this writing, the Ethereum cryptocurrency market capitalization is $13B.) Ethereum offers an excellent demonstration of the potential for smart contracts, as well as their technical challenges. Developers typically write smart contracts in a high-level language called Solidity, which is compiled into immutable low-level Ethereum VM (EVM) bytecode for the blockchain’s distributed virtual machine.


    The open nature of smart contracts, as well as their role in handling high-value currency, raise the need for thorough contract analysis and validation. This task is hindered, however, by the low-level stack-based design of the EVM bytecode that has hardly any abstractions as found as in other languages, such as Java’s virtual machine. For example, there is no notion of functions or calls—a compiler that translates to EVM bytecode needs to invent its own conventions for implementing local calls over the stack. It is telling that recent research [1], [9], [22], [34] has focused on decompiling smart contracts into a higher-level representation, before applying any further (usually securityoriented) analysis. Past decompilation efforts have been, at best, incomplete. The best-known decompiler (largely defining the state-of-the-practice) is Porosity [33], which in our study fails to yield results for 50% of deployed contracts of all smart contracts on the block chain. Upcoming research tools including the Vandal decompiler [35] still fail to decompile a significant portion of real contracts (around 12%) due to the complex task of converting EVM’s stack-based operations to a register-based intermediate representation.


    Such difficulties are much more than technicalities of the platform or idiosyncrasies of existing tools. Any current or future smart contract platform is likely to employ virtual machines that are low-level. The designs of these virtual machines are optimized for massively replicated execution of smart contracts. The bytecode effectively represents an assembly language designed for efficient execution and compact program representation, since the bytecode must be stored on the blockchain. Hence, the bytecode for smart contracts will never be optimal for human readability or reverse compilation. For effective decompilation, significant program comprehension of the bytecode is necessary. A decompiler requires deep program understanding before it can reconstruct the bytecode to a high-level representation. For instance, to recognize which low-level jumps correspond to high-level function calls, a decompiler must deduce possible addresses for jump instructions to be able to reconstruct the control-flow of a smart contract. Such understanding is compounding: once calls are recognized as calls (and not just as a mere intra-procedural
    jump), its decompilation precision can further improve, by pruning impossible targets.

    In this paper, we introduce the Gigahorse toolchain for analysis of Ethereum smart contracts. Gigahorse addresses the above challenges, making the following contributions:

    • It offers a highly-effective decompiler, yielding higher precision and completeness than the state-of-the-art (e.g., decompiling virtually all existing contracts on the Ethereum blockchain).
    • Gigahorse anchors around it a full-featured tool suite, offering libraries for building analyses, as well as readymade clients for existing analyses.
    • The Gigahorse approach provides new decompilation insights (possibly of value for many more platforms): As higher level program features are discovered, they feed back to lower level analyses. E.g., by discovering functions, stack analyses are performed locally (more precisely), and the effects of function calls on the stack are also summarized, enabling both more precise and more scalable analysis.
    • Gigahorse showcases an unconventional decompilation approach, which enables the above benefits: the decompiler is specified declaratively, using logic-based (Datalog) rules.
    • Gigahorse is evaluated and its features illustrated on the full set of smart contracts of the Ethereum blockchain. Gigahorse is powering the ongoing free Contract Library service (https://contract-library.com), which offers decompiled versions of all contracts on the Ethereum blockchain. Contract Library is a valuable service for Ethereum security analysts, and is currently receiving several tens of unique visitors and over a thousand page views per day.

    BACKGROUND
    We next present some background on the EVM bytecode language and declarative static analysis. A. Low-Level Bytecode in the Ethereum Block



    Access the full document