Computing Reviews, the leading online review service for computing literature.

A survey of machine learning for big code and naturalness
Allamanis M., Barr E., Devanbu P., Sutton C. ACM Computing Surveys 51(4): 1-37, 2018. Type: Article

Date Reviewed: Nov 4 2021

Demand is rising for effective software tools that help developers build reliable and maintainable software systems. Abundant research already helps developers track bugs, verify program properties, and refactor code. Recently, widely used open-source projects have been made available to the public, with not only their source code but also important metadata such as commit logs, bug fix summaries, authorship details, and process documents. This whole collection (popularly referred to as “big code”) has spearheaded a new research direction to aid software development and maintenance: a data-driven approach that analyzes programs to uncover common software characteristics.

The authors survey the available literature on probabilistic machine learning and natural language processing (NLP) models for code and its associated metadata (big code), mostly in three areas:

(1) Code-generating models focus on modeling how code is written, in order to learn a distribution and then generate code for applications such as code migration, pseudocode generation, code synthesis, and code completion. For this, researchers have developed language models, machine translation models, and multi-modal models that use the structure of a programming language along with its correlation to metadata, for example, comments, commits, and design documents.
(2) Representational models learn intermediate characterizations of code constructs and their relations and properties, mostly based on a distributed representation of those constructs in a vector space, coupled with structured prediction using sequence models. This representation helps in program analysis, feature location, code search, and data and control traceability.
(3) Pattern mining models are used to mine resolvable patterns from source code and mostly help with code summarization, documentation generation, and bug fixing.

The authors review around 200 papers that aim to develop probabilistic models of code and use them effectively in constructing software. The major applications of these models are to enable code autocompletion and migration, infer coding conventions, mine code defects, and facilitate code translation and copying.
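
To make the idea of a code-generating language model concrete, here is a minimal sketch of a bigram model over code tokens that suggests the next token for completion. It is an illustration only and is not taken from the surveyed paper: the tokenizer, the `BigramCodeModel` class, and the three-snippet corpus are all hypothetical, and the models covered in the survey are far richer than plain bigram counts.

```python
import re
from collections import Counter, defaultdict


def tokenize(code):
    """Split source text into identifiers, integers, and single symbols (a crude, assumed lexer)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


class BigramCodeModel:
    """Hypothetical bigram model: estimates P(next token | previous token) from counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, snippets):
        for snippet in snippets:
            # "<s>" marks the start of a snippet so the first token has a context.
            tokens = ["<s>"] + tokenize(snippet)
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += 1

    def prob(self, prev, nxt):
        """Maximum-likelihood estimate with no smoothing; unseen contexts give 0.0."""
        following = self.counts.get(prev, Counter())
        total = sum(following.values())
        return following[nxt] / total if total else 0.0

    def suggest(self, prev, k=3):
        """Return up to k most frequent tokens observed after `prev`."""
        following = self.counts.get(prev, Counter())
        return [tok for tok, _ in following.most_common(k)]


# Toy corpus, invented for this example.
corpus = [
    "for i in range(n): total += i",
    "for j in range(len(xs)): print(xs[j])",
    "for k in range(10): pass",
]

model = BigramCodeModel()
model.train(corpus)

print(model.suggest("in"))        # ['range']
print(model.prob("in", "range"))  # 1.0
```

Even this toy model picks up the regularity that `in` is followed by `range` in every training snippet, which is the kind of repetitiveness ("naturalness") that the surveyed models exploit at scale.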

Reviewer: Partha Pratim Das	Review #: CR147381

Learning (I.2.6)

Document Management (I.7.1 ...)

Artificial Intelligence (I.2)

Software Engineering (D.2)

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®