Computing Reviews
A survey of machine learning for big code and naturalness
Allamanis M., Barr E., Devanbu P., Sutton C.  ACM Computing Surveys 51 (4): 1-37, 2018. Type: Article
Date Reviewed: Nov 4 2021

There is a rising demand for effective software tools that can help developers build reliable and maintainable software systems. Abundant research has aimed to help developers track bugs, verify program properties, and refactor code. Recently, widely used open-source projects have become publicly available, with not only their source code but also important metadata such as commit logs, bug fix summaries, authorship details, and process documents. This collection (popularly referred to as “big code”) has spearheaded a new research direction in software development and maintenance, based on a data-driven approach that analyzes programs to uncover common software characteristics.

The authors survey the available literature on probabilistic machine learning and natural language processing (NLP) models for code and its associated metadata (big code), focusing on three areas:

(1) Code-generating models capture how code is written: they learn a distribution over code and then generate code for applications such as code migration, pseudocode generation, code synthesis, and code completion. To this end, researchers have developed language models, machine translation models, and multimodal models that exploit the structure of a programming language along with its correlation to metadata, for example, comments, commits, and design documents.
(2) Representational models learn intermediate characterizations of code constructs and their relations and properties, mostly through distributed representations of those constructs in a vector space, coupled with structured prediction using sequence models. These representations help in program analysis, feature location, code search, and data and control traceability.
(3) Pattern mining models are used to mine resolvable patterns from source code and mostly help with code summarization, documentation generation, and bug fixing.
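
To make the first category concrete, the sketch below shows a toy bigram language model over code tokens that suggests the next token, in the spirit of code completion. It is only an illustration under stated assumptions, not a model taken from the surveyed paper: the regex tokenizer, the three-snippet corpus, and the names tokenize, train_bigram, and suggest are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately crude tokenizer: identifiers, integers,
# and any other single non-space character become separate tokens.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split a code snippet into a flat list of token strings."""
    return TOKEN.findall(source)


def train_bigram(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for snippet in corpus:
        tokens = tokenize(snippet)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def suggest(counts, prev, k=3):
    """Return up to k (token, probability) pairs likely to follow prev."""
    following = counts.get(prev)
    if not following:
        return []  # token never seen as a left context
    total = sum(following.values())
    return [(tok, n / total) for tok, n in following.most_common(k)]


# Toy "big code" corpus (an assumption for illustration only).
CORPUS = [
    "for i in range(n): total += i",
    "for item in items: print(item)",
    "for key in mapping: print(key)",
]

model = train_bigram(CORPUS)

if __name__ == "__main__":
    # Most likely continuation after the token "print" in this corpus.
    print(suggest(model, "print", k=1))
```

The language models discussed in the survey are trained on far larger corpora and typically condition on longer contexts or on program structure; the bigram counts here only show the basic idea of learning a distribution from existing code and using it to rank completions.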

The authors review around 200 papers that aim to develop probabilistic models of code and use them effectively in constructing software. The major applications of these models are code autocompletion and migration, inference of coding conventions, mining of code defects, and code translation and copying.

Reviewer:  Partha Pratim Das Review #: CR147381
Learning (I.2.6 )
Document Management (I.7.1 ... )
Artificial Intelligence (I.2 )
Software Engineering (D.2 )

Reproduction in whole or in part without permission is prohibited.   Copyright © 2000-2021 ThinkLoud, Inc.