Bogdan Vasilescu

Carnegie Mellon University

What Can Analyzing Tens of Terabytes of Public Trace Data Tell Us About Open Source Sustainability?

Open-source communities face significant sustainability challenges, from attracting and retaining a diverse set of contributors to fundraising. The organization, functioning, and overall health of open-source communities were studied through interviews, surveys, and analysis of billions of commits and other public traces. The talk will highlight the empirical evidence on a range of research questions about non-technical issues, including: project-level risk factors associated with upstream and downstream dependencies, the value of diversity in open-source teams, factors contributing to longer-term engagement or premature disengagement of contributors, the effectiveness of donations as a funding model, and the role of transparency and signaling in improving the health of open-source projects.

Bogdan Vasilescu is mostly active in the Software Engineering research community, where he co-chaired the MSR 2020 Data Showcase and has served on the Program Committees of the major Software Engineering venues (including ICSE, FSE, and ASE). He is an Associate Editor for the ACM Transactions on Software Engineering and Methodology and co-chairs the SIGSOFT Initiative on Data-driven Introspection, among other service roles.


Fernando Castor

Universidade Federal de Pernambuco

Program Understanding as a Learning Activity

Reading code is an essential activity in software maintenance and evolution. Several studies with developers have investigated how different factors, such as the employed code constructs and naming conventions, can impact code readability, i.e., what makes a program easier or harder for developers to read and understand, and code legibility, i.e., what influences the ease of identifying the elements of a program. These studies evaluate readability and legibility by means of different comprehension tasks and response variables. This talk will examine these tasks and variables in studies that compare programming constructs, coding idioms, naming conventions, and formatting guidelines, e.g., recursive vs. iterative code. To that end, a systematic literature review was conducted, which identified 54 relevant papers. Most of these studies evaluate code readability and legibility by measuring the correctness of the subjects' results (83.3%) or simply by asking for their personal opinions (55.6%). Some studies (16.7%) rely exclusively on the latter response variable. Moreover, relatively few studies (5%) monitor developers' physical signs, such as brain activation regions. The review also shows that attributes such as time and correctness are multifaceted. For example, correctness can be measured as the ability to predict the output of a program, answer questions about its general behavior, or precisely recall specific parts of it, among other things. These results make it clear that different evaluation approaches require different competencies from study subjects, e.g., tracing the program vs. summarizing its goal vs. memorizing its text. To help researchers design new studies and better understand existing ones, program comprehension is modeled as a learning activity by adapting a preexisting learning taxonomy. This adaptation indicates that some competencies, e.g., tracing, are often exercised in these evaluations, whereas others, e.g., relating similar code snippets, are rarely targeted.

Fernando Castor has been a professor at the Informatics Center of the Federal University of Pernambuco since December 2008 (Assistant Professor, 2008-2016; Associate Professor, 2016-present). Since 2009, he has been a researcher with the National Council for Scientific and Technological Development (CNPq), modality PQ2, and he has been the principal investigator of six research projects funded by Brazilian research agencies. Fernando has supervised more than 20 MSc and PhD students. His research lies mainly in the areas of Software Engineering and Programming Languages and seeks efficient ways to develop software that behaves efficiently. Over the past five years, his research has focused on the energy efficiency of software systems, in particular on how exploiting the design diversity of preexisting software components can be an inexpensive way to save energy. He has also been investigating how constructs for concurrency control and parallel execution management impact attributes such as performance, energy efficiency, and ease of maintenance. His research is in part experimental and in part based on the analysis of large-scale open-source code repositories. It has been described in more than 100 scientific publications in leading conferences and journals in these areas, including the ACM/IEEE International Conference on Software Engineering, the ACM SIGSOFT Symposium on the Foundations of Software Engineering, the ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, the International Conference on Mining Software Repositories, and the Communications of the ACM. His publications have received more than 2,000 citations (Google Scholar).