Recently in software development, due to the increase in scale and complexity, thousands of developers and dozens of company are usually involved in developing a huge software system. In this research group, we aim at mitigating such complexity and exploring solutions that support developers for smoother development process.
Software Repository refers to a development infrastructure that allows developers to collaborate among others for software development via network. It is also a collective term of version control system that manages source code, as well as a mailing list management system that keeps track of discussions among developers. Within software repository, immensive amount of development history data is stored. By exploiting these data, the goal of this research field is to perform integration that filters out irrelevant information and summarizes the commons. Furthermore, it also aims at providing developers with useful information in a comprehensible way. Such activities are called as Software Repository Mining.
Software analytics aims at helping individual or team to make better decision in software development. For this reason, software-related data such as source code or development history is analyzed. Afterwards, the corresponding result is provided in a way that can be shared among individual or team.
Recently, in the rapidly emerging cloud computing technology, dynamically and automatically constructing as well as allocating computing environment by virtualizing computing resource based on software technology has been gathering attentions. In our research, we aim at manipulating computing resource by utilizing software technique. Specifically, Software Defined Networking (SDN) which is a software-based networking-control technique is in our main focus. Based on SDN, we hope to extend and advance software techniques that can support cloud computing-related technologies such as Cloud Gaming, Big Data Analysis, Machine Learning, IoT and etc.
High-Performance Computing Systems (i.e., Supercomputers) provide massive computing performance to solve diverse problems in science and engineering. At our laboratory, we research and develop software to fully utilize the performance of state-of-the-art HPC systems. We develop massively parallel scientific applications in close collaboration with domain scientists. Furthermore, we design and implement middleware for large-scale HPC systems.
Individual research projects are shown below. You can filter the projects with keywords.
Software Design & Analysis of Software Process Recently in software development, due to the increase in scale and complexity, thousands of developers and dozens of company are usually involved in developing a huge software system. In this research group, we aim at mitigating such complexity and exploring solutions that support developers for smoother development process. Our current research topics A Planning Support System for Quantitative Management of Software Development Process improvement in software development refers to discovering problems and providing corresponding solutions in the development process.
Treasure digging in software development history – Repository Mining In this research, we aim at supporting software comprehension. By analyzing and arranging software development history, we propose methods to present the corresponding data. Especially, we focus on software repository which is our major data source. Software Repository refers to a development infrastructure that allows developers to collaborate among others for software development via network. It is also a collective term of version control system that manages source code, as well as a mailing list management system that keeps track of discussions among developers.
What is Software Analytics Software analytics aims at helping individual or team to make better decision in software development. For this reason, software-related data such as source code or development history is analyzed. Afterwards, the corresponding result is provided in a way that can be shared among individual or team. Reference Work:Tim Menzies, Thomas Zimmermann. “Software Analytics: So What?”. In Journal IEEE Software archive Volume 30 Issue 4, July 2013 Pages 31-37.
Recently, in the rapidly emerging cloud computing technology, dynamically and automatically constructing as well as allocating computing environment by virtualizing computing resource based on software technology has been gathering attentions. In our research, we aim at manipulating computing resource by utilizing software technique. Specifically, Software Defined Networking (SDN) which is a software-based networking-control technique is in our main focus. Based on SDN, we hope to extend and advance software techniques that can support cloud computing-related technologies such as Cloud Gaming, Big Data Analysis, Machine Learning, IoT and etc.
Overview High-Performance Computing Systems (i.e., Supercomputers) provide massive computing performance to solve diverse problems in science and engineering. At our laboratory, we research and develop software to fully utilize the performance of state-of-the-art HPC systems. We develop massively parallel scientific applications in close collaboration with domain scientists. Furthermore, we design and implement middleware for large-scale HPC systems. Research Topics Dynamic Interconnect Control using SDN Modern HPC systems are based on the cluster architecture where multiple computers are connected through a high-performance network (i.
Software Defined Network (SDN) is an another approach to networking that realizes network programmability and dynamic control over the network. OpenFlow is a de facto standard protocol used to implement an SDN. However, understanding the dynamic behavior of an OpenFlow network is challenging since the information about the operation is distributed across numerous network switches. Opimon (OpenFlow Interactive Monitoring) is developed for monitoring and visualizing the network topology and flow
Federated learning is a technique for training machine learning models while keeping data private. The data is kept on the devices or servers where it was collected, and only updates to the model are shared between the parties involved in the training process. This approach is effective in addressing privacy concerns and allows for a diverse dataset, with data coming from multiple sources. However, resource constraints on edge devices make it difficult for all user devices to use the same model, which could impact the model’s accuracy.
Virtual Machines (VMs) are used extensively in the cloud. The underlying hypervisors allow hardware resources to be split into multiple virtual units which enables server consolidation, fault containment and resource management. However, VMs with traditional architecture introduce heavy overhead and reduce application performance. Containers are becoming popular options for running applications, yet such a solution raises security concerns due to weaker isolation than VMs. We are at the point of container and traditional virtualization convergence where lightweight hypervisors are implemented and integrated into the container ecosystem in order to maximize the benefits of VM isolation and container performance.
To fully utilize multi-core processors, the development of parallel programs is essential. However, creating parallel programs is a challenging and complex task. Automatic parallelization techniques have been extensively studied to simplify this process by transforming sequential code into parallel code automatically. Most existing automatic parallelization tools rely on static analysis, which can identify certain types of parallel structures but fails to detect others, resulting in suboptimal performance gains. In contrast, the recent emergence of Large Language Models (LLMs) in the field of Natural Language Processing (NLP) has inspired software engineering researchers to adopt these models.
Federated learning trains a model on a centralized server using datasets distributed over a large number of edge devices. Applying federated learning ensures data privacy because it does not transfer local data from edge devices to the server. Existing federated learning algorithms assume that all deployed models share the same structure. However, it is often infeasible to distribute the same model to every edge device because of hardware limitations such as computing performance and storage space.
Running neural network models on edge devices is attracting much attention by neural network researchers since edge computing technology is becoming more powerful than ever. However, deploying large neural network models on edge devices is challenging due to the limitation in available computing resources and storage space. Therefore, model compression techniques have been recently studied to reduce the model size and fit models on resource-limited edge devices. Compressing neural network models reduces the size of a model, but also degrades the accuracy of the model since it reduces the precision of weights in the model.
Data centers are centralized facilities where computing and networking hardware are aggregated to handle large amounts of data and computation. In a data center, computing resources such as CPU and memory are usually managed by a resource manager. The resource manager accepts resource requests from users and allocates resources to their applications. A commonly known problem in resource management is that users often request more resources than their applications actually use.
Background In recent years, Cloud Gaming, also regarded as gaming on demand, is an emerging gaming service that envisions a promising future of providing million clients with novel and highly accessible gaming experiences, as it has been an active topic both in industries and research fields recently. Compared to Online Game which only stores game status, Cloud Gaming takes more advantages of cloud infrastructures by leveraging reliable, elastic and high-performance computing resources.
A large amount of scientific data needs to be transferred from one site to another as fast as possible in the computational science fields. High-speed data transfer between sites is very important, especially in the Grid computing field; GridFTP has been widely used for bulk data transfer over a wide area network. GridFTP achieves greater performance by supporting parallel TCP streams. Using parallel TCP streams improves the throughput of slow-start algorithms and lossy networks even on a single path.
Bandwidth and latency are two major factors that contribute the most to network application performance. Between each pair of switches in a network, there may be multiple paths connecting them. Each path has different proper- ties because of multiple factors. Traditional shortest-path routing does not take this knowledge into consideration and may result in sub-optimal performance of applications and underutilization of the network. We propose a concept of “bandwidth and latency aware routing”.
Multipath TCP (MPTCP) is an extension to TCP that allows multiple TCP subflows to be created from a single application socket. This is done automatically by operating system kernel implementation, such as MPTCP Kernel. MPTCP has advantages over network layer and application layer multipathing. This is because MPTCP, unlike network layer mechanisms such as Equal-Cost Multipathing (ECMP), is capable of independent congestion control on multiple paths and therefore works well in unequal networks, which is the situation of PRAGMA-ENT.