Publications

(2020). Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift. ACM Transactions on Architecture and Code Optimization, ACM TACO.

(2020). High-level hardware feature extraction for GPU performance prediction of stencils. 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit (GPGPU@PPoPP).

(2020). Generating fast sparse matrix vector multiplication from a high level generic functional IR. 29th International Conference on Compiler Construction (CC@CGO).

(2020). Automatic generation of specialized direct convolutions for mobile GPUs. 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit (GPGPU@PPoPP).

(2019). Position-dependent arrays and their application for high performance code generation. Proceedings of the 8th ACM SIGPLAN International Workshop on Functional High-Performance and Numerical Computing (FHPNC@ICFP).

(2019). High-level synthesis of functional patterns with Lift. Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY@PLDI).

(2018). High performance stencil code generation with Lift. Proceedings of the 16th ACM/IEEE International Symposium on Code Generation and Optimization (CGO).

(2018). Bulk-synchronous parallel simultaneous BVH traversal for collision detection on GPUs. Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (i3D).

(2018). Automatic Matching of Legacy Code to Heterogeneous APIs: An Idiomatic Approach. Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).

(2018). Accelerated Finite State Machine Test Execution Using GPUs. 25th Asia-Pacific Software Engineering Conference (APSEC).

(2018). A Modular Approach to Performance, Portability and Productivity for 3D Wave Models. 7th International Workshop on Domain Specific Languages and High-level Frameworks for High Performance Computing (WOLFHPC).

(2017). Strategy Preserving Compilation for Parallel Functional Code. CoRR.

(2017). Performance Portability For Room Acoustics Simulations. Proceedings of the 20th International Conference on Digital Audio Effects (DAFx-17).

(2017). ParTeCL: parallel testing using OpenCL. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA-DEMOS).

(2017). OpenCL JIT Compilation for Dynamic Programming Languages. MoreVMs Workshop. Collocated with Programming (MoreVM).

(2017). Lift: A Functional Data-Parallel IR for High-Performance GPU Code Generation. Proceedings of the 15th ACM/IEEE International Symposium on Code Generation and Optimization (CGO).

(2017). Just-in-time GPU compilation for interpreted languages with partial evaluation. Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE).

(2017). Compiler-assisted test acceleration on GPUs for embedded software. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA).

(2017). A Study of Dynamic Phase Adaptation Using a Dynamic Multicore Processor. ACM Transactions on Embedded Computing Systems (Special Issue CASES 2017), ACM TECS.

(2016). Selecting Heterogeneous Cores for Diversity. ACM Transactions on Architecture and Code Optimization, ACM TACO.

(2016). Performance Portable GPU Code Generation for Matrix Multiplication. Proceedings of the 2016 Workshop on General Purpose Processing on Graphics Processing Units (GPGPU).

(2016). Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation. Proceedings of the 2016 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).

(2016). Four Metrics to Evaluate Heterogeneous Multicores. ACM Transactions on Architecture and Code Optimization, ACM TACO.

(2016). Compositional Compilation for Sparse, Irregular Data Parallelism. Proceedings of the 2016 Workshop on High-Level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU).

(2016). A Machine Learning Approach to Mapping Streaming Workloads to Dynamic Multicore Processors. Proceedings of the 17th ACM SIGPLAN/SIGBED conference on Languages, Compilers and Tools for Embedded Systems (LCTES).

(2015). Runtime Code Generation and Data Management for Heterogeneous Computing in Java. Proceedings of the 12th International Conference on Principles and Practice of Programming on the Java Platform: Virtual machines, languages, and tools (PPPJ).

(2015). Patterns and Rewrite Rules for Systematic Code Generation (From High-Level Functional Patterns to High-Performance OpenCL Code). arXiv Technical Report arXiv:1502.02389.

(2015). Generating Performance Portable Code using Rewrite Rules: From High-Level Functional Expressions to High-Performance OpenCL Code. Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP).

(2015). Diversity: A Design Goal for Heterogeneous Processors. IEEE Computer Architecture Letters, IEEE CAL.

(2015). Carpet Unrolling Descriptors for Character Control On Uneven Terrain. Proceedings of the 8th ACM SIGGRAPH Motion in Games Conference (MIG).

(2014). Measuring flexibility in single-ISA heterogeneous processors. Proceedings of the 23rd international conference on Parallel architectures and compilation (PACT).

(2014). Exploiting GPU hardware saturation for fast compiler optimization. Proceedings of Workshop on General Purpose Processing Using GPUs (GPGPU).

(2014). Community-driven reviewing and validation of publications. Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST).

(2014). Automatic optimization of thread-coarsening for graphics processors. Proceedings of the 23rd international conference on Parallel architectures and compilation (PACT).

(2014). A Composable Array Function Interface for Heterogeneous Computing in Java. Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY).

(2013). Dynamic microarchitectural adaptation using machine learning. ACM Transactions on Architecture and Code Optimization, ACM TACO.

(2013). A large-scale cross-architecture evaluation of thread-coarsening. Proceedings of the 2013 Conference on High Performance Computing Networking, Storage and Analysis (SC).

(2012). Exploring and predicting the effects of microarchitectural parameters and compiler optimizations on performance and energy. ACM Transactions on Embedded Computing Systems, ACM TECS.

(2012). Compiling a High-Level Language for GPUs (via Language Support for Architectures and Compilers). Proceedings of the 33rd ACM SIGPLAN Symposium on Programming Language Design and Implementation (PLDI).

(2011). An empirical architecture-centric approach to microarchitectural design space exploration. IEEE Transactions on Computers, IEEE TC.

(2010). A Predictive Model for Dynamic Microarchitectural Adaptivity Control. Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

(2009). Rapid Early-Stage Microarchitecture Design Using Predictive Models. Proceedings of the 2009 IEEE International Conference on Computer Design (ICCD).

(2009). Portable compiler optimisation across embedded programs and microarchitectures using machine learning. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

(2008). Exploring and predicting the architecture/optimising compiler co-design space. Proceedings of the 2008 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).

(2007). Microarchitectural Design Space Exploration Using An Architecture-Centric Approach. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

(2007). Fast compiler optimisation evaluation using code-feature based performance prediction. Proceedings of the 4th International Conference on Computing Frontiers (CF).

(2006). Automatic performance model construction for the fast software exploration of new hardware designs. Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).

(2005). Enabling unrestricted automated synthesis of portable hardware accelerators for virtual machines. Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
