Achieving efficient execution of programs on modern computers requires approaches that alleviate some of the burden of managing data locality and parallelism in the application. A program written in a form that can be blocked into coarser operations is reorganized through a combination of empirical and model-driven optimization at program installation, compilation, and execution time. The tools employed vary widely and depend on the application domain. I will present optimization techniques for matrix transposition, automatic parallelization of stencil codes, and load-balanced execution of tensor contraction expressions.
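
As a rough illustration of what "blocked into coarser operations" can mean for matrix transposition, the sketch below transposes a row-major matrix one tile at a time, so that each tile's reads and writes stay close together in memory. The function name `transpose_blocked` and the `tile` parameter are hypothetical and not taken from the work described here. The default tile size of 32 is an arbitrary assumption; it is the kind of value an empirical search or a performance model would choose.

```python
def transpose_blocked(a, rows, cols, tile=32):
    """Transpose a row-major rows x cols matrix stored in a flat list.

    The iteration space is split into tile x tile blocks; `tile` is a
    tunable parameter (hypothetical default, not a recommended value).
    """
    out = [0] * (rows * cols)
    # Outer loops walk over tiles (the "coarser operations").
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            # Inner loops transpose one tile; min() handles edge tiles
            # when rows or cols is not a multiple of the tile size.
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    out[j * rows + i] = a[i * cols + j]
    return out


def transpose_naive(a, rows, cols):
    """Reference transposition without blocking, used for checking."""
    return [a[i * cols + j] for j in range(cols) for i in range(rows)]
```

Both functions return the same result for any tile size; only the order of memory accesses differs, which is what an optimizer would tune for a given machine.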