
Documentation copied from https://wiki.brandeis.edu/twiki/bin/view/Bio/IntelCCompiler; admin notes are at ICC License Manager Administration.

Why Use ICC?

I saw a roughly 400% performance gain running my simulations with icc versus gcc. My officemate saw 200%.

Available Tools

  • Intel(R) C++ Compiler for Linux
  • Intel(R) Integrated Performance Primitives for Linux
  • Intel(R) Math Kernel Library for Linux
  • Intel(R) Threading Building Blocks for Linux

These tools run on Linux and create Linux executables.

License Terms

We have a two-user concurrent license. Since compiling only takes a few minutes, this should not be a major problem.

Using icc

Preferred: If you have a login to the cluster (hpc), you can call the compiler via '/share/apps/intel/cc/10.1.013/bin/icc'.

  • The compiler is no longer at that location. Search under /share/apps/intel instead, for example:
[karel@login-0-0 intel64]$ find /share/apps/intel -name icc -ls 
14387256    4 -rwxr-xr-x   1 root     root         2012 Feb 25  2009 /share/apps/intel/Compiler/11.0/081/bin/ia32/icc
14750688    4 -rwxr-xr-x   1 root     root         2041 Feb 25  2009 /share/apps/intel/Compiler/11.0/081/bin/intel64/icc
13010686    4 -rwxr-xr-x   1 root     root         2041 Jan  9  2009 /share/apps/intel/Compiler/11.0/074/bin/intel64/icc
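With several icc builds (and gcc) installed side by side, it is easy to end up compiling with a different compiler than intended. A minimal sketch for checking this from inside the program, assuming only the predefined macros __INTEL_COMPILER, __clang__ and __GNUC__; compiler_name is a hypothetical helper name:

```cpp
#include <string>

// Returns a short description of the compiler that built this file
// (compiler_name is a hypothetical helper, not part of any library).
// Order matters: icc and clang also define __GNUC__ for gcc compatibility,
// so the more specific macros are tested first.
std::string compiler_name() {
#if defined(__INTEL_COMPILER)
    // __INTEL_COMPILER is an integer encoding the icc version.
    return "icc " + std::to_string(__INTEL_COMPILER);
#elif defined(__clang__)
    return "clang " + std::to_string(__clang_major__);
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__);
#else
    return "unknown";
#endif
}
```

Printing compiler_name() once at startup (for example into your simulation's log) makes it obvious which compiler produced a given set of results.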

Less Preferred: If you want to install the compiler/tools on a different computer on campus, please email me at elrad(at)univ-name(dot)edu

Easy Ways to get the most performance out of icc

Here are a few basic tips for optimizing with icc (please feel free to add more):

  • Use the '-O3 -static -ipo' compiler options
    • Together these enable (among other things) aggressive inlining, aggressive prefetching, inter-procedural optimization, omission of frame pointers, use of GPRs, and static-only linking
  • The default floating point model (including with -O3) is 'fast', which enables aggressive optimizations that may lead to numerical instability
    • You can override the floating point model manually with '-fp-model model', where model is one of the following (in order of decreasing speed and increasing accuracy):
      • fast=2 / fast - aggressive (fast=2 is more aggressive than fast, which is the same as fast=1)
      • precise - 'value safe' optimizations only; no guaranteed intermediate rounding
      • source - guarantees that every intermediate is rounded to source-defined precision; about 3x slower than precise
  • Target the code at your particular processor class
    • -axT will create two versions of the code, a generic x86 version and a core2 optimized version. The proper branch will be chosen at runtime.
    • -axTPN will create versions for Core2, Core and Pentium IV where there is a performance gain to be had for the branch. The proper branch will be chosen at runtime.
    • -xT will create only the Core2 version of the code, so the executable will not run at all on older processors.
  • Use the Intel Math Library
    • Functionally interchangeable (same prototypes) with the standard C math library
    • Replace <math.h> (or <cmath>) with <mathimf.h>
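Why the 'fast' model can change results: it lets the compiler reorder floating point operations as if they were associative, which they are not. A minimal sketch with two algebraically equal sums (the function names are hypothetical); under value-safe evaluation with SSE2 double arithmetic (e.g. -fp-model precise, or gcc without -ffast-math), the inputs a = 1e16, b = 1.0, c = -1e16 give 0.0 for the first and 1.0 for the second, because 1e16 + 1.0 rounds back to 1e16:

```cpp
// (a + b) + c: the small term b is added to the large term a first
// and can be lost to rounding before c cancels a.
double add_ab_then_c(double a, double b, double c) { return (a + b) + c; }

// (a + c) + b: the large terms cancel first, so b survives.
double add_ac_then_b(double a, double b, double c) { return (a + c) + b; }
```

Under a reassociating model the compiler may evaluate either expression in the other order, so code like this can silently produce different numbers depending on the -fp-model setting; if your results change when you switch models, this kind of cancellation is a likely suspect.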
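The runtime branch selection that -axT and -axTPN perform automatically can be pictured with a hand-written analogue. This is only a sketch of the idea, not how icc implements it: it compiles the same loop twice, once generically and once allowing SSSE3 (the Core 2 instruction set that the T in -axT targets), and picks one on the first call. It assumes GCC or Clang on x86 (__builtin_cpu_supports and the target attribute are GCC/Clang features, not icc options); dot, dot_generic and dot_ssse3 are hypothetical names.

```cpp
#include <cstddef>

// Baseline branch: plain code that any x86 CPU can run.
static double dot_generic(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
// Same source, but the compiler may use SSSE3 (Core 2 era) instructions here.
__attribute__((target("ssse3")))
static double dot_ssse3(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
#endif

using DotFn = double (*)(const double*, const double*, std::size_t);

// Chooses the best branch the current CPU supports.
static DotFn select_dot() {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("ssse3")) return dot_ssse3;
#endif
    return dot_generic;
}

// Public entry point: the branch is resolved once, on the first call.
double dot(const double* a, const double* b, std::size_t n) {
    static const DotFn impl = select_dot();
    return impl(a, b, n);
}
```

With icc itself you would not write this by hand; -axT generates the equivalent branches and the check for you, while -xT corresponds to keeping only the SSSE3 branch.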

Harder Ways to get the most out of icc

  • Write vectorizable loops
    • The number of iterations (the trip count) should be fixed before the loop starts and must not change while it runs.
    • For instance, for(std::size_t i = 0; i < myVec.size(); ++i) may not be vectorized, because the compiler cannot prove that myVec.size() stays the same during the iterations. Consider for(std::size_t i = 0, s = myVec.size(); i < s; ++i) instead.
  • Write code that is easy to inline
    • Functions with a single exit point tend to be easier to inline
    • Break up larger functions into more manageable chunks
  • Use SSE intrinsics where feasible
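The loop-bound tip in compilable form: a minimal sketch in which scale_unhoisted and scale_hoisted are hypothetical names. Whether the first version is actually vectorized depends on the compiler version and on what it can prove about the vector; the second removes the question by reading the bound once. icc 10/11 can report which loops it vectorized via -vec-report (check the documentation for your version).

```cpp
#include <cstddef>
#include <vector>

// Bound re-evaluated on every iteration: the compiler must be sure that
// v.size() cannot change inside the loop before it can vectorize.
void scale_unhoisted(std::vector<float>& v, float k) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] *= k;
}

// Bound read once into n before the loop starts, so the trip count
// is fixed and visible to the vectorizer.
void scale_hoisted(std::vector<float>& v, float k) {
    for (std::size_t i = 0, n = v.size(); i < n; ++i) v[i] *= k;
}
```

Both functions compute the same result; only the form of the loop condition differs.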

Debugging

You can create a debugger-compatible executable with '-g -O2'. Give -O2 explicitly: with -g alone, icc defaults to -O0, whereas -O2 ensures that you are debugging an optimized build that is as similar as possible to the production version.

  • idb is provided in /share/apps/intel/idb/
  • -fp-stack-check is a very useful compiler option: it adds code after every function call that checks that the floating point stack is in the expected state, so the program crashes as soon as the stack is corrupted (by default, such errors fail silently and only show up much later as #INF or #NaN results, which are much harder to diagnose).
  • Executables built this way can also be debugged with gdb.
  • Numerical instability can often be reduced with the -fp-model options (for example precise or source).
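To find the first computation that produces a NaN or #INF, one portable option (standard C++ <cfenv>, not an icc flag) is to clear the floating point exception flags, run the suspect code, and test the flags afterwards. A minimal sketch; raises_fp_error and the sqrt(x) / y expression are hypothetical stand-ins for your own code. On glibc, the GNU extension feenableexcept can instead make such an operation raise SIGFPE, so that gdb or idb stops right at it.

```cpp
#include <cfenv>
#include <cmath>

// Returns true if computing sqrt(x) / y raised an invalid-operation
// (NaN-producing), divide-by-zero (#INF-producing) or overflow exception.
bool raises_fp_error(double x, double y) {
    std::feclearexcept(FE_ALL_EXCEPT);
    // volatile reads and writes keep the compiler from folding the
    // computation or moving it across the flag clear/test calls.
    volatile double vx = x;
    volatile double vy = y;
    volatile double r = std::sqrt(vx) / vy;
    (void)r;
    return std::fetestexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW) != 0;
}
```

Wrapping a suspect stretch of a simulation step like this and bisecting narrows down where the first bad value appears, without relying on a debugger.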

-- OrenElrad - 19 Mar 2008
