EMA

EMA

EMA (standing for EMpirical Analysis of algorithms) is a tool created for providing an effortless way to assess the performance of algorithms. EMA may be handy in many scenarios, including:

  • checking whether the algorithm's usage of resources agrees with its theoretical complexity (for instance, a poor choice of data structures may lead the complexity of an algorithm out of the goal);
  • empirically measuring an algorithm's complexity (which might be unknown analytically);
  • having some nice graphs of the algorithm's performance automatically plotted for you.

The aggregated value of EMA is to automatically deal with many of the operational technical issues that must be addressed in every experimental analysis of algorithms. Among those issues, we might mention:

  • Simulations:
    • Determining the convenient sample of variable values for which to execute the algorithm (the runs should last neither too short, nor too long);
    • Avoiding that the allocated memory grows beyond free memory (otherwise, the costs of OS' memory paging generally breaks the assumption that the RAM model is a good model for the actual machine)
    • Providing a framework for the creation of input instances for the problem and managing the disk usage for storing such inputs. EMA keeps such inputs to be used later in other executions and makes sure that they do not grow beyond a given threshold.
  • Analysis:
    • Choosing/configuring a tool for the presentation of graphs, including the design of those graphs themselves;
    • Choosing/configuring a tool for working out the complexity of the algorithm (potentially applying some machine learning techniques);
    • Compiling some statistics on the best, worst, and average time and space complexities.

Other aspects of EMA you might be interested to know:

  • free software;
  • open source;
  • multi platform;
  • can be easily integrated to any executable program;
  • in beta version; any issue or suggestion related to EMA? Please, let me know.

CHANGE LOG:

3.0:

- overall better accuracy for the BEST-GUESS function.

- fixed getting free memory (in some cases, a wrong number was being determined).

- fixed getting the amount of memory used for a process (taking into account subprocesses now).

- when discretizing parameters, the set of generated discrete values has been enlarged.

- user can determine the family of functions on which the fit will be applied.

- fixed a formatting issue with eps.

- option to select the maximum number of iterations for regression analysis.

- EMA waits until a real core is available to run the algorithm

- Number of logical and physical CPUs are read from the operating system (Linux only)

- EMA reports in screen the usage of each resource right after execution

2.5:

- Fixed standard deviation calculation (divisor from 'N' to 'N-1');

- Outlier detection and configurable removal of mean and standard deviation calculations;

- Added the parameter 'minimum number of samples';

- Convergence Factor can now relate to any resource, not only "Time"

- More function formats added to the fittness procedure.

- More initial values in distinct order of magnitudes (to offer better chances of producing a better fitting).

- In shared computers for running EMA, there will be a maximum number of instances of EMA allowed to concurrently run,

given by max{1,cores/2}.

- More customizations for graphs:

- graph size, line thickness, colors, font family, and customized captions in keys;

- numbers like '1.2e09' now appear as '1.2x10^9'

- plot data within a specific x and/or y range

- application of factors to customized resources values (e.g., to plot them as real values instead of integers)

- encodings for labels

2.4:

- Added support for outputting graphs in eps format.

- Old graph files are backed up whiile outputting new graphs.

- Added support for positioning the key on graphs and changing font family/size on them

- Fixed issues related to running EMA in Windows.

- Reporting execution time on screen with less frequency as the execution time gets longer

2.3:

- Added independent term in a monomial so that the fitting can be more accurate.

- Used allocated memory instead of resident memory as 'used memory' for counting memory consumption (Linux only).

- Fixed terminology (Equation ==> Function)

- The report of functions after estimation can be now requested to be printed out in the console through a

parameter (and the report layout has been improved a little bit).

- Cosmetic changes in the chart layout (colors, line thickness, points, headers).

- getResourceUsageFunction() now can set the error threshold so that a function is considered equivalent to that

of minimum-error.