Normalization is the process of event parsing. Input: one raw event from a collector (a syslog message, a log file line, a DB table record, etc.). Output: one "normal" event.
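As a minimal sketch of that input/output contract (the field names and regex here are just illustrative assumptions, not a real schema):

```python
import re

# Hypothetical normalization rule: parse one raw syslog-style line
# into one flat "normal" event (a dict of named fields).
RAW = "Oct 11 22:14:15 host1 sshd[4123]: Failed password for root from 10.0.0.5"

PATTERN = re.compile(
    r"(?P<timestamp>\w{3} +\d+ [\d:]+) (?P<host>\S+) "
    r"(?P<process>\w+)\[(?P<pid>\d+)\]: (?P<message>.*)"
)

def normalize(raw_line):
    """Input: one raw event line. Output: one normalized event (dict) or None."""
    match = PATTERN.match(raw_line)
    if not match:
        return None  # unparsed lines could go to a fallback rule instead
    event = match.groupdict()
    event["pid"] = int(event["pid"])  # cast typed fields during parsing
    return event

print(normalize(RAW))
```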
Sometimes that's not enough. You may want to ENRICH your event with data from other sources (GeoIP, data from previously collected events, CMDB data, etc.). For example, event 4688 in old Windows versions does not contain the parent process name, only its PID. But you can keep a mapping between PID and name in a table and use it for other events.
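A sketch of that PID-to-name enrichment, assuming hypothetical field names (real 4688 fields differ by product and normalization scheme):

```python
# Mutable state shared between events: a PID -> process-name table
# learned from earlier process-start events.
pid_to_name = {}

def on_process_start(event):
    """Remember the name of every started process by its PID."""
    pid_to_name[event["new_process_id"]] = event["new_process_name"]

def enrich(event):
    """Fill in parent_process_name if we saw the parent start earlier."""
    parent_pid = event.get("parent_process_id")
    event["parent_process_name"] = pid_to_name.get(parent_pid)
    return event

# First event teaches us PID 512; the second event gets enriched with its name.
on_process_start({"new_process_id": 512, "new_process_name": "explorer.exe"})
evt = enrich({"new_process_id": 777, "parent_process_id": 512})
print(evt["parent_process_name"])  # explorer.exe
```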
Aggregation. Right after normalization you have the option to apply a kind of "logical compression" to a dense stream of almost identical events. The main use case for now is "deflating" network events. For example, you can aggregate all netflow events with the same proto, src, and dst over 5 seconds into one event. Input: a sequence of events. Output: one new event, with the input sequence dropped.
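That windowed collapse can be sketched like this (the key fields and counters are illustrative assumptions):

```python
from collections import defaultdict

WINDOW = 5  # seconds

def aggregate(events):
    """Collapse events with the same (proto, src, dst) key inside a
    5-second window into one aggregated event each."""
    buckets = defaultdict(lambda: {"count": 0, "bytes": 0})
    for e in events:
        # Integer division puts each timestamp into its 5-second window.
        key = (e["proto"], e["src"], e["dst"], e["ts"] // WINDOW)
        buckets[key]["count"] += 1
        buckets[key]["bytes"] += e["bytes"]
    return [
        {"proto": p, "src": s, "dst": d, "count": b["count"], "bytes": b["bytes"]}
        for (p, s, d, _), b in buckets.items()
    ]

flows = [
    {"proto": "udp", "src": "10.0.0.1", "dst": "10.0.0.2", "ts": 0, "bytes": 100},
    {"proto": "udp", "src": "10.0.0.1", "dst": "10.0.0.2", "ts": 3, "bytes": 200},
]
print(aggregate(flows))  # one event: count=2, bytes=300
```

Two input flows become one output event; the originals are dropped.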
Localization is about the human-readable representation of an event in grids. We have a virtual field <text> in our event definition scheme. It doesn't exist in the database and is generated only for the event viewer. So we have localization rules for this. A rule is a combination of a filter, which defines the scope of events the rule describes, and a format string (a kind of printf()) for combining different attributes into one string. The format string can differ per language. That's why we call them localization rules :)
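A toy version of such a rule, with a made-up rule structure and field names (the real rule syntax is product-specific):

```python
# Each localization rule: a filter (scope) plus a per-language format string.
# The first matching rule renders the virtual <text> field for the viewer.
RULES = [
    {
        "filter": lambda e: e.get("event_id") == 4688,
        "format": {
            "en": "Process {new_process_name} started by {subject_user}",
            "ru": "Процесс {new_process_name} запущен пользователем {subject_user}",
        },
    },
]

def localize(event, lang="en"):
    for rule in RULES:
        if rule["filter"](event):
            return rule["format"][lang].format(**event)
    return None  # no rule matched; the grid could fall back to raw fields

text = localize({"event_id": 4688, "new_process_name": "cmd.exe",
                 "subject_user": "alice"})
print(text)  # Process cmd.exe started by alice
```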
The main use cases: external dictionaries for referencing (IOCs, for example) or mutable state for keeping context between different events (remember 4688 above?).
About macros. They are a way to reuse long and complicated event filters between different rules. The code becomes clearer, more compact, and easier to maintain. My advice: do not use them at the start. They are entirely optional and mostly for users with hundreds of self-written rules.
Yes, you need to stop all data collection tasks and adjust the storage volume. It's also possible to set up a log rotation service, so old logs will be replaced with new ones.