Data Processing

Intro to Data Processing

  1. Definition of Data Processing:

    • Data processing involves transforming collected data into a format your organization can use. This is crucial because raw data from various sources often arrives in different formats and needs to be standardized.

  2. Approaches to Data Processing:

    • Automated Processing: Uses tools like regular expressions, statistical algorithms, machine learning, and natural language processing to handle large volumes of data. These methods help identify patterns, similarities, and relevant information.

    • Human-Based Processing: Relies on human analysts to process data manually. This is essential for tasks that require adaptive reasoning and problem-solving, which machines can't fully replicate. Analysts can provide context and meaning that automated systems might miss.
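
The automated approach above can be sketched with the simplest of the listed tools, a regular expression. This is a minimal illustration, not a production extractor: the feed entries are made-up sample data, and the IPv4 pattern is deliberately loose (a real pipeline would validate octet ranges and handle defanged indicators).

```python
import re

# Hypothetical raw feed entries -- illustrative sample data only
raw_feed = [
    "Beacon observed from 203.0.113.45 at 02:14 UTC",
    "Phishing page hosted at 198.51.100.7",
    "No indicators in this entry",
]

# Simple IPv4 pattern; real feeds need stricter validation
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_ips(entries):
    """Pull candidate IP indicators out of free-text feed entries."""
    found = []
    for entry in entries:
        found.extend(IPV4.findall(entry))
    return found

print(extract_ips(raw_feed))  # ['203.0.113.45', '198.51.100.7']
```

Human analysts would then review such machine-extracted candidates, which is where the two approaches complement each other.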

  3. Stages of Data Processing:

    • Sorting and Filtering (Pre-processing): Cleaning up raw data by removing duplicate, incomplete, or incorrect records. This stage ensures that only high-quality data moves forward.

    • Normalization: Converting data into a standard format suitable for your needs. For example, if you're adding indicators to a watch list, the format should be compatible with your SIEM (Security Information and Event Management) system.

    • Storage and Integration: Storing processed data so that it can be easily accessed and integrated with other systems. This stage will be covered in more detail in future videos.
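
The three stages above can be chained into one small pipeline. This is a hedged sketch under stated assumptions: the record fields, sample values, and the `watchlist` table are all hypothetical, and an in-memory SQLite database stands in for whatever backend your SIEM actually integrates with.

```python
import sqlite3

# Hypothetical raw indicator records -- field names and values are illustrative
raw_records = [
    {"indicator": "EVIL.example.COM", "type": "domain"},
    {"indicator": "evil.example.com", "type": "domain"},  # duplicate after normalization
    {"indicator": "", "type": "domain"},                  # incomplete -> filtered out
    {"indicator": "203.0.113.45", "type": "ip"},
]

def preprocess(records):
    """Sorting and filtering: drop incomplete records."""
    return [r for r in records if r["indicator"] and r["type"]]

def normalize(records):
    """Normalization: lowercase indicator values and drop duplicates."""
    seen, out = set(), []
    for r in records:
        value = r["indicator"].strip().lower()
        if value not in seen:
            seen.add(value)
            out.append({"indicator": value, "type": r["type"]})
    return out

def store(records, conn):
    """Storage: persist to a table a SIEM or other tool could query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS watchlist (indicator TEXT PRIMARY KEY, type TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO watchlist VALUES (?, ?)",
        [(r["indicator"], r["type"]) for r in records],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
store(normalize(preprocess(raw_records)), conn)
rows = conn.execute("SELECT indicator FROM watchlist ORDER BY indicator").fetchall()
print(rows)  # [('203.0.113.45',), ('evil.example.com',)]
```

Note how the duplicate and the incomplete record never reach storage, which is exactly what keeps the watch list from producing duplicate alerts later.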

  4. Importance of Data Processing:

    • Processing data is essential to avoid generating duplicate alerts and to make the data usable for threat intelligence. Without processing, it would be challenging to correlate events and make assessments based on raw data alone.
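
A small illustration of the correlation point: once data has been processed into a consistent form, events can be grouped by shared indicator so that one campaign yields one assessment rather than many scattered alerts. The event records and host names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical processed events -- fields and values are illustrative
events = [
    {"host": "ws-01", "indicator": "evil.example.com"},
    {"host": "ws-02", "indicator": "evil.example.com"},
    {"host": "ws-03", "indicator": "203.0.113.45"},
]

# Group events by indicator; this only works because every event
# already carries the indicator in the same normalized format
by_indicator = defaultdict(list)
for event in events:
    by_indicator[event["indicator"]].append(event["host"])

for indicator, hosts in sorted(by_indicator.items()):
    print(f"{indicator}: seen on {len(hosts)} host(s) -> {hosts}")
```

With raw, unnormalized data (`EVIL.example.COM` vs. `evil.example.com`), the same activity would split into separate groups and separate alerts.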
