Adobe has released an open-source tool designed to identify randomly generated strings in any plain text.
Dubbed Stringlifier, the tool is written in Python and uses machine learning to distinguish random sequences of characters from regular ones.
The open-source project should be helpful in analyzing security and code logs, or in trying to discover passwords that may have been leaked accidentally.
Whether it’s hashes, API keys, randomly generated passwords, or other random string types in source code, logs, or configuration files, Stringlifier can help identify them easily.
The source code for Stringlifier has been published in Adobe’s public GitHub repository, and the software giant has also made available a “pip” (Python package installer) installation package that includes a pre-trained model.
In addition to its use in the open-source utility Tripod, Adobe says it has already used the tool to identify random strings when looking for anomalies in datasets.
The team tried various preprocessing approaches and transformed long strings into numerical form, but these approaches hit a roadblock whenever the data contained random strings, which disrupted the clustering algorithm.
By substituting <RANDOM_STRING> for all random character sequences, the team was able to group similar types of command lines more easily, even when they contained random hashes in their parameters.
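The masking idea can be illustrated with a minimal sketch. This is not Stringlifier's code or API: the tool uses a trained machine learning model, whereas the example below swaps in a simple Shannon-entropy heuristic. The function names, the thresholds, and the exact spelling of the placeholder are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumed placeholder spelling, used here only for illustration.
PLACEHOLDER = "<RANDOM_STRING>"

# Candidate tokens: runs of at least 16 letters or digits (hypothetical cutoff).
TOKEN_RE = re.compile(r"[A-Za-z0-9]{16,}")


def shannon_entropy(token: str) -> float:
    """Return the Shannon entropy of the token, in bits per character."""
    counts = Counter(token)
    total = len(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(token: str, min_entropy: float = 3.0) -> bool:
    """Crude stand-in for a learned classifier: a token is treated as random
    if it mixes letters and digits and has high character entropy."""
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    return has_digit and has_alpha and shannon_entropy(token) >= min_entropy


def mask_random_strings(line: str) -> str:
    """Replace every token that looks random with the placeholder."""
    def substitute(match: re.Match) -> str:
        token = match.group(0)
        return PLACEHOLDER if looks_random(token) else token

    return TOKEN_RE.sub(substitute, line)


if __name__ == "__main__":
    commands = [
        "backup --job 9f86d081884c7d659a2feaa0c55ad015 --verbose",
        "backup --job 3c59dc048e8850243be8079a5c74d079 --verbose",
    ]
    # After masking, both command lines collapse to the same string,
    # so a clustering step would see them as a single pattern.
    for command in commands:
        print(mask_random_strings(command))
```

Once the hashes are masked, the two command lines become identical, which is why this kind of substitution makes grouping easier. A heuristic like this will misclassify many inputs that a trained model would be expected to handle better.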
“We hope you find Stringlifier useful. The entire source code can be found in Adobe’s GitHub repository. In that repository, you can also find all our other open source projects from across Adobe’s security teams. We look forward to receiving feedback, and contributions are always welcome,” says Adobe.
The company also provides details on how to get started with Stringlifier and on how users can train their own models to identify other types of strings.