Information workers often face this problem: copying and pasting many text snippets from scattered positions in unstructured or semi-structured text files. Doing this by hand is usually too much work, yet there is no easy way to automate it. Current solutions include falling back on long-standing Unix tools like grep or coding a custom tool for that one job.
The idea is to develop a GUI application that lets users mark sample text positions across multiple files and then generalizes these samples to the entire collection, going by hints such as common text patterns around each marked section. The user would work with the tool interactively until they are satisfied that they have provided enough information to do the job right. They would then instruct the tool to run the entire job, which might take several minutes to complete.
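To make the generalization step more concrete, here is a minimal sketch of one possible heuristic: take the text immediately before and after each marked sample, keep only what all samples have in common, and turn that into a search pattern. The names infer_pattern, extract and mark, the 20-character context window and the sample log lines are all made up for illustration. A real tool would need much more robust inference than this.

```python
import os
import re


def common_suffix(strings):
    """Longest suffix shared by all strings (commonprefix on reversed strings)."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]


def infer_pattern(samples, context=20):
    """Build a regex from user-marked samples.

    samples is a list of (text, start, end) tuples, one per marked snippet.
    The context size of 20 characters is an arbitrary assumption.
    """
    lefts = [text[max(0, start - context):start] for text, start, end in samples]
    rights = [text[end:end + context] for text, start, end in samples]
    left = common_suffix(lefts)
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("samples share no common surrounding text")
    # Capture whatever sits between the shared left and right context.
    return re.compile(re.escape(left) + r"(.+?)" + re.escape(right))


def extract(pattern, text):
    """Return every snippet in text that sits between the inferred contexts."""
    return pattern.findall(text)


def mark(text, snippet):
    """Stand-in for a GUI selection: locate snippet and return its position."""
    start = text.index(snippet)
    return (text, start, start + len(snippet))


# Hypothetical log excerpt; the user marks "alice" and "bob" as samples.
log = (
    "2024-01-01 user=alice action=login\n"
    "2024-01-02 user=bob action=logout\n"
    "2024-01-03 user=carol action=login\n"
)
pattern = infer_pattern([mark(log, "alice"), mark(log, "bob")])
print(extract(pattern, log))  # ['alice', 'bob', 'carol']
```

With only two samples, the sketch already picks up the third user name that was never marked. In the interactive workflow, a wrong or overly narrow pattern would be the cue for the user to mark more samples before starting the full run.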
Use cases include processing collections of log files, exploring large dump files, working with flat mailbox files, or importing data from now-defunct software.
There are multiple scenarios for the input, ranging from one large input file to a collection of files in a semi-structured directory tree, all with the same style of content. The user would then choose the form of the generated output: plain text files with lists of snippets, CSV tables for import into spreadsheets, or newly generated text files containing the pasted sections from the input files.
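As a rough sketch of the batch run over these input scenarios, the following code accepts either a single file or a directory tree, applies an already known pattern to every matching file, and writes the hits as a CSV table. The function names collect_snippets and write_csv, the "*.log" file filter and the three-column layout are assumptions chosen for illustration, not a fixed design.

```python
import csv
import re
from pathlib import Path


def collect_snippets(root, pattern, glob="*.log"):
    """Yield (file, line number, snippet) for every match under root.

    root may be a single file or a directory that is searched recursively.
    pattern must be a compiled regex with one capture group.
    """
    root = Path(root)
    if root.is_file():
        files = [root]
    else:
        files = sorted(p for p in root.rglob(glob) if p.is_file())
    for path in files:
        # errors="replace" keeps the run going on files with odd encodings.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                for match in pattern.finditer(line):
                    yield str(path), line_number, match.group(1)


def write_csv(rows, destination):
    """Write the collected rows as a CSV table with a header line."""
    with open(destination, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "line", "snippet"])
        writer.writerows(rows)
```

A run over a hypothetical directory named logs could then look like write_csv(collect_snippets("logs", re.compile(r"user=(\S+)")), "snippets.csv"). Because collect_snippets is a generator, rows are written as they are found, which suits a job that runs for several minutes.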
While I have spoken of plain text files so far, a successful tool could be extended to work with word processor file formats, such as MS Word. A lot of important content sits in such semi-structured documents and needs extracting for future use.
It would be important for such an application to work equally well on Windows, Linux and macOS, so it would need to be built with a cross-platform framework and language.