The first step to legacy modernization is gaining an understanding of exactly what is there to begin with. In many cases this is not as easy as it sounds.
Many mainframe shops have developed practices over several years that, quite simply, work. Unfortunately, in some cases they work only because of one or two subject matter experts (SMEs) who know what everything does, and how everything hangs together.
Code version control tools can help, but the repository is only as good as what was put into it in the first place. Code for obscure but vital pieces of processing can sit in non-controlled libraries, in both source and executable form. Even harder to find are elements of functionality hidden away in system or database exits. These are usually the domain of systems programmers or other SMEs, and they fall outside the code control and impact analysis tooling of many organisations.
The simple fact is that an organisation has to start somewhere. Code should be gathered from all known libraries, using current tooling as well as input from SMEs, and then an impact analysis tool should be used to identify the gaps. Most tools of this nature will give an overall picture of the software estate, including where code is missing or perhaps not executed.
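As a toy illustration of the gap-finding step, the cross-reference between executables and controlled source can be sketched as a set difference: anything executable with no matching source is a gap to investigate, and source that never produced an executable may be dead. The member names and inventories below are invented examples, not output from any real tool.

```python
# Sketch: cross-reference load modules against source members to find gaps.
# All member names and both inventories are hypothetical examples.

def find_gaps(load_members, source_members):
    """Return (missing_source, possibly_dead) as sorted lists.

    missing_source: executables with no matching source member.
    possibly_dead:  source members that never produced an executable.
    """
    loads = set(load_members)
    sources = set(source_members)
    missing_source = sorted(loads - sources)
    possibly_dead = sorted(sources - loads)
    return missing_source, possibly_dead

# Example inventories, as might be listed from the known libraries.
loads = ["PAYRUN01", "PAYRUN02", "GLPOST", "OLDEXIT1"]
sources = ["PAYRUN01", "PAYRUN02", "GLPOST", "PAYRUN99"]

missing, dead = find_gaps(loads, sources)
print(missing)  # ['OLDEXIT1'] -> executable with no controlled source
print(dead)     # ['PAYRUN99'] -> source with no executable; perhaps unused
```

A real impact analysis tool does far more than this, of course, but even this crude listing gives the two worklists an organisation needs first: executables to chase source for, and source to question.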
Once again, the golden rule is that the tool is only as good as its inputs. Link decks, JCL, sort parameters, database schemas, copybooks and any other relevant artefacts should all be considered for inclusion. Then there is the issue of currency.
Is the code being executed actually the same as the code in the repository? For recently changed modules this is likely to be the case, but for older code that has perhaps not been amended for more than a decade, there is a real possibility that it was thrown into the code version control tool without consideration of where the real source is, or that it is not in there at all. Another possibility is that it is executed from a test or other non-controlled library.
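One hedged heuristic for flagging currency problems is to compare link-edit dates against repository change dates: a module linked before its source was last changed, or with no repository entry at all, deserves a closer look. The member names and dates below are illustrative, and a date comparison is only a first filter; matching compile listings to repository content is the stronger test.

```python
# Sketch: flag load modules whose link-edit date precedes the repository
# source's last-change date, or which have no repository entry at all.
# Member names and dates are invented examples.
from datetime import date

def stale_candidates(load_dates, source_dates):
    """Return members whose executable may not match the repository source."""
    flagged = []
    for member, linked in load_dates.items():
        changed = source_dates.get(member)
        if changed is None or linked < changed:
            flagged.append(member)
    return sorted(flagged)

load_dates = {
    "PAYRUN01": date(2023, 5, 1),   # linked after last source change: OK
    "GLPOST":   date(2010, 2, 14),  # linked before last source change
    "OLDEXIT1": date(2001, 7, 3),   # no repository entry at all
}
source_dates = {
    "PAYRUN01": date(2023, 4, 30),
    "GLPOST":   date(2012, 1, 9),
}

print(stale_candidates(load_dates, source_dates))  # ['GLPOST', 'OLDEXIT1']
```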
Copybooks are another problematic area: once a program is compiled, there is often no direct connection between the movement of the source into production and the movement of the copybook. This is frequently left to the individual developer to manage, which can lead to copybooks not being promoted to the production environment. This will not show itself immediately; however, it becomes an issue when the next developer needs the same copybook or program.
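A rough sketch of a copybook drift check: scan COBOL source for COPY statements and compare the result against a listing of the production copybook library. The regex here is deliberately naive (it would trip over the word COPY in a comment, for example, and ignores REPLACING clauses), and the sample source and library listing are hypothetical.

```python
# Sketch: find copybooks a program needs that are absent from a production
# copybook library listing. The COBOL fragment and library contents are
# invented examples; a real scan must handle comments and continuations.
import re

COPY_RE = re.compile(r"\bCOPY\s+([A-Z0-9#$@-]{1,8})", re.IGNORECASE)

def required_copybooks(cobol_source):
    """Return the set of copybook member names referenced via COPY."""
    return {m.group(1).upper() for m in COPY_RE.finditer(cobol_source)}

src = """
       01  CUSTOMER-REC.
           COPY CUSTREC.
       PROCEDURE DIVISION.
           COPY ERRCODES.
"""

needed = required_copybooks(src)
prod_library = {"CUSTREC"}  # hypothetical listing of the production library
missing = sorted(needed - prod_library)
print(missing)  # ['ERRCODES'] -> never promoted to production
```

Run across the whole estate, this kind of comparison turns the silent copybook gap into an explicit worklist before the next developer stumbles into it.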
JCL causes issues because it is not always treated the same as source code. Often, JCL is not considered suitable for inclusion in a code version control tool, and is developed on an ad hoc basis by any number of operations analysts. This can lead to inconsistency, with a proliferation of libraries for parameters, procs, inputs, outputs and so on, making impact analysis particularly problematic.
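Inventorying which libraries a body of JCL actually touches can start with a simple scan for DSN= references; a TEST library referenced in a production job is exactly the sort of inconsistency such a scan surfaces. The job and dataset names below are made-up examples, and the regex is a first cut, not a full JCL parser.

```python
# Sketch: extract every dataset name referenced in a piece of JCL, to build
# a picture of library proliferation. The JCL below is a made-up example;
# a real scan must also resolve symbols, procs and overrides.
import re

DSN_RE = re.compile(r"DSN=([A-Z0-9$#@.&()+-]+)", re.IGNORECASE)

def referenced_datasets(jcl_text):
    """Return a sorted list of distinct dataset names referenced via DSN=."""
    return sorted({m.group(1).upper() for m in DSN_RE.finditer(jcl_text)})

jcl = """
//PAYRUN   JOB (ACCT),'PAYROLL'
//STEP1    EXEC PGM=PAYRUN01
//STEPLIB  DD DSN=PROD.LOADLIB,DISP=SHR
//SYSIN    DD DSN=TEST.PARMLIB(PAYPARM),DISP=SHR
//OUT      DD DSN=PROD.PAY.OUTPUT,DISP=(NEW,CATLG)
"""

print(referenced_datasets(jcl))
# ['PROD.LOADLIB', 'PROD.PAY.OUTPUT', 'TEST.PARMLIB(PAYPARM)']
# The TEST.PARMLIB reference in a production job is the kind of finding
# that makes this scan worthwhile.
```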
That first run through an impact analysis tool can be a real eye-opener, and it should lead to a push to remove all of the redundant assets it readily identifies. Engaging the services of a local SME will also enable other redundant artefacts to be removed, albeit with perhaps a few code changes along the way. The immediate gain is a better understanding of what is actually there, and a reduced estate for the overall modernization project.