Using intelligence in automated open source management
Posted by Lacey Thoms on Thu, May 05, 2011 @ 10:03 AM
As mentioned in our earlier post on automated open source management, the upside of a large database is that you can find a match to your software. Deep scanning, which matches code structure against a large reference database of all publicly available code, is the most thorough way to automate intellectual property, open source license, and copyright management.
By definition, this database must be very large. Protecode’s Global IP Signatures (GIPS) database contains more than 70 million software files, and an equivalent of about 60 billion lines of code.
The downside of a large database is that, unless you throw some intelligence at it, the results become useless. First, without intelligent searching techniques, it may take a significant amount of time to detect all matches. Second, without intelligent sorting techniques, you end up with a long, unranked list of matches for each file and no indication of which one is relevant.
To demonstrate the point, let’s take a common software file as an example. Assume that your code portfolio includes Apache Foundation’s commons logging package. This very popular package has been released in binary and source form many times over the past decade. Moreover, Protecode’s GIPS database shows more than 4000 entries for the Java binary file commons-logging.jar, in its various forms. This file is used by about 4000 other publicly available projects, within and outside Apache Foundation. A good database search should detect all the instances accurately and rapidly. But an unintelligent search would show 4000 matches, which would then require weeks of manual research to determine which commons-logging project is actually used in your code portfolio. That’s NOT automation.
An intelligent database mining solution can sift through the database entries that match the target software file and apply rules, at both the micro and macro level, that systematically eliminate non-relevant matches. These selection rules draw on attributes such as signatures, dates, licenses, copyrights, project context, scan-run context and scan history, to name a few, to bring the most relevant match to the top.
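To make the idea concrete, here is a minimal Python sketch of rule-based elimination and ranking. It is only an illustration: the `Match` fields, the `rank_matches` function, the specific rules and their order, and all sample data are assumptions made up for this example, not a description of how Protecode's scanner or the GIPS database actually works.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Match:
    """One hypothetical database entry that matches the scanned file."""
    project: str
    released: date
    license: str            # not used for ranking here, kept for reporting
    signature_score: float  # 0.0 to 1.0, closeness of the structural match


def rank_matches(matches, scan_date, known_projects=frozenset()):
    """Drop implausible matches, then sort so the most relevant comes first."""
    # Elimination rule (assumed): an entry released after the scan date
    # cannot be the origin of the scanned file.
    plausible = [m for m in matches if m.released <= scan_date]

    def relevance(m):
        # Ranking rules (assumed): prefer projects already seen in this
        # portfolio, then stronger signature matches, then earlier releases,
        # on the guess that the earliest release is the likeliest origin.
        return (
            m.project not in known_projects,
            -m.signature_score,
            m.released,
        )

    return sorted(plausible, key=relevance)


# Made-up sample data; project names and dates are illustrative only.
candidates = [
    Match("some-app-bundling-logging", date(2009, 3, 1), "GPL-2.0", 1.0),
    Match("commons-logging", date(2006, 5, 9), "Apache-2.0", 1.0),
    Match("future-fork", date(2012, 1, 1), "Apache-2.0", 1.0),
    Match("partial-copy", date(2005, 1, 1), "MIT", 0.6),
]
ranked = rank_matches(candidates, scan_date=date(2011, 5, 5))
print(ranked[0].project)
```

With this sample data, the entry dated after the scan is discarded, and the earliest full-strength match is listed first. Passing a set of `known_projects` would stand in for project context or scan history, moving entries from projects already identified in the portfolio ahead of the rest.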
Applying intelligence to open source software license management is all about accurate discovery. The objective is to provide accurate information to decision makers in an organization, so that decisions on quality and governance risk reduction can be made in a timely manner and with minimum effort.
Want to see more? View a report generated by using intelligent automated code scanning.