Thursday, May 2, 2013

Quick DLP Scans With ClamAV


Did you know that ClamAV has a data loss prevention (DLP) module that can scan for credit card numbers or social security numbers contained in files? This is interesting because ClamAV is found on almost all Linux security distros (including RŌNIN) and is easily launched from the command line. If you've ever worked breach cases in data environments covered under PCI-DSS or HIPAA, you know that one of the first questions to answer is: Did personally identifiable information (PII) exist on the compromised system? To that end, having a quick and readily available DLP scanning tool is a useful capability.

Running DLP Scan Using ClamScan
You can run a DLP scan (and AV sweep) using the ClamAV command-line scanner, clamscan, with the following options:

clamscan -r --detect-structured=yes --structured-ssn-format=2 --structured-ssn-count=5 --structured-cc-count=5 directorypath
Command breakdown
-r  (recursive file scanning)
--detect-structured=yes  (turns on DLP matching; it is off by default)
--structured-ssn-format=2  (tells the scanner to match both ###-##-#### and #########)
--structured-ssn-count  (minimum number of SSN matches in a file before a hit is reported)
--structured-cc-count  (minimum number of CCN matches in a file before a hit is reported)
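
For repeated sweeps it can help to assemble the option list in a script. The following is a minimal Python sketch, not part of ClamAV: the helper name build_dlp_command and its defaults are made up for illustration, and it only builds the argument list without running anything.

```python
import shlex


def build_dlp_command(target, ssn_count=5, cc_count=5, ssn_format=2):
    """Return the clamscan argument list for a recursive DLP sweep of target.

    Hypothetical helper: it mirrors the options described above and
    does not execute clamscan itself.
    """
    return [
        "clamscan",
        "-r",                                     # recurse into subdirectories
        "--detect-structured=yes",                # enable DLP matching
        f"--structured-ssn-format={ssn_format}",  # 2 = dashed and undashed SSNs
        f"--structured-ssn-count={ssn_count}",    # SSN reporting threshold
        f"--structured-cc-count={cc_count}",      # CCN reporting threshold
        target,
    ]


cmd = build_dlp_command("./Identity_Finder_Test_Data", ssn_count=1, cc_count=1)
print(shlex.join(cmd))  # shell-quoted form of the command, ready to paste
```

Passing the list to subprocess.run(cmd) would launch the scan, assuming clamscan is installed and on the PATH.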

Testing ClamAV DLP Module
To test ClamAV's DLP module, you can use a great DLP test data set provided by Identity Finder. The data set consists of a number of files that contain fake SSNs, CCNs, and other elements of PII distributed across a wide range of common file formats.

If we fire off a scan of this data set using clamscan, we get the following results:

clamscan -r --detect-structured=yes --structured-ssn-format=2 --structured-ssn-count=1 --structured-cc-count=1 ./Identity_Finder_Test_Data
./Identity_Finder_Test_Data/Employee Database.accdb: OK
./Identity_Finder_Test_Data/Hidden Column.xls: OK
./Identity_Finder_Test_Data/Department.csv: Heuristics.Structured.SSN FOUND
./Identity_Finder_Test_Data/college essay w footer.doc: OK
./Identity_Finder_Test_Data/Fake SSNs/fake_ssn.txt: Heuristics.Structured.SSN FOUND
./Identity_Finder_Test_Data/Contacts.pptx: Heuristics.Structured.SSN FOUND
./Identity_Finder_Test_Data/loans.xlsx: Heuristics.Structured.SSN FOUND
./Identity_Finder_Test_Data/Samples/SSN.txt: Heuristics.Structured.SSN FOUND
./Identity_Finder_Test_Data/Samples/Sample Real CCN.txt: Heuristics.Structured.CreditCardNumber FOUND
./Identity_Finder_Test_Data/2009 class.docx: Heuristics.Structured.SSN FOUND
./Identity_Finder_Test_Data/Tax Return 2008.pdf: Heuristics.Structured.CreditCardNumber FOUND
./Identity_Finder_Test_Data/Credit Report.pdf: Heuristics.Structured.CreditCardNumber FOUND
./Identity_Finder_Test_Data/Employee Database.mdb: OK
./Identity_Finder_Test_Data/request.zip: Heuristics.Structured.CreditCardNumber FOUND
./Identity_Finder_Test_Data/application.pdf: Heuristics.Structured.CreditCardNumber FOUND
./Identity_Finder_Test_Data/students.ppt: Heuristics.Structured.SSN FOUND
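
A listing like the one above gets hard to eyeball on larger directory trees, so a small tally per result type can help. This is a rough Python sketch that assumes result lines have the "path: OK" or "path: signature FOUND" shape shown above; the function name tally_clamscan is hypothetical and the sample is a three-line excerpt of the listing.

```python
from collections import Counter


def tally_clamscan(output):
    """Count clamscan result lines by status ("OK" or the signature name)."""
    counts = Counter()
    for line in output.splitlines():
        # Result lines look like "<path>: OK" or "<path>: <signature> FOUND".
        _path, sep, status = line.rpartition(": ")
        if not sep:
            continue  # blank lines and lines without ": " are not results
        if status == "OK":
            counts["OK"] += 1
        elif status.endswith(" FOUND"):
            counts[status[: -len(" FOUND")]] += 1
        # Anything else (for example summary lines) is ignored.
    return counts


sample = """\
./Identity_Finder_Test_Data/Hidden Column.xls: OK
./Identity_Finder_Test_Data/Department.csv: Heuristics.Structured.SSN FOUND
./Identity_Finder_Test_Data/Credit Report.pdf: Heuristics.Structured.CreditCardNumber FOUND
"""
print(tally_clamscan(sample))
```

Counting the full listing by hand gives 4 files marked OK, 7 SSN hits, and 5 credit card hits.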

From the output we can see that ClamAV flagged PII in 12 of the 16 files, but not in all of them, even though the count thresholds were set to 1 and every file should have produced a hit. In particular, the DLP module seems to have a hard time identifying PII contained in Access database files, Excel documents with hidden columns, and Word document footers. Because ClamAV's DLP functionality is based on parsing binary streams for regex-style matches on structured data, it seems to have issues with formats that do not store their content as straightforward text.
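
To illustrate why byte-level pattern matching can miss data, here is a small Python sketch. It is not ClamAV's implementation: the pattern SSN_BYTES is a hypothetical stand-in for the two SSN layouts, and UTF-16 is used only as one example of an encoding that is not plain single-byte text, not as a statement about how any particular file format stores its data.

```python
import re

# Hypothetical byte-level pattern for ###-##-#### and #########,
# refusing matches that are embedded in a longer run of digits.
SSN_BYTES = re.compile(rb"(?<!\d)(?:\d{3}-\d{2}-\d{4}|\d{9})(?!\d)")

text = "SSN: 123-45-6789"  # obviously fake value, for illustration only
as_ascii = text.encode("ascii")      # one byte per character
as_utf16 = text.encode("utf-16-le")  # each character followed by a NUL byte

# The same logical content either matches or not, depending on the bytes on disk.
print(SSN_BYTES.search(as_ascii) is not None)
print(SSN_BYTES.search(as_utf16) is not None)
```

The first search succeeds and the second does not, because the interleaved NUL bytes break up the run of digits the pattern expects. Compressed or otherwise structured containers can hide content from a byte-level scanner in a similar way unless the scanner unpacks them first.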

For a comprehensive DLP sweep, we'd want to look to a tool like OpenDLP or a commercial tool like Identity Finder. For a quick initial review, however, ClamAV's DLP scanning features are well suited to cursory assessments.
