Michael W. Sherman

Thank you for visiting my website.

I’m an artificial intelligence practitioner and applied researcher with expertise in general machine learning and AI on text (NLP/NLU).

I currently work as a Machine Learning Engineer with Google Cloud Professional Services, where I prototype and build AI systems for Google Cloud customers.

Find Me
Work

Selected Professional Projects

Bloomberg Points of Law

I led machine learning and NLP work on a new Bloomberg product called Points of Law. I was the only member of the development team with prior machine learning experience.

Points of Law is a legal search product that mines 13 million court documents for key legal concepts and provides tools for lawyers to find the legal concepts most relevant to their cases. It was awarded the 2018 New Product Award by the American Association of Law Libraries.

Points of Law finds legal concepts in court documents and links to other court documents discussing the same legal concept:

Points of Law Document View

Points of Law builds a timeline of every legal concept, tracing the evolution of the concept and revealing the court case where the concept was first used:

Points of Law Graph View

Lawyers spend far more time researching past cases for relevant points of law than they spend in the courtroom. This product makes that research tens to hundreds of times faster.


Monitoring the U.S. Stock Market

As a Consultant at FIS (then Sungard), I was part of a bid team that architected and prototyped a system to build the Consolidated Audit Trail (CAT), a petabyte-scale data warehouse to track all activity in the American stock market: all orders, cancellations, trades, quotes, etc.

As part of our bid, we built a prototype that processed 6 billion transactions per hour and published a whitepaper discussing our scalability tests. In addition to driving the bidding process forward and designing the schema of the proposed data warehouse, I wrote the bulk of the whitepaper and ran the day-to-day engineering efforts of our Hadoop+Bigtable (HBase) prototype.
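
For a sense of scale, the hourly figure converts to per-minute and per-second rates as sketched below. This is plain arithmetic on the stated 6 billion per hour, assuming an even rate across the hour; it is not a measurement taken from the whitepaper, and the variable names are only illustrative.

```python
# Convert the stated hourly throughput into smaller time units.
# Assumption: the rate is spread evenly across the hour.
TRANSACTIONS_PER_HOUR = 6_000_000_000  # figure stated in the text

per_minute = TRANSACTIONS_PER_HOUR // 60   # divides evenly
per_second = TRANSACTIONS_PER_HOUR / 3600  # not a whole number

print(f"{per_minute:,} transactions per minute")
print(f"{per_second:,.0f} transactions per second (rounded)")
```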

We were awarded a U.S. patent for our Consolidated Audit Trail prototype.

Coverage:

  • PCWorld interviews my team lead on the successful prototype.
  • CNBC on the intense requirements of the Consolidated Audit Trail.
  • Bloomberg on the whitepaper and prototype.
  • Press Release from FIS.


Estimating the Cash Value of Financial Assets

I led a team of seven developers prototyping a machine learning system to predict the cash value of financial assets held by customers at a large bank. Our system used the bank’s proprietary data in addition to publicly available data on the financial assets. We showed that the bank could estimate the cash value of financial assets more accurately with its own data than with publicly available data alone.


Automated Data Entry

I was part of a team that built a series of machine learning models to automate new product data entry for a wholesaler. The data entry was for a highly regulated product category, where data entry clerks had to gather multiple legally required pieces of paperwork, search the paperwork for key pieces of information, and enter that information into a database. Our models determined whether the correct paperwork was available and then extracted the key pieces of information from it.


Energy Markets Dashboards

As a Consultant at FIS (then Sungard), I worked with the CEO of an energy investment fund to build dashboards monitoring the daily performance of the fund’s energy companies. Working with a team of energy analysts, I created a series of Tableau dashboards built on custom metrics. I also advised on changes to the underlying SQL data warehouse.

Dashboard Images


Better Workers’ Compensation Outcomes

I was part of a team of student consultants tasked by a large American retailer to mine their workers’ compensation data to discover medical treatment choices that led to faster recovery times.

We found a handful of decision points in the retailer’s workers’ compensation process where additional treatments sped up recovery. We also discovered new segments of claimants where identical treatments led to different outcomes.

Speaking

Speaking and Presentations

Practical Machine Learning for Financial Services

Keynote Speaker at UiPath Together NY, 2018


Productionization of Machine Learning

Panelist at Texas Analytics Summit 2018

Side Projects

Open Source Contributions

Gensim

Gensim is a popular Python library for running NLP algorithms on large corpora. I added a method to automate training word2vec on multiple files, and added support for user-defined collocation detection metrics. Pull Requests.
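
To illustrate what a user-defined collocation metric means, here is a minimal standard-library sketch. It is not Gensim's code or API: the `find_collocations` function, the `Scorer` signature, and the toy sentences are all hypothetical. The idea is that bigram counting stays fixed while the scoring function is supplied by the caller, here normalized pointwise mutual information (NPMI).

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical scorer signature for this sketch:
# (count_a, count_b, count_ab, total_tokens) -> score
Scorer = Callable[[int, int, int, int], float]


def npmi(count_a: int, count_b: int, count_ab: int, total_tokens: int) -> float:
    """Normalized pointwise mutual information; 1.0 means the words always co-occur."""
    p_a = count_a / total_tokens
    p_b = count_b / total_tokens
    p_ab = count_ab / total_tokens
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def find_collocations(
    sentences: List[List[str]],
    scorer: Scorer,
    threshold: float,
    min_count: int = 2,
) -> List[Tuple[Tuple[str, str], float]]:
    """Score adjacent word pairs with a caller-supplied metric."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in sentences:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    total = sum(unigrams.values())

    scored = []
    for (a, b), count_ab in bigrams.items():
        if count_ab < min_count:
            continue  # skip rare pairs, whose scores are unreliable
        score = scorer(unigrams[a], unigrams[b], count_ab, total)
        if score >= threshold:
            scored.append(((a, b), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy corpus, invented for this example.
sentences = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["the", "city", "is", "big"],
    ["new", "york", "city", "is", "loud"],
]
collocations = find_collocations(sentences, scorer=npmi, threshold=0.9)
print(collocations)
```

Swapping in a different metric only requires passing another function with the same four arguments, which is the design point of a pluggable scorer.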

Other Projects

Erowid Web Forum Analysis

Erowid is a website that has collected millions of first-hand accounts of drug abuse. I ran text analytics on these accounts as a graduate school project. Writeup.

Drug Co-Abuse


Diabetes Prediction

Prediction of diabetes from other elements of a patient’s medical record. This was a graduate school project. Writeup.

Papers & More

Papers and Publications

Civil Asset Forfeiture: A Judicial Perspective

L. Barret et al., “Civil Asset Forfeiture: A Judicial Perspective,” in Proceedings of the Data For Good Exchange 2017, 24 September 2017, New York, New York, USA [Online]. Available: https://arxiv.org/abs/1710.02041.

We mined Bloomberg Law’s collection of court dockets to learn more about the law enforcement practice of seizing property without a guilty verdict.


Scaling to Build the Consolidated Audit Trail: A Financial Services Application of Google Cloud Bigtable

N. Palmer, M. Sherman, Y. Wang, and S. Just, “Scaling to Build the Consolidated Audit Trail: A Financial Services Application of Google Cloud Bigtable,” Google, 2015. [Online]. Available: https://cloud.google.com/bigtable/pdf/FISConsolidatedAuditTrail.pdf.

We tested write scalability of Google Cloud Bigtable on stock market transaction data.


An Expected Value Model for Maximizing Revenues from Sales Tax Audits

M. Sherman, N. White, and Y. Du, “An Expected Value Model for Maximizing Revenues from Sales Tax Audits,” presented at Texas Workshop on Social and Business Analytics, 28 March 2014, Austin, Texas, USA [Online]. Available: https://www.michaelwsherman.com/web_img/audit_poster.png.

We built a custom, interpretable model to maximize the revenue of tax audits. Poster.

Patents

N. Palmer and M. Sherman, “System and Associated Methodology of Creating Order Lifecycles via Daisy Chain Linkage,” U.S. Patent 10,089,687, 2 October 2018.

Awards

2018 New Product of the Year by the American Association of Law Libraries for Bloomberg Points of Law

I led machine learning and NLP, and was the only development team member with machine learning experience.

2016 Bloomberg Verticals TechFest Winner (Internal Award)

Awarded for “applying deep learning to NLP tasks.” My team trained the first deep learning models in our working group.

2015 Sungard Consulting STAR Award (Internal Award)

Awarded for “exceeding expectations in sales, leadership, or deliverables” to 4 consultants annually (of about 200). I led the day-to-day work on a prototype system to process 6 billion stock market transactions per hour and wrote the bulk of the companion whitepaper.