Michael W. Sherman

Thank you for visiting my website.

I’m an artificial intelligence practitioner and applied researcher, with expertise in general machine learning and AI on language (NLP/NLU).

I currently work as a Machine Learning Engineer with Google Cloud Professional Services, where I prototype and build AI systems for Google Cloud customers.

I am not accepting solicitations right now for private consulting work, but I can work with you if you are using Google Cloud.

Work

Selected Professional Projects

Bloomberg Points of Law

I led data science and machine learning on a new Bloomberg product called Points of Law. I was the only member of the development team with prior machine learning experience.

Points of Law is a machine-learning-driven product that mines 13 million court documents for key legal concepts, then provides tools for lawyers to find the legal concepts most relevant to their case. It was awarded the 2018 New Product of the Year by the American Association of Law Libraries.

Points of Law finds legal concepts in a court document and links to other court documents discussing the same legal concept:

Points of Law Document View

Points of Law builds a timeline of every legal concept, tracing the evolution of the concept and revealing the court case where the concept was first used:

Points of Law Graph View

Lawyers spend far more time researching past cases for relevant points of law than they spend in the courtroom. This product makes that research tens to hundreds of times faster.


Monitoring the U.S. Stock Market

As a Consultant at FIS (then Sungard), I was part of a team working on a bid to build the Consolidated Audit Trail (CAT), a petabyte-scale data warehouse tracking all activity in the American stock market: every order, every cancellation, every trade, every quote, and more.

As part of our bid, we built a prototype that processed 6 billion transactions per hour and published a whitepaper discussing our scalability tests. In addition to driving the bidding process forward and designing the schema of the proposed data warehouse, I wrote the bulk of the whitepaper and ran the day-to-day engineering efforts of our Hadoop+Bigtable (HBase) prototype.

We were awarded a U.S. patent for our Consolidated Audit Trail prototype.

Coverage:

  • PCWorld interviews my team lead on the successful prototype.
  • CNBC on the intense requirements of the Consolidated Audit Trail.
  • Bloomberg on the whitepaper and prototype.
  • Press Release from FIS.


Estimating the Cash Value of Financial Assets

I led a team of seven developers prototyping a machine learning system to predict the cash value of financial assets held by a large bank’s customers. Our system used the bank’s proprietary data in addition to publicly available data on the financial assets. We showed how the bank could use its data to estimate the cash value of financial assets more accurately than with publicly available data alone.


Automated Data Entry

I was part of a team that built a series of machine learning models to automate data entry of new products for a wholesaler. The data entry was for a highly regulated product category, where human data entry clerks had to gather multiple legally required pieces of paperwork, search the paperwork for key pieces of information, then enter that information into a database. Our models determined whether the correct paperwork was available, then extracted the key pieces of information from the paperwork.


Energy Markets Dashboards

As a Consultant at FIS (then Sungard), I worked with the CEO of an energy investment fund to build dashboards monitoring the daily performance of energy companies owned by the fund. Working with a team of energy analysts, I created a series of Tableau dashboards built on custom metrics. I also advised on changes to the underlying SQL data warehouse.

Dashboard Images


Better Workers’ Compensation Outcomes

I was part of a team of student consultants tasked by a large American retailer to analyze their workers’ compensation data. Our goal was to determine best practices for getting employees back to full health.

We found a handful of decision points in the retailer’s workers’ compensation process where better decisions would produce better outcomes. Some of these decisions were already part of standard practice, but some were not. One of the optimal decisions we found even contradicted standard industry practice. Additionally, we identified segments of claimants where different decisions led to different outcomes.

Speaking

Speaking and Presentations

Practical Machine Learning for Financial Services

Keynote Speaker at UiPath Together NY, 2018


Productionization of Machine Learning

Panelist at Texas Analytics Summit 2018

Side Projects

Open Source Contributions

Gensim

Gensim is a popular Python library for running NLP algorithms on large corpora. I added a method to automate training word2vec on multiple files, and added support for user-defined collocation detection metrics. Pull Requests.

Other Projects

Erowid Web Forum Analysis

Erowid is a website that has collected millions of first-hand accounts of drug abuse. I did some text analytics. School Project. Writeup.

Drug Co-Abuse


Diabetes Prediction

Prediction of diabetes from other elements of a patient’s medical record. School Project. Writeup.

Papers & More

Papers and Publications

Civil Asset Forfeiture: A Judicial Perspective

L. Barret et al., “Civil Asset Forfeiture: A Judicial Perspective,” in Proceedings of the Data For Good Exchange 2017, 24 September 2017, New York, New York, USA [Online]. Available: https://arxiv.org/abs/1710.02041.

Data mining Bloomberg Law’s collection of court dockets to learn more about a law enforcement practice of seizing property without a guilty verdict.


Scaling to Build the Consolidated Audit Trail: A Financial Services Application of Google Cloud Bigtable

N. Palmer, M. Sherman, Y. Wang, and S. Just, “Scaling to Build the Consolidated Audit Trail: A Financial Services Application of Google Cloud Bigtable,” Google, 2015. [Online]. Available: https://cloud.google.com/bigtable/pdf/FISConsolidatedAuditTrail.pdf.

Testing write scalability of Google Cloud Bigtable on stock market transaction data.


An Expected Value Model for Maximizing Revenues from Sales Tax Audits

M. Sherman, N. White, and Y. Du, “An Expected Value Model for Maximizing Revenues from Sales Tax Audits,” presented at Texas Workshop on Social and Business Analytics, 28 March 2014, Austin, Texas, USA [Online]. Available: https://www.michaelwsherman.com/web_img/audit_poster.png.

A custom, interpretable model to maximize the revenue of tax audits. Poster.

Patents

N. Palmer and M. Sherman, “System and Associated Methodology of Creating Order Lifecycles via Daisy Chain Linkage,” U.S. Patent 10,089,687, 2 October 2018.

Awards

2018 New Product of the Year by the American Association of Law Libraries for Bloomberg Points of Law

I led data science and machine learning, working with a team of engineers who had never done machine learning before.

2016 Bloomberg Verticals TechFest Winner (Internal Award)

Awarded for “applying deep learning to NLP tasks”. My team did the first deep learning work in our division.

2015 Sungard Consulting STAR Award (Internal Award)

Awarded for “exceeding expectations in sales, leadership, or deliverables”, and given to 4 of about 200 consultants annually. I led the day-to-day work on a prototype system to process 6 billion stock market transactions per hour and wrote the bulk of the companion whitepaper.