3 points by captures 8 hours ago | 1 comment
  • captures 8 hours ago
    The classification is surprisingly simple: k-nearest neighbors (k-NN) on a 27-dimensional feature vector extracted from each drawing.

    The features:
    - Stroke count
    - Point density across 6 horizontal and 6 vertical bands (where is the ink?)
    - Direction histogram across 8 compass directions (which way are strokes going?)
    - Aspect ratio and total stroke length
    - First stroke start position, last stroke end position
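
To make the feature list concrete, here is a rough Python sketch of how such a vector could be assembled. It is not the project's actual code. The function name `extract_features`, the feature ordering, the normalization to a unit box, and the length-weighted direction histogram are all my assumptions. It also assumes each start/end position contributes an (x, y) pair, which is what makes the count come out to 1 + 12 + 8 + 2 + 4 = 27.

```python
import numpy as np

def extract_features(strokes, bands=6, directions=8):
    """Hypothetical sketch: turn a drawing into a 27-D feature vector.

    strokes: list of strokes, each a sequence of (x, y) points.
    """
    strokes = [np.asarray(s, dtype=float).reshape(-1, 2) for s in strokes]
    pts = np.vstack(strokes)

    # Assumption: normalize into a unit box, preserving the aspect ratio.
    mins = pts.min(axis=0)
    width, height = pts.max(axis=0) - mins
    scale = max(width, height) or 1.0  # guard against a single-dot drawing
    norm = [(s - mins) / scale for s in strokes]
    all_pts = np.vstack(norm)

    # Point density: fraction of points in each horizontal band (binned by y)
    # and each vertical band (binned by x).
    n = len(all_pts)
    h_bands = np.histogram(all_pts[:, 1], bins=bands, range=(0.0, 1.0))[0] / n
    v_bands = np.histogram(all_pts[:, 0], bins=bands, range=(0.0, 1.0))[0] / n

    # Direction histogram: each segment votes for the nearest of 8 compass
    # directions, weighted by its length (the weighting is an assumption).
    segs = np.vstack([np.diff(s, axis=0) for s in norm])
    lengths = np.hypot(segs[:, 0], segs[:, 1])
    angles = np.arctan2(segs[:, 1], segs[:, 0])
    sector = 2 * np.pi / directions
    idx = np.round(angles / sector).astype(int) % directions
    total_length = lengths.sum()
    dir_hist = np.bincount(idx, weights=lengths, minlength=directions)
    if total_length > 0:
        dir_hist = dir_hist / total_length

    eps = 1e-9  # avoids division by zero for perfectly flat drawings
    aspect = (width + eps) / (height + eps)

    return np.concatenate([
        [len(strokes)],        # 1: stroke count
        h_bands, v_bands,      # 12: point density per band
        dir_hist,              # 8: direction histogram
        [aspect, total_length],  # 2: aspect ratio, total stroke length
        norm[0][0],            # 2: first stroke start (x, y)
        norm[-1][-1],          # 2: last stroke end (x, y)
    ])
```

For example, an "L"-like shape drawn as two strokes, `extract_features([[(0, 0), (1, 0)], [(1, 0), (1, 1)]])`, yields a vector of length 27 whose first entry is the stroke count.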

    The training set is ~64k hand-drawn samples from the original Detexify project. Each sample gets preprocessed and converted to this 27D vector. Classification is then just finding the k nearest training samples by Euclidean distance and returning the most common symbols among them.
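
The lookup step described above fits in a few lines. This is a brute-force sketch, not the project's code: the name `knn_classify`, the defaults `k=5` and `top=3`, and the array layout are assumptions on my part.

```python
from collections import Counter

import numpy as np

def knn_classify(query, train_X, train_y, k=5, top=3):
    """Hypothetical sketch: return the most common symbols among the k
    training samples nearest to `query` by Euclidean distance.

    train_X: (n_samples, n_features) array of feature vectors.
    train_y: sequence of n_samples symbol labels.
    """
    train_X = np.asarray(train_X, dtype=float)
    query = np.asarray(query, dtype=float)

    # Euclidean distance from the query to every training sample.
    dists = np.linalg.norm(train_X - query, axis=1)

    # Indices of the k closest samples.
    nearest = np.argsort(dists)[:k]

    # Majority vote; most_common orders symbols by how often they appear.
    votes = Counter(train_y[i] for i in nearest)
    return [symbol for symbol, _ in votes.most_common(top)]
```

At ~64k samples by 27 dimensions, one brute-force pass like this is a single small matrix operation, so an index structure is probably not needed. One caveat: with plain Euclidean distance, features on larger scales (a raw stroke count versus a density fraction, say) carry more weight, and the comment above doesn't say whether the features are rescaled first.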