Open-Source Python Packages
Used internally to quickly get started building jobs to submit to the internal job submission system called Ludwig.
Used internally to quickly visualize metrics tracked as part of one or more jobs submitted to
Ludwig. For example, it can compare learning curves (e.g. MSE) for groups of neural networks, each trained with different hyperparameters.
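Before curves can be compared, per-run metrics have to be grouped by hyperparameter setting. A minimal sketch of that grouping step, assuming runs are plain dicts keyed by a learning rate and a per-epoch MSE list (all names and the data layout here are hypothetical, not the package's actual API):

```python
from collections import defaultdict

def group_curves(runs):
    """Group per-run MSE learning curves by a hyperparameter value
    and average them element-wise, the kind of aggregation one would
    do before plotting a comparison. Names here are illustrative."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["lr"]].append(run["mse"])
    # average the curves element-wise within each hyperparameter group
    return {
        lr: [sum(step) / len(step) for step in zip(*curves)]
        for lr, curves in grouped.items()
    }

runs = [
    {"lr": 0.1, "mse": [1.0, 0.5, 0.3]},
    {"lr": 0.1, "mse": [0.8, 0.5, 0.1]},
    {"lr": 0.01, "mse": [1.2, 0.9, 0.7]},
]
mean_curves = group_curves(runs)
```

Each averaged curve can then be drawn as a single line, one per hyperparameter group.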
This repository contains convenience functions for preparing text data for training NLP models. Batches of data can be prepared in a way that preserves the order of documents in the corpus.
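The order-preserving batching described above can be sketched as a simple generator that walks the corpus front to back (the function name and batch shape are our assumptions, not the repository's actual interface):

```python
def batch_documents(documents, batch_size):
    """Yield consecutive batches of documents in corpus order.
    Because slicing walks the list front to back, concatenating
    the batches reproduces the original document order exactly."""
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]

docs = ["doc1", "doc2", "doc3", "doc4", "doc5"]
batches = list(batch_documents(docs, 2))
```

The last batch may be smaller than `batch_size`; flattening the batches recovers the corpus unchanged.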
This repository contains convenience functions for common visualizations, built on top of
Natural Language Data
To create a custom CHILDES corpus, use the Python package
The package allows you to define custom tokenization rules, among other features. The output is a text file of line-separated transcripts, ordered by the age of the child being spoken to.
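The age-ordered output format can be sketched as follows, assuming each transcript is paired with the target child's age in months (the data here is invented for illustration; it is not the package's internal representation):

```python
# Hypothetical transcripts paired with the target child's age in months.
transcripts = [
    (30, "more cookie ."),
    (18, "ball ."),
    (24, "doggie go ."),
]

# Sort by age so the output file is ordered by the age of the child
# spoken to, then join into the line-separated text-file format.
ordered = [text for age, text in sorted(transcripts)]
output = "\n".join(ordered)
```

Each line of `output` is one transcript, with younger children's transcripts first.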
To create a custom Wikipedia corpus, use the Python package
CreateWikiCorpus. Because English Wikipedia is large (roughly 4 billion words as of 2019), corpus creation is distributed across
Ludwig workers, and output files are saved to the server. A single corpus is split across multiple text files, each created on a separate worker.
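Because one corpus is split across multiple per-worker text files, downstream code has to reassemble the shards. A minimal sketch, assuming the shards sit together in one directory as `*.txt` files (the directory layout and file-naming scheme are assumptions, not CreateWikiCorpus's documented output format):

```python
import tempfile
from pathlib import Path

def load_sharded_corpus(corpus_dir):
    """Concatenate the per-worker text files that make up one corpus.
    Sorting the shard paths gives a stable, reproducible order."""
    lines = []
    for shard in sorted(Path(corpus_dir).glob("*.txt")):
        lines.extend(shard.read_text().splitlines())
    return lines

# Simulate the output of two workers in a temporary directory.
tmp = tempfile.mkdtemp()
Path(tmp, "worker_0.txt").write_text("first article\n")
Path(tmp, "worker_1.txt").write_text("second article\n")
corpus = load_sharded_corpus(tmp)
```

For corpora too large to hold in memory, the same loop could yield lines lazily instead of building a list.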