Linguistics study harnesses OSC resources

Former Communications Manager
Thursday, November 30, 2017 - 10:45am

The research being done through the OH-TECH member organizations — OhioLINK, the Ohio Supercomputer Center, and OARnet — blows my mind. From cancer research to cleaning up the environment to learning more about mental illnesses and societal issues at large, it’s incredible to work at a place that has such a large hand in the research going on throughout Ohio and the United States.

While writing a recent story for OSC’s annual research report, I found a couple of aspects of one project particularly interesting.

William Schuler, Ph.D., a linguistics professor at The Ohio State University, and his doctoral student, Lifeng Jin, are contributing to a project that could have major implications for disaster relief efforts. They are also proving that a computer is capable of learning a language with no supervision.

If you want all the particulars of the project, check out our case study. But one fantastic aspect of the interview was Jin’s testimonial about his use of OSC services. Jin is such a big proponent of OSC and high performance computing that he put on a workshop for the entire OSU linguistics department about a year ago.

For this particular project, Jin shared some wonderful details about how he uses OSC and how it allows him to scale up his work. On a powerful single server, Jin can analyze between 10 and 15 categories of grammar. But using the graphics processing units (GPUs) on OSC’s Owens Cluster allows Jin to increase the number of categories to 45 or 50 and get results in a shorter amount of time.

“It’s a more realistic scenario of imitating what humans are doing. The models are really big, so memory is crucial,” he said. “The statistical model is also very complicated. In order to train it, we have to do a lot of computation. Say we have 20,000 sentences from a given language, we use that to train the grammar. That’s where OSC comes in. In the first stage, we tried to train the grammar using CPUs, but they’re too slow. So, we refactored our code to use GPUs for sampling, and it’s sped up our process greatly.”

CPUs – central processing units – are the brains of a computer and are composed of just a few cores with lots of cache memory. GPUs complement CPUs and are composed of hundreds of cores that can handle thousands of threads simultaneously, which lets them quickly execute the computations important in engineering analysis and simulation. The Owens Cluster has 160 Nvidia P100 GPUs.

Supercomputing speed matters so much to Schuler and Jin because disaster relief demands the fastest possible response. In August, DARPA organized a trial run to simulate two real disasters in Africa. Schuler’s group ran 60 GPUs on the Owens Cluster for seven days to work on four grammars of two languages, illustrating the importance of OSC’s powerful resources to the project.

Jin said that as the group begins using more realistic configurations for a given language’s grammars, both the size of the grammars and the computation required to explore them will grow, giving OSC an even greater role as the research evolves.

“The GPUs are really good,” Jin said. “The models are really big, so the memory is crucial. Owens now has better GPUs and more memory and faster speed, so we can finish our process in an even shorter amount of time. The batch processing system is friendly to our program, so we can break down the whole computation into smaller batches and send them to different nodes to do computation, so that’s also very efficient.”
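The batching Jin describes can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the group's actual code: the function names `split_into_batches` and `assign_to_nodes`, the batch size of 500, and the eight nodes are all hypothetical. Only the 20,000-sentence corpus size comes from Jin's own example. On a real cluster, the batch scheduler decides where each job runs, not a script like this one.

```python
from typing import Dict, List, Sequence


def split_into_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def assign_to_nodes(num_batches: int, num_nodes: int) -> Dict[int, List[int]]:
    """Assign batch indices to node ids in round-robin order."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    assignment: Dict[int, List[int]] = {node: [] for node in range(num_nodes)}
    for batch_index in range(num_batches):
        assignment[batch_index % num_nodes].append(batch_index)
    return assignment


# Hypothetical corpus: 20,000 placeholder sentences (the size Jin mentions).
corpus = [f"sentence {i}" for i in range(20_000)]

# Assumed values for illustration only: 500 sentences per batch, 8 nodes.
batches = split_into_batches(corpus, batch_size=500)
assignment = assign_to_nodes(len(batches), num_nodes=8)

print(len(batches), [len(indices) for indices in assignment.values()])
```

With these assumed numbers, the script prints `40 [5, 5, 5, 5, 5, 5, 5, 5]`: 40 batches, with five batches going to each of the eight nodes. Because no batch depends on another, each node can work through its share at the same time.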

Jin also discussed another major feature at OSC, OnDemand. OnDemand is OSC’s one-stop shop for access to its high performance computing resources. With OnDemand, clients can upload and download files; create, edit, submit, and monitor jobs; run GUI applications; and connect via SSH, all via a web browser, with no client software to install and configure.

“OnDemand is really, really good. I love it,” he said. “It’s very convenient you have a web interface to see everything, and you can get access to it to OSC computers, desktops … It’s very convenient.

“We have a cluster in the school of arts and sciences, Unity, and the main purpose of that cluster is to have the same working environment as OSC, so you can debug on Unity and transport everything to OSC seamlessly without any change at all and run it on OSC. That helped a lot in terms of using OSC.”