They're hard.

At least, that was the rep on new tests aligned to the Common Core State Standards that millions of U.S. kids took last spring. Now you can be the judge.

A slew of actual math and English Language Arts questions from the Partnership for Assessment of Readiness for College and Careers, better known as PARCC, are now online and searchable. You can also see some student responses and guidance on how they were scored.

Amid all the political controversy over the Common Core and whether students should even take these exams, this gives us a chance to look objectively at the tests themselves.

In this post, we picked a handful of those questions that jumped out at us (and likely would have jumped out at you, too). We ran them by a few experts who played no official role in developing them.

Let's start with math, focusing on two questions from the third-grade assessments. Here's the first:

A few things worth noting about this problem. It's meant to measure skills outlined in the Core standards, including this one:

CCSS.MATH.CONTENT.3.OA.A.4

"*Determine the unknown whole number in a multiplication or division equation relating three whole numbers. For example, determine the unknown number that makes the equation true in each of the equations 8 × ? = 48, 5 = _ ÷ 3, 6 × 6 = ?*"
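Worked out, the standard's own three examples each come down to using one operation to undo the other. This is our illustration of the arithmetic, not part of the PARCC item; the "_" blank in the second example is written as "?" for consistency:

```latex
\begin{aligned}
8 \times ? = 48 &\;\Rightarrow\; ? = 48 \div 8 = 6 \\
5 = ? \div 3    &\;\Rightarrow\; ? = 5 \times 3 = 15 \\
6 \times 6 = ?  &\;\Rightarrow\; ? = 36
\end{aligned}
```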

You'll also notice that the question begins not with a traditional equation to be solved but with Fred's incorrect response. The test-taker is then asked why that response is incorrect.

"It's one thing to produce a correct answer yourself, it's another thing to analyze someone else's response and explain why it's correct or incorrect," says Diane Briars, president of the National Council of Teachers of Mathematics. Briars believes the PARCC tests are the next step in the decades-long pursuit of a test that can more accurately measure conceptual understanding.

The question's three parts require the test-taker to use division to show why Fred's answer is incorrect (Parts A and B) and to recast the problem from division as multiplication (Part C). Briars says understanding the relationship between division and multiplication, and using one to explore the other, are key mathematical skills.

While this may be an improvement on the traditional multiple-choice exam, Hugh Burkhardt still finds the question lacking. Burkhardt is a widely respected expert on testing and mathematics instruction at the University of Nottingham in the U.K., and he helps lead the MARS Shell Center team (which receives funding from the Bill & Melinda Gates Foundation, as does NPR Ed).

Burkhardt says that a well-built assessment should require students to engage in long chains of reasoning, with some questions demanding multiple steps, each building on the one before.

The PARCC question above, Burkhardt worries, doesn't do that. The answer to Part A, he says, should be obvious to most third-graders who have learned their multiplication tables. And the difference between Parts A and B isn't obvious. In fact, the answer to B could be used to adequately answer A.

Here's how one student answered the questions:

And here is PARCC's guidance on how the student's responses were scored. Notice that the test-taker received full credit for Part A even though the explanation included not a word of text.

While we're on the subject of text, Briars and Burkhardt both noted the language of this question. Briars says a lot of thought goes into making sure the words and context are age-appropriate. PARCC doesn't want a third-grader who understands the math to get hung up on a sentence she can't read or understand easily. In this case, the subject is certainly accessible — stuffed animals — though Burkhardt believes the language could be even simpler.

Burkhardt raises one more general concern about computer-based questions like this. The answer box, he says, is surrounded by small buttons (not visible in the image above) with symbols that may seem strange to kids. That worries him.

"You have to give children a medium that is natural for their normal mode of thinking," Burkhardt explains. "They have to integrate this software with their thinking on math." If the software distracts or intimidates the child, then it's counterproductive, he says.

Let's look at one more math question now:

As with the question about Fred and his stuffed animals, the subject matter here is certainly accessible: buttons. Another similarity: it, too, presents test-takers with an incorrect answer. This one is harder to diagnose, though, because Jeanie's faulty reasoning is more complex.

The question depends, in part, on a third-grader's ability to do this:

CCSS.MATH.CONTENT.3.NBT.A.2

"*Fluently add and subtract within 1000 using strategies and algorithms based on place value, properties of operations, and/or the relationship between addition and subtraction.*"

The student has to do some mathematical forensics, tracing Jeanie's faulty reasoning backward to see that (Part A) when she added up the 18 ones, she neglected to carry the ten, and (Part B) she jumbled the two-digit numbers she was supposed to be subtracting. This student got it:
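To make the carrying error concrete, here is a minimal Python sketch of column-by-column addition, done once with regrouping and once with every carry dropped. The function names and the numbers 49 and 29 are hypothetical illustrations, not the numbers in Jeanie's problem.

```python
def add_with_regrouping(a: int, b: int) -> int:
    """Column addition, right to left, carrying any overflow into the next column."""
    result, place, carry = 0, 1, 0
    while a or b or carry:
        column = a % 10 + b % 10 + carry
        result += (column % 10) * place  # write down the digit for this column
        carry = column // 10             # carry the ten (or hundred) to the next column
        a, b, place = a // 10, b // 10, place * 10
    return result


def add_forgetting_carry(a: int, b: int) -> int:
    """Same procedure, but every carry is dropped: the error described above."""
    result, place = 0, 1
    while a or b:
        column = a % 10 + b % 10
        result += (column % 10) * place  # keep only the ones digit of the column sum
        a, b, place = a // 10, b // 10, place * 10
    return result


if __name__ == "__main__":
    print(add_with_regrouping(49, 29), add_forgetting_carry(49, 29))  # 78 68
```

With 49 + 29, the ones column gives 18 ones. Carrying the ten into the tens column gives 78; dropping it gives 68, an answer that is off by exactly the forgotten ten.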

And here's how PARCC scored those responses:

Burkhardt was more impressed with this question, saying "the ability to detect and correct (your own) misconceptions is critical to being able to do math." And the mistake Jeanie makes is a common one for her age.

Diane Briars says open-ended, multi-faceted questions like these aren't new, but their use in annual state tests declined with passage of the federal No Child Left Behind law. That's because the law required that most students be tested annually.

The problem, Briars says, is that the richer the answer a question elicits, the more expensive it is to grade. And, with NCLB's massive expansion of testing, some states balked at the cost.

So, what about reading and writing? It's tough to wrap your head around the nuances that surround literacy. Reading and writing skills fall on a spectrum. They can be tricky to measure.

With that in mind, let's look at PARCC's English Language Arts and Literacy test, third grade again.

David Pearson (no relation to the giant testing and education company) is a professor of language, literacy and culture at the University of California, Berkeley, and he says the reading exam doesn't look all that different from older tests, with two exceptions.

One is the prominence of "technology-enhanced" questions, in which students click on their answer in the text or "drag and drop" it into a box. The other is the increased use of "paired" questions, like this one:

PARCC has highlighted these types of questions as a way to get students to think more deeply. The question above, for example, doesn't just ask what the main idea is; it asks how the poem conveys that idea and then asks you to prove it.

But Pearson says that a two-part question means one answer relies on the other. If a kid doesn't get Part A, then Part B won't make sense. That, in turn, might nudge the student to go back and revise their initial response. Once that happens, Pearson says, it's tough to know what the student did — and didn't — understand.

"It muddles the inferences you can draw from the student's answer," he says. "It compromises precision in the name of complexity."

Pearson was an advisor for the other test consortium, Smarter Balanced, but he says he hasn't warmed up to these types of questions on that test either. Speaking of feeling cold, let's move on to the Arctic, or at least to an essay prompt about it:

To do well on this one, students needed to do a lot. Pearson says this type of question, which tests both reading and writing, is not a new concept, but it doesn't often show up on state tests. He says it should.

"Reading and writing are inextricably bound," Pearson explains. "Reporting on what you read is a really good way of promoting reading comprehension. It gives you a lens to read the text."

This student nailed it:

Below is how PARCC graded that same student's essay, with separate sections for reading skills and writing skills.

Whatever a given test's strengths and weaknesses, Pearson says, it's the stakes attached to standardized tests that really shape the culture surrounding them.

"When stakes are high and people's jobs and schools are on the line, people engage in desperate behaviors," he says.

But the quality of the test does matter, he adds: "If you're going to teach to the test, you may as well have a test worth teaching to."