Commit 2e1e4d9

feat: add remarklint for md docs (#213)
* feat: add remarklint for md docs
* fix: remarkrc file and run linter on commit hook
1 parent 0aa67ee commit 2e1e4d9

4 files changed

+685
-36
lines changed

.remarkrc

+6
@@ -0,0 +1,6 @@
+{
+  "plugins": [
+    "remark-preset-lint-recommended",
+    ["remark-lint-list-item-indent", false]
+  ]
+}

package.json

+5-1
@@ -6,7 +6,7 @@
   "main": "./dist/mercury.js",
   "scripts": {
     "lint": "if test \"$CI\" != \"true\" ; then eslint . --fix; fi",
-    "lint:ci": "eslint .",
+    "lint:ci": "remark . && eslint .",
     "lint-fix-quiet": "eslint --fix --quiet",
     "build": "yarn lint && rollup -c && yarn test:build",
     "build:web": "yarn lint && rollup -c rollup.config.web.js && yarn test:build:web",
@@ -79,6 +79,9 @@
     "nock": "^10.0.6",
     "ora": "^3.0.0",
     "prettier": "^1.15.3",
+    "remark-cli": "^6.0.1",
+    "remark-lint": "^6.0.4",
+    "remark-preset-lint-recommended": "^3.0.2",
     "requirejs": "^2.3.6",
     "rollup": "^1.1.0",
     "rollup-plugin-babel": "^4.0.1",
@@ -125,6 +128,7 @@
       "git add"
     ],
     "*.{json,css,md}": [
+      "remark .",
       "prettier --write",
       "git add"
     ]

src/extractors/custom/README.md

+32-29
@@ -8,18 +8,19 @@ Custom parsers allow you to write CSS selectors that will find the content you'r
 
 You can query for every field returned by the Mercury Parser:
 
-- title
-- author
-- content
-- date_published
-- lead_image_url
-- dek
-- next_page_url
-- excerpt
+- title
+- author
+- content
+- date_published
+- lead_image_url
+- dek
+- next_page_url
+- excerpt
 
 ### Using selectors
 
 #### Basic selectors
+
 To demonstrate, let's start with something simple: Your selector for the page's title might look something like this:
 
 ```javascript
@@ -41,12 +42,13 @@ As you might guess, the selectors key provides an array of selectors that Mercur
 The selector you choose should return one element. If more than one element is returned by your selector, it will fail (and Mercury will fall back to its generic extractor).
 
 #### Selecting an attribute
-Sometimes the information you want to return lives in an element's attribute rather than its text — e.g., sometimes a more exact ISO-formatted date/time will be stored in an attribute of an element.
+
+Sometimes the information you want to return lives in an element's attribute rather than its text — e.g., sometimes a more exact ISO-formatted date/time will be stored in an attribute of an element.
 
 So your element looks like this:
 
 ```html
-<time class="article-timestamp" datetime="2016-09-02T07:30:01-04:00">
+<time class="article-timestamp" datetime="2016-09-02T07:30:01-04:00"></time>
 ```
 
 The text you want isn't the text inside a matching element, but rather, inside the datetime attribute. To write a selector that returns an attribute, you provide your custom parser with a two-element array. The first element is your selector; the second element is the attribute you'd like to return.
@@ -71,7 +73,7 @@ This is all you'll need to know to handle most of the fields Mercury parses (tit
 
 An article's content can be more complex than the other fields, meaning you sometimes need to do more than just provide the selector(s) in order to return clean content.
 
-For example, sometimes an article's content will contain related content that doesn't translate or render well when you just want to see the article's content. The clean key allows you to provide an array of selectors identifying elements that should be removed from the content.
+For example, sometimes an article's content will contain related content that doesn't translate or render well when you just want to see the article's content. The clean key allows you to provide an array of selectors identifying elements that should be removed from the content.
 
 Here's an example:
 
@@ -195,21 +197,21 @@ If you look at your parser's test file, you'll see a few instructions to guide y
 By default, the first test, which ensures your custom extractor is being selected properly, should be passing. The first failing test checks to see whether your extractor returns the correct title:
 
 ```javascript
-it('returns the title', (async) () => {
-  // To pass this test, fill out the title selector
-  // in ./src/extractors/custom/www.newyorker.com/index.js.
-  const html =
-    fs.readFileSync('./fixtures/www.newyorker.com/1475245895852.html');
-  const articleUrl =
-    'http://www.newyorker.com/tech/elements/hacking-cryptography-and-the-countdown-to-quantum-computing';
-
-  const { title } =
-    await Mercury.parse(articleUrl, html, { fallback: false });
-
-  // Update these values with the expected values from
-  // the article.
-  assert.equal(title, 'Schrödinger’s Hack');
-});
+it('returns the title', async () => {
+  // To pass this test, fill out the title selector
+  // in ./src/extractors/custom/www.newyorker.com/index.js.
+  const html = fs.readFileSync(
+    './fixtures/www.newyorker.com/1475245895852.html'
+  );
+  const articleUrl =
+    'http://www.newyorker.com/tech/elements/hacking-cryptography-and-the-countdown-to-quantum-computing';
+
+  const { title } = await Mercury.parse(articleUrl, html, { fallback: false });
+
+  // Update these values with the expected values from
+  // the article.
+  assert.equal(title, 'Schrödinger’s Hack');
+});
 ```
 
 As you can see, to pass this test, we need to fill out our title selector. In order to do this, you need to know what your selector is. To do this, open the html fixture the generator downloaded for you in the [`fixtures`](/fixtures) directory. In our example, that file is `fixtures/www.newyorker.com/1475248565793.html`. Now open that file in your web browser.
@@ -223,7 +225,7 @@ So, back to the title: We want to make sure our test finds the same title we see
 The selector for this title appears to be `h1.title`. To verify that we're right, click on the Console tab in Chrome's Developer Tools and run the following check:
 
 ```javascript
-$$('h1.title')
+$$('h1.title');
 ```
 
 If that returns only one match (i.e., an array with just one element), and the text of that element looks like the title we want, you're good to go!
@@ -247,7 +249,8 @@ export const NewYorkerExtractor = {
 Save the file, and... uh oh, our example still fails.
 
 ```javascript
-AssertionError: 'Hacking, Cryptography, and the Countdown to Quantum Computing' == 'Schrödinger’s Hack'
+AssertionError: 'Hacking, Cryptography, and the Countdown to Quantum Computing' ==
+  'Schrödinger’s Hack';
 ```
 
 When Mercury generated our test, it took a guess at the page's title, and in this case, it got it wrong. So update the test with thte title we expect, save it, and your test should pass!
@@ -259,7 +262,7 @@ We've been moving at a slow pace, but as you can see, once you understand the ba
 For a slightly more complex example, you'll find after a bit of looking that the best place to get the most accurate datetime on the page is in the head of the document, in the value attribute of a meta tag:
 
 ```html
-<meta value="2016-09-26T14:04:22-04:00" name="article:published_time">
+<meta value="2016-09-26T14:04:22-04:00" name="article:published_time" />
 ```
 
 As [explained above](#selecting-an-attribute), to return an attribute rather than the text inside an element, your selector should be an array where the first element is the element selector and the second element is the attribute you want to return. So, in this example, the date_published selector should look like this:
