Somehow make the spec searchable (e.g. by generating PDF version) #10218

Closed
scabug opened this issue Mar 4, 2017 · 41 comments

@scabug

scabug commented Mar 4, 2017

I can't imagine this is a new issue but I can't find an old one, so.

Please make the specification searchable (available as a single page as a quick fix?) or, better, figure out how to make a proper index. The language is complex enough that questions about its behavior come up a lot for users, and it's often quite hard to find the relevant section in the spec. For example, I had a question last night about import priority and found it in the introduction to identifiers, names and scopes in the 2.9 spec PDF and it happens to still be there (but not in the section on import statements, which I found via the TOC, which is where I looked first).

@scabug
Author

scabug commented Mar 4, 2017

Imported From: https://issues.scala-lang.org/browse/SI-10218?orig=1
Reporter: Rob Norris (rnorris)

@scabug
Author

scabug commented Mar 6, 2017

@SethTisue said (edited on Mar 6, 2017 5:01:40 PM UTC):
issue and preceding discussion (on PDF generation specifically, not indexing more broadly) at scala/docs.scala-lang#516

it would be wonderful if some volunteer tackled this.

@bjornregnell

bjornregnell commented May 6, 2017

In an educational context, I think students learning Scala and also teachers designing Scala courses, in addition to a searchable document, would also greatly benefit from a pdf with the language spec readable off-line and printable from a paginated format.

@atiqsayyed

Hi,
Is this still open? If yes, can I pick it up?

@lrytz
Member

lrytz commented Jun 16, 2017

@atiqsayyed yes, you're more than welcome to work on this!

@som-snytt

I volunteered on gitter today, to make sure there wouldn't be a permanent record.

SethTisue changed the title from "Somehow make the spec searchable" to "Somehow make the spec searchable (e.g. by generating PDF version)" on Aug 4, 2017

@atiqsayyed

@som-snytt sorry to have missed this issue. Can we discuss it to make sure we understand what we have to do here?

@som-snytt

I took a glance but won't have time until a three-day weekend that is not US Labor Day. Halloween is on a Tuesday this year.

@bjornregnell

It's very good to be able to search but also nice to be able to print it and view it in a paginated form in a pdf-viewer, so for my Scala teaching efforts here at Lund University, a pdf version would be really valuable. It would be really cool if you both could join forces and achieve some progress on this issue, @som-snytt @atiqsayyed

@jvican
Member

jvican commented Sep 3, 2017

I'm not sure, but I think there already exists a PDF version. At least I've used PDFs of previous Scala versions in the past.

One solution to this problem would be https://www.algolia.com/. It's free for open-source projects. It would be cool for the rest of the docs too, not only the spec.

But someone would need to step up to make it a reality. It wouldn't be difficult though, just:

  1. Make a request to get the search engine.
  2. Copy-paste some JS in the docs so that both the spec and the normal docs get different search boxes.

@bjornregnell

I think a pdf version only exists for 2.11, which I think was written in latex, but now it's markdown or something. A pdf-generation infrastructure for the language spec of 2.12 (and 2.13 and Dotty etc) would be really nice.

@SethTisue
Member

was written in latex, but now it's markdown

correct. the change happened several years ago, in 2014, between 2.10 and 2.11

@jvican
Member

jvican commented Sep 28, 2017

@SethTisue I find myself needing this. How can we make such a thing happen?

@jvican
Member

jvican commented Sep 28, 2017

Seems something like this https://www.sitepoint.com/creating-pdfs-from-markdown-with-pandoc-and-latex/ could work. Is there someone out there that would like to contribute such a thing?

@ritschwumm

i'd probably render markdown to html with some JS library and feed the thing to electron-pdf, athenapdf or maybe chrome (headless). the latter works really well in my experience. where can i find the markdown sources? i might give it a try...
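
For anyone wanting to try the headless Chrome route, here is a minimal sketch. It assumes a Chrome or Chromium binary is on the PATH under one of the names listed (these names vary by platform and are a guess), and that the HTML has already been rendered to a file. `find_chrome` and `html_to_pdf` are hypothetical helper names, not part of any existing tooling for the spec.

```shell
#!/bin/sh
# Sketch: print an already-rendered HTML page to PDF with headless Chrome.
# The candidate binary names below are an assumption; adjust for your system.
CHROME_CANDIDATES=${CHROME_CANDIDATES:-"google-chrome chromium chromium-browser"}

# Print the path of the first candidate found on PATH; fail if none exists.
find_chrome() {
    for bin in $CHROME_CANDIDATES; do
        if command -v "$bin" >/dev/null 2>&1; then
            command -v "$bin"
            return 0
        fi
    done
    return 1
}

# html_to_pdf INPUT.html OUTPUT.pdf
html_to_pdf() {
    chrome=$(find_chrome) || { echo "no Chrome/Chromium found" >&2; return 1; }
    # Chrome wants a URL, so turn the input path into an absolute file:// URL.
    url="file://$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
    "$chrome" --headless --print-to-pdf="$2" "$url"
}

# Example (not run here): html_to_pdf build/spec/all.html spec.pdf
```

Page layout in the resulting PDF is governed by the page's print CSS, so any styling work would go there rather than into this script.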

@SethTisue
Member

@ritschwumm in the scala/scala repo under the spec directory

@SethTisue
Member

SethTisue commented Sep 29, 2017

@jvican can’t think what to add besides what’s already in the comments here, or in the linked past discussion

@som-snytt

@SethTisue Consider adding that next time folks update their will, they could include a small endowment or trust to ensure work on a ticket is funded. The resulting metric is the inverse bus factor, how many untimely deaths are required for features to progress.

@jvican
Member

jvican commented Sep 29, 2017

@ritschwumm https://github.com/scala/scala/tree/2.12.x/spec.

Would be awesome if you give it a try.

@ritschwumm

spent a few hours on it today -
a single page renders quite nicely, but getting everything in a single document
turned out to be quite difficult if you want to keep all links working.

@SethTisue
Member

@ritschwumm if your attempt is abandoned, perhaps you could link to a wip branch that someone else could pick up...?

@nafg

nafg commented Oct 29, 2017

I think just sticking this on it should work:

<form method="get" action="http://www.google.com/search">
  <input type="search"   name="q"  placeholder="Google site search">
  <input type="hidden" name="sitesearch" value="https://www.scala-lang.org/files/archive/spec/2.11/" />
  <input type="submit" value="Go!" />
</form>

Also, you can use Algolia, like Play's docs.

@ritschwumm

@SethTisue sorry, i don't have a branch - i refuse to sign a CLA, so that wouldn't make much sense.

here's what i have so far:

#!/bin/bash

# -f so a first run (no previous output) does not print errors
rm -f spec/all.md
rm -f build/spec/all.html
rm -f test.pdf

# TODO index needs layout toc
chapters='
01-lexical-syntax
02-identifiers-names-and-scopes
03-types
04-basic-declarations-and-definitions
05-classes-and-objects
06-expressions
07-implicits
08-pattern-matching
09-top-level-definitions
10-xml-expressions-and-patterns
11-annotations
12-the-scala-standard-library
13-syntax-summary
14-references
15-changelog
'

# prefix chapters with a special anchor
(
    #echo "---"
    #echo "title: Scala Language Specification"
    #echo "layout: default"
    #echo "---"
    #echo ""
    for i in $chapters; do
        echo >&2 "### $i"
        echo '<a name="CHAPTER-'"$i"'"></a>'
        cat "spec/$i.md" 
        #| tr '\n' '\0' | perl -pe 's/^---(.*?)---//' | tr '\0' '\n'
    done
) |
# remove target page name from links to anchors
perl -pe 's/\[([^\]]+)\]\(\d\d-[a-z-]+\.html(#[^)]+)\)/[$1]($2)/g'      |
# point links to chapters to the CHAPTER anchor (the dot in \.html is escaped so it only matches a literal dot)
perl -pe 's/\[([^\]]+)\]\((\d\d-[a-z-]+)\.html\)/[$1](#CHAPTER-$2)/g'    > spec/all.md

# TODO add a chapter-anchor
#   \[  ([^\]]+)                \]
#   \(  (\d\d-[a-z-]+\.html)    \)

#[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](#escape-sequences).
#<a name="pookie"></a>
    
bundle exec jekyll build -d build/spec/ -s spec/ --baseurl="."
docker run --security-opt seccomp:unconfined  --rm -v "$(pwd):/converted/" arachnysdocker/athenapdf athenapdf -D 1000 build/spec/all.html test.pdf
evince test.pdf
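
To make the effect of the two link substitutions in the script above easier to see, here is a small standalone sketch that applies equivalent rewrites to one sample line. It uses `sed -E` rather than perl purely so it runs with fewer dependencies; `rewrite_links` is a hypothetical helper name and the sample line is adapted from the comment in the script.

```shell
#!/bin/sh
# Sketch: rewrite inter-chapter links so they work inside one merged page.
# rewrite_links is a hypothetical name; it mirrors the two perl substitutions.
rewrite_links() {
    sed -E \
        -e 's/\[([^]]+)\]\([0-9]{2}-[a-z-]+\.html(#[^)]+)\)/[\1](\2)/g' \
        -e 's/\[([^]]+)\]\(([0-9]{2}-[a-z-]+)\.html\)/[\1](#CHAPTER-\2)/g'
}

# First rule: "NN-chapter.html#anchor" becomes "#anchor".
# Second rule: "NN-chapter.html" becomes "#CHAPTER-NN-chapter".
printf '%s\n' \
    '[Unicode escape](01-lexical-syntax.html) or by an [escape sequence](01-lexical-syntax.html#escape-sequences).' |
    rewrite_links
```

Links that do not follow the `NN-name.html` pattern pass through unchanged, which is where the irregular link structure of the sources would still need manual attention.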

@adriaanm
Contributor

Since you don't want to sign a CLA, could you clarify under which license you post this code?

@ritschwumm

ritschwumm commented Oct 30, 2017

haha, good question :)
WTFPL, if you can work with that - or do you need something more formal?

@adriaanm
Contributor

@ritschwumm, thanks -- public domain (== WTFPL) is fine with me. Just looking to avoid any licensing issues for the project, which is ultimately what the CLA is about.

@dsbos
Member

dsbos commented Nov 13, 2017 via email

@jvican
Member

jvican commented Dec 7, 2017

@ritschwumm I had to make some changes in your script to generate a valid pdf document, but that contribution is great, I wouldn't have been able to figure it out myself. Thank you.

I also managed to create a mobi file out of the all.html via KindleGen (https://www.amazon.com/gp/feature.html?docId=1000765211). Most links work and it's overall readable. The style could be improved, but I'm happy with the result.

@mghildiy

If I understand correctly, the objective here is to generate a PDF of one of the Scala websites (the one containing the Scala specification).

@jvican
Member

jvican commented Jan 21, 2018

I think it would be great if, as a first step, we get a whole html file (like the one made by @ritschwumm) that has all the chapters and which is readable. From there, we can easily convert to PDF and to ebook formats through athenapdf (or maybe pandoc too?) and kindlegen.

@mghildiy

Is it something like this we need:
https://github.com/showdownjs/showdown

@sake92

sake92 commented Sep 27, 2018

I agree with this:

i'd probably render markdown to html with some JS library and feed the thing to electron-pdf, athenapdf or maybe chrome (headless). the latter works really well in my experience. where can i find the markdown sources? i might give it a try...

There's already support for that in my hepek project. It uses headless Chrome via Selenium, waits for JS to load and snapshots its HTML (see example here). 😃
Layout depends just on HTML's print CSS.

I'll try on weekend to tackle this! Probably hardest issue will be to map markdown files to corresponding hepek abstractions..

@ritschwumm

ritschwumm commented Sep 27, 2018

how about a slightly different approach: if i remember correctly, the main obstacle was the irregular link structure of the original files. maybe we can just make them more regular somehow?

apart from that i'm not convinced that regex search&replace is the way to go - manipulating meaningful data structures is so much easier... is there a simple way to have those - some parser, maybe?

@adriaanm
Contributor

I'm more than happy for someone to rework the markdown sources if that makes generating pdf/html/mobi... easier!

@jvican
Member

jvican commented Sep 28, 2018

As I see it though, these are the two true challenges:

  1. Make mathjax notation render correctly (especially in PDF and ebook formats)
  2. Merging independent spec sections (multiple markdown files) into one consistent view of the spec (only one markdown file).

There's not a lot of value in changing the content of the markdown sources if these two problems are not tackled (and also I would favor the least possible diff to make this possible 😄). As soon as we have a unified markdown file with all the chapters, we can use pandoc to turn the spec into an ebook or PDF.
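
The merge step could be sketched roughly as follows, assuming each chapter file starts with a Jekyll-style front matter block delimited by `---` lines. `strip_front_matter` and `merge_chapters` are hypothetical helper names, and the pandoc commands are left as comments because whether pandoc handles the spec's math notation well has not been checked here.

```shell
#!/bin/sh
# Sketch: merge chapter files into one markdown file, dropping front matter.
# Assumes front matter, if present, starts on line 1 and ends at the next "---".

# Print a file without its leading front matter block.
strip_front_matter() {
    awk 'NR == 1 && /^---[ \t]*$/ { in_fm = 1; next }
         in_fm && /^---[ \t]*$/   { in_fm = 0; next }
         !in_fm' "$1"
}

# merge_chapters OUTPUT.md CHAPTER.md...
merge_chapters() {
    out=$1
    shift
    : > "$out"
    for chapter in "$@"; do
        strip_front_matter "$chapter" >> "$out"
        printf '\n' >> "$out"    # blank line between chapters
    done
}

# Example (not run here):
#   merge_chapters spec/all.md spec/01-lexical-syntax.md spec/02-identifiers-names-and-scopes.md
#   pandoc spec/all.md --toc -o spec.epub
#   pandoc spec/all.md --toc --pdf-engine=xelatex -o spec.pdf
```

Cross-chapter links would still point at the per-chapter `.html` pages after this step, so they need a separate rewrite before the merged file is converted.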

@ritschwumm

@jvican how is mathjax problematic?

@jvican
Member

jvican commented Sep 28, 2018

Maybe it wasn't mathjax but whatever is being used for the notation of the language. In the PDF I generated a while ago, the notation was poorly displayed and it rendered most of the snippets explaining Scala's grammar unreadable.

@sake92

sake92 commented Sep 29, 2018

As promised, here is the site and pdf.
Source code is here.

I mostly struggled with maths+code interactions but somehow managed to get it working.. 😄
Of course, there's lots more work to be done.

@SethTisue
Member

scala/scala#7432 is merged! So we can now generate a PDF locally.

I'm not closing this ticket yet, though, because there is work left to do: I need to actually publish the PDF on our website. Soon!

@SethTisue
Member

whoa, we're live! https://scala-lang.org/files/archive/spec/2.13/spec.pdf

@SethTisue
Member

For those who like PDF versions of things, see also the discussion at scala/scala3#10767 (comment) about a PDF version of the Scala 3 Reference
