See also: https://www.visidata.org/
One thing I often do with CSVs is sum up all or specific rows in a column.
For example maybe you're doing end of year taxes and now you have this large CSV export from your bank or payment provider with multiple categories and you want to get the totals for certain things.
In a GUI tool it's really easy to sort by a column and drag your mouse to select what you want and see it summed in real time.
Oftentimes things aren't clean enough to have 100% confidence that you can solve this with an automated script because maybe something is spelled slightly different but it's really the same thing. This feels like one of those things where spending a legit 10-15 minutes once a year to do it manually is better than trying to account for every known and unknown edge case you could think of. The stakes are too high if you get it wrong since it's related to taxes.
Has anyone found a really good standalone basic spreadsheet app that "just works" which isn't Microsoft Excel that works on Windows or Linux? I don't know why but Libre and Open Office both struggle to parse columns out in certain types of CSVs and the sorting behavior is typically a lot worse than Google's spreadsheet app but I'd like to remove some dependence on using Google.
Have you tried gnumeric?
Nope, but I just tried. I went to their site and noticed there are no binaries for Windows. The site is also served over HTTP (not HTTPS), and the default experience of apt installing it on Ubuntu 22.04 didn't work due to a bunch of packages no longer existing:
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/e/evince/evince-common_42.3-0ubuntu3_all.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/poppler/libpoppler118_22.02.0-2ubuntu0.2_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/poppler/libpoppler-glib8_22.02.0-2ubuntu0.2_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/g/ghostscript/libgs9-common_9.55.0%7edfsg1-0ubuntu5.5_all.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/g/ghostscript/libgs9_9.55.0%7edfsg1-0ubuntu5.5_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/e/evince/libevdocument3-4_42.3-0ubuntu3_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/e/evince/libevview3-3_42.3-0ubuntu3_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/e/evince/evince_42.3-0ubuntu3_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/w/webkit2gtk/libjavascriptcoregtk-4.0-18_2.42.1-0ubuntu0.22.04.1_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/w/webkit2gtk/libwebkit2gtk-4.0-37_2.42.1-0ubuntu0.22.04.1_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
Going to go with nope on this one.
Gnumeric isn't in main on Debian/Ubuntu anymore?
Edit: looks like it is in universe on Ubuntu - do you have universe enabled?
https://packages.ubuntu.com/search?keywords=gnumeric
Edit: Did you apt update? Core pieces of evince missing sounds very strange.
I've never tried this myself but you could try importing the CSV in sqlite and then run aggregate queries over it. Does that sound useful?
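Something like this sketch, say (the file name, column names, and sample data here are all invented for illustration):

```shell
# Create a toy CSV standing in for a bank export (made-up data).
printf 'date,category,amount\n2023-01-05,groceries,12.50\n2023-02-10,groceries,8.00\n2023-03-01,travel,100.00\n' > transactions.csv

# Import into an in-memory SQLite DB; the first row becomes the column names.
# Then sum the amount column per category.
sqlite3 :memory: \
  '.import --csv transactions.csv tx' \
  'SELECT category, SUM(amount) FROM tx GROUP BY category ORDER BY category;'
```

`sqlite3` processes each trailing argument as if typed at the prompt, so dot-commands and SQL can be mixed on one command line.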
You can but it's back to depending on code to get the totals. This is one spot where IMO being able to visualize the data by seeing the rows and have immediate feedback on the sum is useful. You can continue dragging or use CTRL + SHIFT clicking to select more stuff as needed while letting your brain decide what should be grouped together.
With a SQL / code approach you have to account for these things without being able to see them and then adjust the code afterwards to include your custom groupings. It ends up taking more time. If the categories didn't change every year it would for sure be worth it to code up a solution since you'll know the edge cases by looking at the existing CSV but it's a moving target because it could change next year.
Was about to suggest the same
I haven't dug too much into the tool, but could you use something like Datasette for that?
There is sc-im, which is the closest you'll get to a full spreadsheet app in the terminal with vi controls.
Jupyter?
Have you looked at data wrangling software like Easy Data Transform, Knime or Alteryx? These are specifically written for cleaning, merging, summing and reshaping tabular data. They can all handle CSV and Excel files (plus a lot more).
If you want to use the CLI, Visidata might be what you want. It does have a bit of a learning curve. Beyond that, I've found it quite handy to do quick data explorations. e.g. there are shortcuts for histograms, filtering, x-y plots, etc.
One of the things that greatly improved my csv workflow is duckdb. It’s a small binary that allows querying csv with sql.
You can do the same with SQLite, which is usually already installed in most environments
sqlite> .import test.csv foo --csv
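And then totals are one aggregate query away. End-to-end from the shell it might look like this (sample data and column names invented; `-column -header` just makes the output readable):

```shell
# Toy test.csv for illustration.
printf 'category,amount\nfood,10\nfood,5\nrent,900\n' > test.csv

# Import, then sum per category with readable column output.
sqlite3 -column -header :memory: \
  '.import --csv test.csv foo' \
  'SELECT category, SUM(amount) AS total FROM foo GROUP BY category ORDER BY category;'
```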
A poor man's version of csvlens is something like:
sqlite3 -column :memory: '.import --csv file.csv tmp' 'select * from tmp;' | bat
which imports the csv into sqlite and outputs it to bat, my favorite pager - use `less` or whatever else you desire.
In addition to using a pager with sqlite3's fantastic text-only output .modes, if the CSV contains hyperlinks I use a custom UNIX filter I wrote that outputs simple, minimalist HTML. Then I view it with a text-only browser.
For example, this is how I use YouTube. I never use the YouTube website, with its gigantic pages and its "Javascript player", not to mention all of the telemetry. All the search results and information about videos is stored in SQL or CSV, viewed with a text-only sqlite3 output .mode, and optionally converted to simple HTML.
For me, this is better than a "modern" web browser that's too large for me to compile.
Wow, this is awesome. Thanks a lot.
duckdb is a single file with no dependencies and it's fast. Still blows my mind how quickly it can query GB sized gzipped CSVs.
I use DuckDB for queries and Visidata for quick inspections.
Between those two, I can work with not only CSVs, but also JSON and Parquet files (which are blazing fast -- CSVs are good for human readability and editability, but they're horrendous for queries).
CLI CSV tools pop up every now and then, but there's too many of them and I feel that my use cases are sufficiently addressed with only 2 tools.
If you use jupyter, check out what I'm building with Buckaroo. The aim is to have a table viewer that does the right/obvious thing for most data (formatting, sorting, downsampling large datasets, summary statistics, histograms), all customizable. Supports pandas and polars.
If using python, I’ll also recommend polars sql context manager to run queries on csv.
https://docs.pola.rs/py-polars/html/reference/api/polars.SQL...
Long live duckDB! Big fan here.
Not at a computer right now, but something like this in your pipeline should be good enough more often than not?
awk -v OFS='\t' '{$1=$1; print}' | column -t
Many CSV data sets cannot be handled by awk, because CSV supports things like commas inside fields, newlines inside fields, etc. Your solution would also fall over on any tabs within fields. Plenty of tools out there support RFC 4180 CSV files as well as common CSV variants, and they have been around for a decade, with suites of CLI commands that can be piped together, just speaking "CSV" as a format rather than pure text streams.
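A quick way to see the quoting problem (made-up two-line CSV): naive comma-splitting counts a quoted field containing a comma as two fields.

```shell
# Line 1 has 2 fields; line 2 also has 2 fields per RFC 4180,
# but awk -F, splits the quoted comma and reports 3.
printf '%s\n' 'name,amount' '"Doe, Jane",10' | awk -F, '{print NF}'
```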
Awk now supports a `--csv` flag for processing CSVs. https://github.com/onetrueawk/awk/blob/master/README.md
How does one output properly quoted CSV rows? It seems the --csv flag only works for parsing input.
oh nice TIL
Good points on using proper CSV tooling 100%; so that’s why the disclaimer “more often than not”
BTW I just realized that I was omitting the field separator ‘-F,’ actually.
For those pesky nested commas, on top of the simplistic text approach above, I'd still go back to sed for fun and potential profit:
sed 's/","/"\t"/g; s/^"/"\t/; s/"$//; s/,,/, ,/g' | column -t
Not elegant nor hands-off and might be missing something else; def needs the right data / volume and fiddling as we know ymmv; can get unwieldy quickly etc etc
TLDR; use column -t when applicable
I enjoy using clickhouse-local for parsing CSV files. I generally hit situations where I need custom delimiters and custom parsing rules, and I find it handles all of these edge cases very well. Recently I found that if my CSV files are compressed, I don't even need to uncompress them; it auto-magically figures out the compression format and processes them for me.
Which one is better for querying CSV from both command line and programmatically?
1. SQLite 2. DuckDB 3. clickhouse-local
Definitely clickhouse ecosystem.
- For querying csv data from command line, I use clickhouse-local.
- For querying csv data programmatically using a library, I use chdb (embedded version of clickhouse)
- For querying large amount of csv data programmatically, I offload it to clickhouse cluster which can do processing in distributed fashion.
If you are looking from query performance perspective, this blog is useful: https://www.vantage.sh/blog/clickhouse-local-vs-duckdb
Didn’t know about them, seems interesting.
See we wouldn't need this if everyone had standardized on TSV instead
I don't understand if this is meant as a joke or not. Do you mean that we then would have Tsvlens? Tabs don't automatically make columns aligned, nor do they eliminate the problem of quoting. I don't get it, but then again, I'm exceptionally dim-witted.
TSVs are far superior to CSVs, as the separator is a TAB and not commas, semicolons or other weird stuff depending on your locale. Same for single and double quotes. The only thing to be aware of is TSVs created by Excel with newlines embedded in text fields, which are a huge PITA for almost all parsers.
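For example, an embedded comma needs no quoting in TSV, so plain POSIX tools just work (made-up data):

```shell
# Tab-separated record with a comma inside a field;
# cut splits on TAB by default, so the field survives intact.
printf 'name\tamount\nDoe, Jane\t10\n' | cut -f2
```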
Because tabs are flexible space, you can render them really wide in most text editors, wide enough that all the columns line up. But it is a hacky solution!
ngrid is quite usable, stable, and sensible for streaming work: https://github.com/twosigma/ngrid
a TUI program without a screenshot might as well not exist... nobody is going to use it if they can't see what it looks like.
Heh, and to think I downloaded dozens of Slackware floppy images over a 28.8 modem without any screenshots! What a fool I was!
(I have used this utility weekly for almost 10 years. It isn't pretty. It's effective.)
Maybe they'd accept a PR to the README with a screenshot.
Is there an open source thing similar to this that has a windows gui?
Emacs has an excellent csv mode.
Libreoffice is a great tool for opening csv files
Nice! Once or twice I've used tad as a GUI to view csv files, but I usually use vi with nowrap or read the file in R. Now csvlens will be my default for csv files.
I like viewing most CSVs in vim. It can be a little annoying when the columns don’t line up well. Often I’ll replace all the commas with a tab, and then set the tab width to some very high value to make it all line up.
Of course, there exist plenty of CSV files that don't follow any particular standard, so the trick doesn't always work (if there are tabs in the data, if there's some very wide field, or if there's a comma in the data). Works well enough for the files I have, though.
There are a few vim plugins that make CSVs more convenient.
I found the perfect solution for me with https://www.moderncsv.com. Starts fast, focused GUI, fully featured, no bullshit!
Really cool; wish it were open source.
Some people are trying to make a living from the software they write. Open source is not very conducive to that.
Love this!
Call me lame, but are there any open-source projects that accomplish this type of GUI in TypeScript? I have an idea, but it uses something written in JavaScript.
You can use any javascript library in typescript, and TS is all JS at runtime anyway. You can even add types without touching the JS library, just write a .d.ts file and stick it anywhere that TS looks for sources.
Does it allow specifying columns when filtering for rows?
For instance, the shown example filters for "Bug" but seems to filter for rows that contain "Bug" in any of the columns. Can I specify to filter for "Bug" only looking at a certain column?
At the same time - is filtering by numbers implemented, like filtering for rows that have a numeric value above X in a certain column?
If not, these should be on the TODO list - those are very common operations for CSV-type data.
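In the meantime, a column-specific filter can be approximated in the shell for simple CSVs without quoted separators (the column positions and sample data here are invented):

```shell
# Exact match on column 3 only - the "Bug" in row 2's title doesn't match.
printf 'id,title,label\n1,Crash on save,Bug\n2,Bug tracker docs,Docs\n' \
  | awk -F, '$3 == "Bug"'

# Numeric filter: keep the header plus rows where column 2 exceeds 100.
printf 'id,value\na,50\nb,150\n' | awk -F, 'NR == 1 || $2 + 0 > 100'
```

The `$2 + 0` forces a numeric comparison, so string values don't compare lexicographically.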
You want https://www.visidata.org/
I'd love something like this in csvkit, xsv, or qsv. The refragmentation of CSV CLI tools is counter to the long term trend.
I have been looking for a viewer exactly like this for so long! Visidata is nice, but way more complex than what I was after. This will fit perfectly into my workflow, thank you for sharing!
This is glorious, someone put it in homebrew!
Difference from https://github.com/BurntSushi/xsv ?
See also: (tv) Tidy Viewer. A cross-platform CLI csv pretty printer.
This would be nice to have with lesspipe.sh viewer.
I love these sorts of programs. Is there a list somewhere of data programs like this for the command line? I'm sure there must be.
I tried some TUI tool that was pretty advanced at viewing tables (sorting, filtering etc.), but I don't remember the name. It was on a WSL instance on my work laptop which I nuked when they laid me off.
I remember that it was tab-based. Any idea what that could have been?
That looks like a great CSV viewer.
I've enjoyed using csvkit[^0] in the past. The viewer isn't as good as csvlens seem to be, but it comes with the ability to grep, cut and pipe CSV data which has come in handy.
csvlens + csvkit might be a great combination.
I've been thinking about a terminal spreadsheet editor for about a year now. I haven't deeply investigated this tool yet, but I'd love something like this for the terminal that supported `=A1+B2` style formulas.
Back in the day, we called this Lotus 123 and VisiCalc.
https://en.m.wikipedia.org/wiki/Lotus_1-2-3
The spreadsheet data model lives a double life.
(1) On one hand you can use it for data that is really tabular,
(2) But it also has an engine that can compute the dependency relationships between cells and recalculate the cells affected by a change. This is quite different from conventional programming languages, where you are required to specify an order to put operations in. Of course this is a problem for exploiting parallelism, but I'd argue it is one more bit of cognitive load that makes it harder for beginners and non-professional programmers. (Professional programmers are just used to it and only run into problems in unusual cases where circularity is involved, but I think it's one more thing that beginners struggle with.)
The worst problem is a lack of separation between code and data. If you are doing an analysis you might put something like `=SUM(A1:A29)` in A30. Excel will adjust that formula if you insert a row in there, which helps, but it locks you into the mindset of "I'm making the December sales report" as opposed to "I'm making the monthly sales report". That is, the data and the analysis should be two separate things: just as you can put the December data or the January data into a Python script.

(Note that some of the same problems still exist with "notebooks" and "workspaces", where you wind up with a file that has both code and data in it, which can be problematic to check into git, particularly when you are working for people who would like to have a beautiful notebook with an analysis in it to view on GitHub but will then struggle to version control it. Many "data scientists" fail to rise above the December sales report even though there's a clear path to turn a Jupyter notebook into a Python script.)
---
As much as that is a rant I think there's a huge untapped market for things that are like spreadsheets but different. That is, something that looks like Excel but is specialized for editing tabular data (no formulas, CSV import 'just works' all the time, ...) or something that has formulas like Excel but not on a grid or not just on a grid. For the latter there was this product
https://en.wikipedia.org/wiki/TK_Solver
which is still around. People were amazed with TK Solver when it came out and I'm surprised to this day that it hasn't had a lot of competition.
We had that once, in an MSDOS-era program called Javelin ( https://en.wikipedia.org/wiki/Javelin_Software ):
"Unlike models in a spreadsheet, Javelin models are built on objects called variables, not on data in cells of a report. For example, a time series, or any variable, is an object in itself, not a collection of cells which happen to appear in a row or column... Calculations are performed on these objects, as opposed to a range of cells, so adding two time series automatically aligns them in calendar time, or in a user-defined time frame. Data are independent of worksheets..."
That was forty years ago and as far as I know there hasn't been a program that works the same way since. Perhaps someone could reverse-engineer it (the jav.exe file is only 56,448 bytes!)
Spreadsheets are kind of interesting tech. Because the user is more-or-less directly building the data dependency graph, I’d expect them to map better to actual hardware (massively parallel in so many dimensions) better than a sequential procedural language like C.
But, they are not very fashionable.
It would be fun to have a pipeline to compile a spreadsheet, that produces an executable. Potential outputs: cpu, gpu, fpga. Haha.
Yes, exactly - imagine a modern version written in Rust, with vi key-bindings, etc. Please someone be inspired by this idea and build this. TUIs are making a comeback and the spreadsheet is still one of the most useful and effective tools in all information science.
I wrote an SC fork that does this; will send a link when I'm home.
While not built around CSV, two terminal spreadsheet tools I have successfully used in the past are sc-im and the (neo)vim plugin vim-table-mode:
https://github.com/andmarti1424/sc-im/
https://github.com/dhruvasagar/vim-table-mode
Back then I stopped using sc-im because it could not import/export XLSX, if I remember correctly. Apparently it can today!
vim-table-mode always felt a little fragile and I don't want to be bound to vim anymore. That said, it still feels like a small miracle to me to have functional spreadsheet formulas inside markdown documents – calculation and typesetting all in one place.
Strongly considering donating to the creator of Visidata, it's one of the terminal tools I can't live without. I use it every day.
huh, something (maybe similarity to VisiCalc name) had me dismissing it as commercial software
Yep, I am doing the $5/mo Patreon level because I love it so much.
If it's that important and you can afford a few bucks a month, I strongly urge you stop considering your donation, and (right now) follow the links on the GitHub to actually start supporting the project.
It's really easy to wallow in the "considering" phase of a decision indefinitely (or maybe that's just me :D)
Didn't know about it, thank you
Can confirm, I consider visidata to be the gold standard for data exploration.
I have largely given up on spreadsheets, replacing them with relational databases. And on top of being a world class tool for understanding the data for import, visidata makes for a pretty great query pager.
Wow!
I've been using Visidata and love it for viewing, extracting rows or columns to a different file, but I know it is capable of much more that I haven't figured out yet
visidata is awesome. It also supports a bunch of other file formats such as excel sheets.